1. Al-Yazeedi T, Adams S, Tandonnet S, Turner A, Kim J, Lee J, Pires-daSilva A. The contribution of an X chromosome QTL to non-Mendelian inheritance and unequal chromosomal segregation in Auanema freiburgense. Genetics 2024; 227:iyae032. [PMID: 38431281 DOI: 10.1093/genetics/iyae032] [Received: 12/24/2023] [Revised: 02/14/2024] [Accepted: 02/15/2024] [Indexed: 03/05/2024]
Abstract
Auanema freiburgense is a nematode with males, females, and selfing hermaphrodites. When XO males mate with XX females, they typically produce a low proportion of XO offspring because they eliminate nullo-X spermatids. This process ensures that most sperm carry an X chromosome, increasing the likelihood of X chromosome transmission compared to random segregation. This occurs because of an unequal distribution of essential cellular organelles during sperm formation, likely dependent on the X chromosome. Some sperm components are selectively segregated into the X chromosome's daughter cell, while others are discarded with the nullo-X daughter cell. Intriguingly, the interbreeding of 2 A. freiburgense strains results in hybrid males capable of producing viable nullo-X sperm. Consequently, when these hybrid males mate with females, they yield a high percentage of male offspring. To uncover the genetic basis of nullo-spermatid elimination and X chromosome drive, we generated a genome assembly for A. freiburgense and genotyped the intercrossed lines. This analysis identified a quantitative trait locus spanning several X chromosome genes linked to the non-Mendelian inheritance patterns observed in A. freiburgense. This finding provides valuable clues to the underlying factors involved in asymmetric organelle partitioning during male meiotic division and thus non-Mendelian transmission of the X chromosome and sex ratios.
Affiliation(s)
- Talal Al-Yazeedi
  - School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Sally Adams
  - School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Sophie Tandonnet
  - School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Anisa Turner
  - School of Life Sciences, University of Warwick, Coventry CV4 7AL, UK
- Jun Kim
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
- Junho Lee
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
2. Baudic M, Murata H, Bosada FM, Melo US, Aizawa T, Lindenbaum P, van der Maarel LE, Guedon A, Baron E, Fremy E, Foucal A, Ishikawa T, Ushinohama H, Jurgens SJ, Choi SH, Kyndt F, Le Scouarnec S, Wakker V, Thollet A, Rajalu A, Takaki T, Ohno S, Shimizu W, Horie M, Kimura T, Ellinor PT, Petit F, Dulac Y, Bru P, Boland A, Deleuze JF, Redon R, Le Marec H, Le Tourneau T, Gourraud JB, Yoshida Y, Makita N, Vieyres C, Makiyama T, Mundlos S, Christoffels VM, Probst V, Schott JJ, Barc J. TAD boundary deletion causes PITX2-related cardiac electrical and structural defects. Nat Commun 2024; 15:3380. [PMID: 38643172 PMCID: PMC11032321 DOI: 10.1038/s41467-024-47739-x] [Received: 02/17/2023] [Accepted: 04/08/2024] [Indexed: 04/22/2024]
Abstract
While 3D chromatin organization in topologically associating domains (TADs) and loops mediating regulatory element-promoter interactions is crucial for tissue-specific gene regulation, the extent of their involvement in human Mendelian disease is largely unknown. Here, we identify 7 families presenting a new cardiac entity associated with a heterozygous deletion of 2 CTCF binding sites on 4q25, inducing TAD fusion and chromatin conformation remodeling. The CTCF binding sites are located in a gene desert at 1 Mb from the Paired-like homeodomain transcription factor 2 gene (PITX2). By introducing the ortholog of the human deletion in the mouse genome, we recapitulate the patient phenotype and characterize an opposite dysregulation of PITX2 expression in the sinoatrial node (ectopic activation) and ventricle (reduction), respectively. Chromatin conformation assay performed in human induced pluripotent stem cell-derived cardiomyocytes harboring the minimal deletion identified in family#1 reveals a conformation remodeling and fusion of TADs. We conclude that TAD remodeling mediated by deletion of CTCF binding sites causes a new autosomal dominant Mendelian cardiac disorder.
Affiliation(s)
- Manon Baudic
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Hiroshige Murata
  - The Department of Cardiovascular Medicine, Nippon Medical School Hospital, Tokyo, Japan
- Fernanda M Bosada
  - Department of Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centers, University of Amsterdam, 1105 AZ, Amsterdam, The Netherlands
- Uirá Souto Melo
  - Max Planck Institute for Molecular Genetics, RG Development and Disease, 13353, Berlin, Germany
- Takanori Aizawa
  - Department of Cardiovascular Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Pierre Lindenbaum
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Lieve E van der Maarel
  - Department of Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam Reproduction and Development, Amsterdam University Medical Centers, University of Amsterdam, 1105 AZ, Amsterdam, The Netherlands
- Amaury Guedon
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Estelle Baron
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Enora Fremy
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Adrien Foucal
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Taisuke Ishikawa
  - Omics Research Center, National Cerebral and Cardiovascular Center, Suita, Japan
- Hiroya Ushinohama
  - Department of Cardiology, Fukuoka Children's Hospital, Fukuoka, Japan
- Sean J Jurgens
  - Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Experimental Cardiology, Heart Center, Amsterdam Cardiovascular Sciences, Amsterdam UMC Location University of Amsterdam, Amsterdam, The Netherlands
- Seung Hoan Choi
  - Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Florence Kyndt
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Solena Le Scouarnec
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Vincent Wakker
  - Department of Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam University Medical Centers, University of Amsterdam, 1105 AZ, Amsterdam, The Netherlands
- Aurélie Thollet
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Annabelle Rajalu
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Tadashi Takaki
  - Department of Cell Growth and Differentiation, Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
  - Takeda-CiRA Joint Program for iPS Cell Applications, Fujisawa, Japan
  - Department of Pancreatic Islet Cell Transplantation, National Center for Global Health and Medicine, Tokyo, Japan
- Seiko Ohno
  - Department of Bioscience and Genetics, National Cerebral and Cardiovascular Center Research Institute, Suita, Japan
- Wataru Shimizu
  - The Department of Cardiovascular Medicine, Nippon Medical School Hospital, Tokyo, Japan
- Minoru Horie
  - Department of Cardiovascular Medicine, Shiga University of Medical Science, Ohtsu, Japan
- Takeshi Kimura
  - Department of Cardiovascular Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Patrick T Ellinor
  - Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, USA
- Florence Petit
  - Service de Génétique Clinique, CHU Lille, Hôpital Jeanne de Flandre, F-59000, Lille, France
  - University of Lille, EA 7364-RADEME, F-59000, Lille, France
- Yves Dulac
  - Unité de Cardiologie Pédiatrique, Hôpital des Enfants, F-31000, Toulouse, France
- Paul Bru
  - Service de Cardiologie, GH La Rochelle, F-17019, La Rochelle, France
- Anne Boland
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
- Jean-François Deleuze
  - Université Paris-Saclay, CEA, Centre National de Recherche en Génomique Humaine (CNRGH), 91057, Evry, France
- Richard Redon
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Hervé Le Marec
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Thierry Le Tourneau
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
- Jean-Baptiste Gourraud
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart: ERN GUARD-Heart, Amsterdam, The Netherlands
- Yoshinori Yoshida
  - Department of Cell Growth and Differentiation, Center for iPS Cell Research and Application, Kyoto University, Kyoto, Japan
- Naomasa Makita
  - Omics Research Center, National Cerebral and Cardiovascular Center, Suita, Japan
  - Department of Cardiology, Sapporo Teishinkai Hospital, Sapporo, Japan
- Claude Vieyres
  - Cabinet Cardiologique, Clinique St. Joseph, F-16000, Angoulême, France
- Takeru Makiyama
  - Department of Cardiovascular Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan
  - Department of Community Medicine Supporting System, Kyoto University Graduate School of Medicine, Kyoto, Japan
- Stephan Mundlos
  - Max Planck Institute for Molecular Genetics, RG Development and Disease, 13353, Berlin, Germany
- Vincent M Christoffels
  - Department of Medical Biology, Amsterdam Cardiovascular Sciences, Amsterdam Reproduction and Development, Amsterdam University Medical Centers, University of Amsterdam, 1105 AZ, Amsterdam, The Netherlands
- Vincent Probst
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart: ERN GUARD-Heart, Amsterdam, The Netherlands
- Jean-Jacques Schott
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart: ERN GUARD-Heart, Amsterdam, The Netherlands
- Julien Barc
  - Nantes Université, CHU Nantes, CNRS, INSERM, l'institut du Thorax, F-44000, Nantes, France
  - European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart: ERN GUARD-Heart, Amsterdam, The Netherlands
3. Keskus A, Bryant A, Ahmad T, Yoo B, Aganezov S, Goretsky A, Donmez A, Lansdon LA, Rodriguez I, Park J, Liu Y, Cui X, Gardner J, McNulty B, Sacco S, Shetty J, Zhao Y, Tran B, Narzisi G, Helland A, Cook DE, Chang PC, Kolesnikov A, Carroll A, Molloy EK, Pushel I, Guest E, Pastinen T, Shafin K, Miga KH, Malikic S, Day CP, Robine N, Sahinalp C, Dean M, Farooqi MS, Paten B, Kolmogorov M. Severus: accurate detection and characterization of somatic structural variation in tumor genomes using long reads. medRxiv 2024:2024.03.22.24304756. [PMID: 38585974 PMCID: PMC10996739 DOI: 10.1101/2024.03.22.24304756] [Indexed: 04/09/2024]
Abstract
Most current studies rely on short-read sequencing to detect somatic structural variation (SV) in cancer genomes. Long-read sequencing offers the advantage of better mappability and long-range phasing, which results in substantial improvements in germline SV detection. However, current long-read SV detection methods do not generalize well to the analysis of somatic SVs in tumor genomes with complex rearrangements, heterogeneity, and aneuploidy. Here, we present Severus: a method for the accurate detection of different types of somatic SVs using a phased breakpoint graph approach. To benchmark various short- and long-read SV detection methods, we sequenced five tumor/normal cell line pairs with Illumina, Nanopore, and PacBio sequencing platforms; on this benchmark Severus showed the highest F1 scores (harmonic mean of the precision and recall) as compared to long-read and short-read methods. We then applied Severus to three clinical cases of pediatric cancer, demonstrating concordance with known genetic findings as well as revealing clinically relevant cryptic rearrangements missed by standard genomic panels.
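The F1 score mentioned in this abstract is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the function name `f1_score` and the counts are hypothetical illustrations, not code or results from the Severus preprint.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall for a set of calls."""
    called = true_positives + false_positives      # everything the caller reported
    expected = true_positives + false_negatives    # everything in the truth set
    if called == 0 or expected == 0:
        return 0.0  # precision or recall is undefined; report 0 by convention here
    precision = true_positives / called
    recall = true_positives / expected
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the preprint).
example = f1_score(true_positives=90, false_positives=10, false_negatives=30)
```

Because the harmonic mean is pulled toward the smaller of the two values, a caller cannot reach a high F1 by maximizing precision or recall alone.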
Affiliation(s)
- Ayse Keskus
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Asher Bryant
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Tanveer Ahmad
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Byunggil Yoo
  - Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Anton Goretsky
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Ataberk Donmez
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Lisa A. Lansdon
  - Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Isabel Rodriguez
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Jimin Park
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Yuelin Liu
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Xiwen Cui
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Samuel Sacco
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Jyoti Shetty
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yongmei Zhao
  - Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bao Tran
  - Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Erin K. Molloy
  - Department of Computer Science, University of Maryland, College Park, MD, USA
- Irina Pushel
  - Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Erin Guest
  - Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Tomi Pastinen
  - Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Kishwar Shafin
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Karen H. Miga
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Salem Malikic
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Chi-Ping Day
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Cenk Sahinalp
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
- Michael Dean
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Rockville, MD, USA
- Midhat S. Farooqi
  - Children’s Mercy Hospital, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
- Mikhail Kolmogorov
  - Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, USA
4. Zhang Z, Gomes Viana JP, Zhang B, Walden KKO, Müller Paul H, Moose SP, Morris GP, Daum C, Barry KW, Shakoor N, Hudson ME. Major impacts of widespread structural variation on sorghum. Genome Res 2024; 34:286-299. [PMID: 38479835 PMCID: PMC10984582 DOI: 10.1101/gr.278396.123] [Received: 08/21/2023] [Accepted: 01/22/2024] [Indexed: 03/22/2024]
Abstract
Genetic diversity is critical to crop breeding and improvement, and dissection of the genomic variation underlying agronomic traits can both assist breeding and give insight into basic biological mechanisms. Although recent genome analyses in plants reveal many structural variants (SVs), most current studies of crop genetic variation are dominated by single-nucleotide polymorphisms (SNPs). The extent of the impact of SVs on global trait variation, as well as their utility in genome-wide selection, is not yet understood. In this study, we built an SV data set based on whole-genome resequencing of diverse sorghum lines (n = 363), validated the correlation of photoperiod sensitivity and variety type, and identified SV hotspots underlying the divergent evolution of cellulosic and sweet sorghum. In addition, we showed the complementary contribution of SVs for heritability of traits related to sorghum adaptation. Importantly, inclusion of SV polymorphisms in association studies revealed genotype-phenotype associations not observed with SNPs alone. Three-way genome-wide association studies (GWAS) based on whole-genome SNP, SV, and integrated SNP + SV data sets showed substantial associations between SVs and sorghum traits. The addition of SVs to GWAS substantially increased heritability estimates for some traits, indicating their important contribution to functional allelic variation at the genome level. Our discovery of the widespread impacts of SVs on heritable gene expression variation could render a plausible mechanism for their disproportionate impact on phenotypic variation. This study expands our knowledge of SVs and emphasizes the extensive impacts of SVs on sorghum.
Affiliation(s)
- Zhihai Zhang
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Joao Paulo Gomes Viana
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Bosen Zhang
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Kimberly K O Walden
  - High Performance Computing in Biology, Carver Biotechnology Center, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Hans Müller Paul
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Stephen P Moose
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Geoffrey P Morris
  - Department of Soil and Crop Science, Colorado State University, Fort Collins, Colorado 80523, USA
- Chris Daum
  - United States Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Kerrie W Barry
  - United States Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Nadia Shakoor
  - Donald Danforth Plant Science Center, St. Louis, Missouri 63132, USA
- Matthew E Hudson
  - DOE Center for Advanced Bioenergy and Bioproducts Innovation (CABBI), University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
  - Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
5. Tanudisastro HA, Deveson IW, Dashnow H, MacArthur DG. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet 2024:10.1038/s41576-024-00692-3. [PMID: 38366034 DOI: 10.1038/s41576-024-00692-3] [Accepted: 12/06/2023] [Indexed: 02/18/2024]
Abstract
Short tandem repeats (STRs) are highly polymorphic sequences throughout the human genome that are composed of repeated copies of a 1-6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants in at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, in particular mapping short reads to repetitive regions and inferring expanded repeat lengths. Recent advances in sequencing technology and computational tools for STR genotyping from sequencing data promise to help overcome this challenge and solve genetically unresolved cases and the 'missing heritability' of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications to understand the effect of STR variation on health and disease. We identify emergent opportunities to refine genotyping and quality-control approaches as well as to integrate STRs into variant-calling workflows and large cohort analyses.
Affiliation(s)
- Hope A Tanudisastro
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Faculty of Medicine and Health, University of Sydney, Sydney, New South Wales, Australia
- Ira W Deveson
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
  - Genomics and Inherited Disease Program, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Harriet Dashnow
  - Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
- Daniel G MacArthur
  - Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  - Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Faculty of Medicine and Health, University of New South Wales, Sydney, New South Wales, Australia
6. Groza C, Schwendinger-Schreck C, Cheung WA, Farrow EG, Thiffault I, Lake J, Rizzo WB, Evrony G, Curran T, Bourque G, Pastinen T. Pangenome graphs improve the analysis of structural variants in rare genetic diseases. Nat Commun 2024; 15:657. [PMID: 38253606 PMCID: PMC10803329 DOI: 10.1038/s41467-024-44980-2] [Received: 06/06/2023] [Accepted: 01/10/2024] [Indexed: 01/24/2024]
Abstract
Rare DNA alterations that cause heritable diseases are only partially resolvable by clinical next-generation sequencing due to the difficulty of detecting structural variation (SV) in all genomic contexts. Long-read, high fidelity genome sequencing (HiFi-GS) detects SVs with increased sensitivity and enables assembling personal and graph genomes. We leverage standard reference genomes, public assemblies (n = 94) and a large collection of HiFi-GS data from a rare disease program (Genomic Answers for Kids, GA4K, n = 574 assemblies) to build a graph genome representing a unified SV callset in GA4K, identify common variation and prioritize SVs that are more likely to cause genetic disease (MAF < 0.01). Using graphs, we obtain a higher level of reproducibility than the standard reference approach. We observe over 200,000 SV alleles unique to GA4K, including nearly 1000 rare variants that impact coding sequence. With improved specificity for rare SVs, we isolate 30 candidate SVs in phenotypically prioritized genes, including known disease SVs. We isolate a novel diagnostic SV in KMT2E, demonstrating use of personal assemblies coupled with pangenome graphs for rare disease genomics. The community may interrogate our pangenome with additional assemblies to discover new SVs within the allele frequency spectrum relevant to genetic diseases.
Affiliation(s)
- Cristian Groza
  - Quantitative Life Sciences, McGill University, Montréal, QC, Canada
- Warren A Cheung
  - Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
- Emily G Farrow
  - Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
- Isabelle Thiffault
  - Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
- William B Rizzo
  - Child Health Research Institute, Department of Pediatrics, Nebraska Medical Center, Omaha, NE, USA
- Gilad Evrony
  - Center for Human Genetics and Genomics, Department of Pediatrics, Neuroscience & Physiology, New York University Grossman School of Medicine, New York, NY, USA
- Tom Curran
  - Children's Mercy Research Institute, Kansas City, MO, USA
- Guillaume Bourque
  - Canadian Center for Computational Genomics, McGill University, Montréal, QC, Canada
  - Department of Human Genetics, McGill University, Montréal, QC, Canada
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
  - Victor Phillip Dahdaleh Institute of Genomic Medicine at McGill University, Montréal, QC, Canada
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy Hospital and Research Institute, KC, MO, USA
7. Garrison MA, Jang Y, Bae T, Cherskov A, Emery SB, Fasching L, Jones A, Moldovan JB, Molitor C, Pochareddy S, Peters MA, Shin JH, Wang Y, Yang X, Akbarian S, Chess A, Gage FH, Gleeson JG, Kidd JM, McConnell M, Mills RE, Moran JV, Park PJ, Sestan N, Urban AE, Vaccarino FM, Walsh CA, Weinberger DR, Wheelan SJ, Abyzov A. Genomic data resources of the Brain Somatic Mosaicism Network for neuropsychiatric diseases. Sci Data 2023; 10:813. [PMID: 37985666 PMCID: PMC10662356 DOI: 10.1038/s41597-023-02645-7] [Received: 04/27/2023] [Accepted: 10/16/2023] [Indexed: 11/22/2023]
Abstract
Somatic mosaicism is defined as an occurrence of two or more populations of cells having genomic sequences differing at given loci in an individual who is derived from a single zygote. It is a characteristic of multicellular organisms that plays a crucial role in normal development and disease. To study the nature and extent of somatic mosaicism in autism spectrum disorder, bipolar disorder, focal cortical dysplasia, schizophrenia, and Tourette syndrome, a multi-institutional consortium called the Brain Somatic Mosaicism Network (BSMN) was formed through the National Institute of Mental Health (NIMH). In addition to genomic data of affected and neurotypical brains, the BSMN also developed and validated a best practices somatic single nucleotide variant calling workflow through the analysis of reference brain tissue. These resources, which include >400 terabytes of data from 1087 subjects, are now available to the research community via the NIMH Data Archive (NDA) and are described here.
Affiliation(s)
- McKinzie A Garrison
  - Program in Biochemistry, Molecular and Cellular Biology, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Yeongjun Jang
  - Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
- Taejeong Bae
  - Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA
- Adriana Cherskov
  - Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06520, USA
- Sarah B Emery
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Liana Fasching
  - Child Study Center, Yale University, New Haven, CT, 06520, USA
- Attila Jones
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- John B Moldovan
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Cindy Molitor
  - Sage Bionetworks, 2901 Third Ave., Suite 330, Seattle, WA, 98121, USA
- Sirisha Pochareddy
  - Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06520, USA
- Mette A Peters
  - Sage Bionetworks, 2901 Third Ave., Suite 330, Seattle, WA, 98121, USA
- Joo Heon Shin
  - Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
  - Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Yifan Wang
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
- Xiaoxu Yang
  - Rady Children's Institute for Genomic Medicine, 7910 Frost St., Suite #300, San Diego, CA, 92123, USA
  - Department of Neurosciences, University of California San Diego, La Jolla, California, USA
- Schahram Akbarian
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Institute for Data Science and Genomic Technologies, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Andrew Chess
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Department of Cell, Developmental and Regenerative Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
  - Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Icahn Institute for Data Science and Genomic Technologies, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Fred H Gage
  - Laboratory of Genetics LOG-G, Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Joseph G Gleeson
  - Rady Children's Institute for Genomic Medicine, 7910 Frost St., Suite #300, San Diego, CA, 92123, USA
  - Department of Neurosciences, University of California San Diego, La Jolla, California, USA
- Jeffrey M Kidd
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, 48109, USA
- Ryan E Mills
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, Michigan, 48109, USA
- John V Moran
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109, USA
  - Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, Michigan, 48109, USA
- Peter J Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Nenad Sestan
  - Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06520, USA
- Alexander E Urban
  - Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California, 94305, USA
  - Department of Genetics, Stanford University School of Medicine, Stanford, California, 94305, USA
- Flora M Vaccarino
  - Department of Neuroscience, Yale University School of Medicine, New Haven, CT, 06520, USA
  - Child Study Center, Yale University, New Haven, CT, 06520, USA
- Christopher A Walsh
  - Division of Genetics and Genomics and Howard Hughes Medical Institute, Boston Children's Hospital, Departments of Pediatrics and Neurology, Harvard Medical School, Boston, MA, USA
- Daniel R Weinberger
  - Lieber Institute for Brain Development, Baltimore, MD, 21205, USA
  - Department of Neurology, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
  - McKusick Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
  - Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Sarah J Wheelan
  - Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
  - National Human Genome Research Institute, National Institutes of Health, 6700B Rockledge Dr, Bethesda, MD, 20892, USA
| | - Alexej Abyzov
- Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, 55905, USA.
|
8
|
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
| | | | | | - Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
| | - Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
|
9
|
Kolmogorov M, Billingsley KJ, Mastoras M, Meredith M, Monlong J, Lorig-Roach R, Asri M, Alvarez Jerez P, Malik L, Dewan R, Reed X, Genner RM, Daida K, Behera S, Shafin K, Pesout T, Prabakaran J, Carnevali P, Yang J, Rhie A, Scholz SW, Traynor BJ, Miga KH, Jain M, Timp W, Phillippy AM, Chaisson M, Sedlazeck FJ, Blauwendraat C, Paten B. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. Nat Methods 2023; 20:1483-1492. [PMID: 37710018 DOI: 10.1038/s41592-023-01993-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/28/2023] [Accepted: 08/04/2023] [Indexed: 09/16/2023]
Abstract
Long-read sequencing technologies substantially overcome the limitations of short-reads but have not been considered as a feasible replacement for population-scale projects, being a combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer's and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with F1-score comparable to Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance to Illumina indel calls elsewhere. Further, we can discover structural variants with F1-score on par with state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.
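The F1-score used above to compare variant calls is the harmonic mean of precision and recall against a truth set. As a minimal sketch, not the evaluation code of the study, the function below computes it from counts of true positives, false positives, and false negatives; the function name and the example counts are illustrative assumptions.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    Counts are assumed to come from comparing a call set with a truth set.
    """
    if true_pos == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # fraction of calls that are correct
    recall = true_pos / (true_pos + false_neg)     # fraction of true variants recovered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(round(f1_score(true_pos=90, false_pos=10, false_neg=10), 3))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a caller cannot reach a high F1-score by trading away either precision or recall.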
Affiliation(s)
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Kimberley J Billingsley
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
| | - Mira Mastoras
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Ramita Dewan
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Rylee M Genner
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kensuke Daida
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | | | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jeshuwin Prabakaran
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
| | | | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Bryan J Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Karen H Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Northeastern University, Boston, MA, USA
- Department of Physics, Northeastern University, Boston, MA, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer's and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA.
|
10
|
Liu Y, Shen X, Gong Y, Liu Y, Song B, Zeng X. Sequence Alignment/Map format: a comprehensive review of approaches and applications. Brief Bioinform 2023; 24:bbad320. [PMID: 37668049 DOI: 10.1093/bib/bbad320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 08/16/2023] [Accepted: 08/18/2023] [Indexed: 09/06/2023] Open
Abstract
The Sequence Alignment/Map (SAM) format file is the text file used to record alignment information. Alignment is the core of sequencing analysis, and downstream tasks accept mapping results for further processing. Given the rapid development of the sequencing industry today, a comprehensive understanding of the SAM format and related tools is necessary to meet the challenges of data processing and analysis. This paper is devoted to retrieving knowledge in the broad field of SAM. First, the format of SAM is introduced to understand the overall process of the sequencing analysis. Then, existing work is systematically classified in accordance with generation, compression and application, and the involved SAM tools are specifically mined. Lastly, a summary and some thoughts on future directions are provided.
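For orientation, each alignment line in a SAM file is tab-delimited, with 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) followed by optional tags. The sketch below is not taken from the review; it splits one such line and checks two FLAG bits. The names `SamRecord` and `parse_sam_line`, and the example read, are illustrative assumptions.

```python
from typing import NamedTuple, Tuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM alignment fields plus any optional tags."""
    qname: str
    flag: int
    rname: str
    pos: int      # 1-based leftmost mapping position
    mapq: int
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: Tuple[str, ...]


def parse_sam_line(line: str) -> SamRecord:
    """Parse a single alignment line (not a header line starting with '@')."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10], tuple(fields[11:]),
    )


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)   # FLAG bit 0x4: segment unmapped


def is_reverse_strand(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)  # FLAG bit 0x10: reverse-complemented


if __name__ == "__main__":
    # Made-up alignment line, for illustration only.
    example = "read1\t16\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0"
    rec = parse_sam_line(example)
    print(rec.rname, rec.pos, is_reverse_strand(rec))
```

Real pipelines would normally rely on an established SAM/BAM library instead of hand-parsing; the sketch only makes the field layout concrete.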
Affiliation(s)
- Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
| | - Xiangzhen Shen
- College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
| | - Yongshun Gong
- School of Software, Shandong University, 250100, Jinan, China
| | - Yiping Liu
- College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
| | - Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, 410086, Changsha, China
|
11
|
Lange LM, Avenali M, Ellis M, Illarionova A, Keller Sarmiento IJ, Tan AH, Madoev H, Galandra C, Junker J, Roopnarain K, Solle J, Wegel C, Fang ZH, Heutink P, Kumar KR, Lim SY, Valente EM, Nalls M, Blauwendraat C, Singleton A, Mencacci N, Lohmann K, Klein C. Elucidating causative gene variants in hereditary Parkinson's disease in the Global Parkinson's Genetics Program (GP2). NPJ Parkinsons Dis 2023; 9:100. [PMID: 37369645 PMCID: PMC10300084 DOI: 10.1038/s41531-023-00526-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Accepted: 05/15/2023] [Indexed: 06/29/2023] Open
Abstract
The Monogenic Network of the Global Parkinson's Genetics Program (GP2) aims to create an efficient infrastructure to accelerate the identification of novel genetic causes of Parkinson's disease (PD) and to improve our understanding of already identified genetic causes, such as reduced penetrance and variable clinical expressivity of known disease-causing variants. We aim to perform short- and long-read whole-genome sequencing for up to 10,000 patients with parkinsonism. Important features of this project are global involvement and focusing on historically underrepresented populations.
Affiliation(s)
- Lara M Lange
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
| | - Micol Avenali
- IRCCS Mondino Foundation, Pavia, Italy
- Department of Brain and Behavioral Sciences, University of Pavia, Pavia, Italy
| | - Melina Ellis
- Northcott Neuroscience Laboratory, ANZAC Research Institute, Concord, NSW, Australia
- Faculty of Medicine and Health, University of Sydney, Sydney, NSW, Australia
| | | | | | - Ai-Huey Tan
- Division of Neurology, Department of Medicine, and the Mah Pooi Soo and Tan Chin Nam Centre for Parkinson's and Related Disorders, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Harutyun Madoev
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
| | - Caterina Galandra
- IRCCS Mondino Foundation, Pavia, Italy
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
| | - Johanna Junker
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
| | | | - Justin Solle
- Department of Clinical Research, Michael J. Fox Foundation for Parkinson's Research, New York City, NY, USA
| | - Claire Wegel
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Zih-Hua Fang
- German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
| | - Peter Heutink
- German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
| | - Kishore R Kumar
- Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
- Molecular Medicine Laboratory and Neurology Department, Concord Repatriation General Hospital, The University of Sydney, Concord, NSW, Australia
| | - Shen-Yang Lim
- Division of Neurology, Department of Medicine, and the Mah Pooi Soo and Tan Chin Nam Centre for Parkinson's and Related Disorders, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
| | - Enza Maria Valente
- IRCCS Mondino Foundation, Pavia, Italy
- Department of Molecular Medicine, University of Pavia, Pavia, Italy
| | - Mike Nalls
- Data Tecnica International, Washington, DC, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Integrative Genomics Unit, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Andrew Singleton
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Niccolo Mencacci
- Department of Neurology, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Katja Lohmann
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany
| | - Christine Klein
- Institute of Neurogenetics, University of Lübeck, Lübeck, Germany.
|
12
|
Rajaby R, Liu DX, Au CH, Cheung YT, Lau AYT, Yang QY, Sung WK. INSurVeyor: improving insertion calling from short read sequencing data. Nat Commun 2023; 14:3243. [PMID: 37277343 DOI: 10.1038/s41467-023-38870-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 05/18/2023] [Indexed: 06/07/2023] Open
Abstract
Insertions are one of the major types of structural variations and are defined as the addition of 50 nucleotides or more into a DNA sequence. Several methods exist to detect insertions from next-generation sequencing short read data, but they generally have low sensitivity. Our contribution is two-fold. First, we introduce INSurVeyor, a fast, sensitive and precise method that detects insertions from next-generation sequencing paired-end data. Using publicly available benchmark datasets (both human and non-human), we show that INSurVeyor is not only more sensitive than any individual caller we tested, but also more sensitive than all of them combined. Furthermore, for most types of insertions, INSurVeyor is almost as sensitive as long-read callers. Second, we provide state-of-the-art catalogues of insertions for 1047 Arabidopsis thaliana genomes from the 1001 Genomes Project and 3202 human genomes from the 1000 Genomes Project, both generated with INSurVeyor. We show that they are more complete and precise than existing resources, and that important insertions are missed by existing methods.
Affiliation(s)
- Ramesh Rajaby
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
- A*STAR Genome Institute of Singapore, 60 Biopolis Street, Singapore, 138672, Singapore
| | - Dong-Xu Liu
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Chun Hang Au
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
| | - Yuen-Ting Cheung
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
| | - Amy Yuet Ting Lau
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China
| | - Qing-Yong Yang
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Wing-Kin Sung
- Hong Kong Genome Institute, Hong Kong Science Park, Shatin, Hong Kong, China.
- A*STAR Genome Institute of Singapore, 60 Biopolis Street, Singapore, 138672, Singapore.
- National Key Laboratory of Crop Genetic Improvement, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China.
- Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China.
- Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China.
- School of Computing, National University of Singapore, 13 Computing Drive, Singapore, 117417, Singapore.
|
13
|
Paul TC, Johnson KA, Hagen GM. Super-resolution imaging of neuronal structure with structured illumination microscopy. bioRxiv 2023:2023.05.26.542523. [PMID: 37292949 PMCID: PMC10245995 DOI: 10.1101/2023.05.26.542523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Super-resolution structured illumination microscopy (SR-SIM) is a method in optical fluorescence microscopy which is suitable for imaging a wide variety of cells and tissues in biological and biomedical research. Typically, SIM methods use high spatial frequency illumination patterns generated by laser interference. This approach provides high resolution but is limited to thin samples such as cultured cells. Using a different strategy for processing the raw data and coarser illumination patterns, we imaged through a 150 µm thick coronal section of a mouse brain expressing GFP in a subset of neurons. The resolution reached 144 nm, a 1.7-fold improvement over conventional widefield imaging.
Affiliation(s)
- Tristan C. Paul
- UCCS BioFrontiers Center, University of Colorado Colorado Springs, 1420 Austin Bluffs Parkway, Colorado Springs, Colorado, 80918
| | - Karl A. Johnson
- UCCS BioFrontiers Center, University of Colorado Colorado Springs, 1420 Austin Bluffs Parkway, Colorado Springs, Colorado, 80918
| | - Guy M. Hagen
- UCCS BioFrontiers Center, University of Colorado Colorado Springs, 1420 Austin Bluffs Parkway, Colorado Springs, Colorado, 80918
|
14
|
Kolmogorov M, Billingsley KJ, Mastoras M, Meredith M, Monlong J, Lorig-Roach R, Asri M, Jerez PA, Malik L, Dewan R, Reed X, Genner RM, Daida K, Behera S, Shafin K, Pesout T, Prabakaran J, Carnevali P, Yang J, Rhie A, Scholz SW, Traynor BJ, Miga KH, Jain M, Timp W, Phillippy AM, Chaisson M, Sedlazeck FJ, Blauwendraat C, Paten B. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. bioRxiv 2023:2023.01.12.523790. [PMID: 36711673 PMCID: PMC9882142 DOI: 10.1101/2023.01.12.523790] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 01/18/2023]
Abstract
Long-read sequencing technologies substantially overcome the limitations of short reads but to date have not been considered a feasible replacement at scale because they are too expensive, not scalable enough, or too error-prone. Here, we develop an efficient and scalable wet lab and computational protocol for Oxford Nanopore Technologies (ONT) long-read sequencing that seeks to provide a genuine alternative to short reads for large-scale genomics projects. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the NIH Center for Alzheimer's and Related Dementias (CARD). Using a single PromethION flow cell, we can detect SNPs with an F1-score better than Illumina short-read sequencing. Small indel calling remains difficult inside homopolymers and tandem repeats, but is comparable to Illumina calls elsewhere. Further, we can discover structural variants with an F1-score comparable to state-of-the-art methods involving Pacific Biosciences HiFi sequencing and trio information (but at a lower cost and greater throughput). Using ONT-based phasing, we can then combine and phase small and structural variants at megabase scales. Our protocol also produces highly accurate, haplotype-specific methylation calls. Overall, this makes large-scale long-read sequencing projects feasible; the protocol is currently being used to sequence thousands of brain-based genomes as a part of the NIH CARD initiative. We provide the protocol and software as open-source integrated pipelines for generating phased variant calls and assemblies.
Affiliation(s)
- Mikhail Kolmogorov
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA
| | - Kimberley J. Billingsley
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Mira Mastoras
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Mobin Asri
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Pilar Alvarez Jerez
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Laksh Malik
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Ramita Dewan
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Xylena Reed
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Rylee M. Genner
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
| | - Kensuke Daida
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Kishwar Shafin
- Google LLC, 1600 Amphitheatre Pkwy, Mountain View, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Jeshuwin Prabakaran
- Center for Cancer Research, National Cancer Institute, National Institutes of Health, USA
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
| | | | | | - Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sonja W. Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
| | - Bryan J. Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mark Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, Texas, USA
| | - Cornelis Blauwendraat
- Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
|
15
|
Jun G, English AC, Metcalf GA, Yang J, Chaisson MJP, Pankratz N, Menon VK, Salerno WJ, Krasheninina O, Smith AV, Lane JA, Blackwell T, Kang HM, Salvi S, Meng Q, Shen H, Pasham D, Bhamidipati S, Kottapalli K, Arnett DK, Ashley-Koch A, Auer PL, Beutel KM, Bis JC, Blangero J, Bowden DW, Brody JA, Cade BE, Chen YDI, Cho MH, Curran JE, Fornage M, Freedman BI, Fingerlin T, Gelb BD, Hou L, Hung YJ, Kane JP, Kaplan R, Kim W, Loos RJ, Marcus GM, Mathias RA, McGarvey ST, Montgomery C, Naseri T, Nouraie SM, Preuss MH, Palmer ND, Peyser PA, Raffield LM, Ratan A, Redline S, Reupena S, Rotter JI, Rich SS, Rienstra M, Ruczinski I, Sankaran VG, Schwartz DA, Seidman CE, Seidman JG, Silverman EK, Smith JA, Stilp A, Taylor KD, Telen MJ, Weiss ST, Williams LK, Wu B, Yanek LR, Zhang Y, Lasky-Su J, Gingras MC, Dutcher SK, Eichler EE, Gabriel S, Germer S, Kim R, Viaud-Martinez KA, Nickerson DA, Luo J, Reiner A, Gibbs RA, Boerwinkle E, Abecasis G, Sedlazeck FJ. Structural variation across 138,134 samples in the TOPMed consortium. bioRxiv 2023:2023.01.25.525428. [PMID: 36747810 PMCID: PMC9900832 DOI: 10.1101/2023.01.25.525428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
Ever larger Structural Variant (SV) catalogs highlighting the diversity within and between populations help researchers better understand the links between SVs and disease. The identification of SVs from DNA sequence data is non-trivial and requires a balance between comprehensiveness and precision. Here we present a catalog of 355,667 SVs (59.34% novel) across autosomes and the X chromosome (50bp+) from 138,134 individuals in the diverse TOPMed consortium. We describe our methodologies for SV inference resulting in high variant quality and >90% allele concordance compared to long-read de novo assemblies of well-characterized control samples. We demonstrate utility through significant associations between SVs and various important cardio-metabolic and hematologic traits. We have identified 690 SV hotspots and deserts and those that potentially impact the regulation of medically relevant genes. This catalog characterizes SVs across multiple populations and will serve as a valuable tool to understand the impact of SV on disease development and progression.
Affiliation(s)
- Goo Jun
- Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston
- Adam C English
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Ginger A Metcalf
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Jianzhi Yang
- University of Southern California, Los Angeles, CA, USA
- Vipin K Menon
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Albert V Smith
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- John A Lane
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- Tom Blackwell
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- Hyun Min Kang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- Sejal Salvi
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Qingchang Meng
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Hua Shen
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Divya Pasham
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Sravya Bhamidipati
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Kavya Kottapalli
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Donna K. Arnett
- Department of Epidemiology, University of Kentucky College of Public Health
- Allison Ashley-Koch
- Department of Medicine, Duke University Medical Center, Durham, NC
- Duke Molecular Physiology Institute, Duke University Medical Center, Durham, NC
- Paul L. Auer
- Division of Biostatistics and Cancer Center, Medical College of Wisconsin, Milwaukee WI
- Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas, Rio Grande Valley School of Medicine, Brownsville, TX
- Donald W. Bowden
- Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Jennifer A. Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Brian E. Cade
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA
- Yii-Der Ida Chen
- Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Michael H. Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Joanne E. Curran
- Biochemistry, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX
- Barry I. Freedman
- Department of Internal Medicine, Section on Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Tasha Fingerlin
- Center for Genes, Environment and Health, National Jewish Health, 1400 Jackson St., Denver, CO, 80206, USA
- Bruce D. Gelb
- Mindich Child Health and Development Institute and the Departments of Pediatrics and Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai
- Yi-Jen Hung
- Institute of Preventive Medicine, National Defense Medical Center, Taiwan
- John P Kane
- Cardiovascular Research Institute, University of California, San Francisco
- Robert Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx NY USA
- Wonji Kim
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Gregory M Marcus
- Division of Cardiology, University of California, San Francisco CA
- Rasika A. Mathias
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
- Stephen T. McGarvey
- Department of Epidemiology, International Health Institute and Department of Anthropology, Brown University
- Courtney Montgomery
- Genes and Human Disease Research Program, Oklahoma Medical Research Foundation
- Take Naseri
- Ministry of Health, Government of Samoa, Apia, Samoa
- S. Mehdi Nouraie
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
- Michael H. Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Patricia A. Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI USA
- Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Stephen S. Rich
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI USA
- Michiel Rienstra
- Department of Cardiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
- Ingo Ruczinski
- Department of Biostatistics, Johns Hopkins University Bloomberg, School of Public Health, Baltimore, MD, USA
- Vijay G. Sankaran
- Division of Hematology/Oncology, Boston Children's Hospital and Department of Pediatric Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115
- Broad Institute of MIT and Harvard, Cambridge, MA 02142
- Christine E. Seidman
- Department of Genetics, Harvard Medical School
- Cardiovascular Division, Brigham & Women’s Hospital, Harvard University
- Howard Hughes Medical Institute, Harvard University
- Edwin K. Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA
- Jennifer A. Smith
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
- Adrienne Stilp
- Department of Biostatistics, University of Washington, Seattle, WA
- Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA USA
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA USA
- Marilyn J. Telen
- Department of Medicine, Duke University Medical Center, Durham, NC
- Scott T. Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- L. Keoki Williams
- Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America
- Baojun Wu
- Center for Individualized and Genomic Medicine Research (CIGMA), Department of Internal Medicine, Henry Ford Health System, Detroit, Michigan, United States of America
- Lisa R. Yanek
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Yingze Zhang
- Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, 15213
- Jessica Lasky-Su
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Susan K. Dutcher
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63110, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington, USA
- Ryan Kim
- Psomagen, Inc., Rockville, Maryland, USA
- James Luo
- National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Alex Reiner
- Department of Epidemiology, University of Washington, Seattle, WA 98109, USA
- Richard A Gibbs
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Eric Boerwinkle
- Human Genetics Center, School of Public Health, University of Texas Health Science Center at Houston
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Goncalo Abecasis
- Regeneron Genetics Center
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI
- Fritz J Sedlazeck
- Baylor College of Medicine Human Genome Sequencing Center, Houston, TX, USA
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA
16
Schmidt JK, Kim YH, Strelchenko N, Gierczic SR, Pavelec D, Golos TG, Slukvin II. Whole genome sequencing of CCR5 CRISPR-Cas9-edited Mauritian cynomolgus macaque blastomeres reveals large-scale deletions and off-target edits. Front Genome Ed 2023; 4:1031275. [PMID: 36714391 PMCID: PMC9877282 DOI: 10.3389/fgeed.2022.1031275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 12/15/2022] [Indexed: 01/15/2023]
Abstract
Introduction: Genome editing by CRISPR-Cas9 approaches offers promise for introducing or correcting disease-associated mutations for research and clinical applications. Nonhuman primates are physiologically closer to humans than other laboratory animal models, providing ideal candidates for introducing human disease-associated mutations to develop models of human disease. The incidence of large chromosomal anomalies in CRISPR-Cas9-edited human embryos and cells warrants comprehensive genotypic investigation of editing outcomes in primate embryos. Our objective was to evaluate on- and off-target editing outcomes in CCR5 CRISPR-Cas9-targeted Mauritian cynomolgus macaque embryos. Methods: DNA isolated from individual blastomeres of two embryos, along with paternal and maternal DNA, was subjected to whole genome sequencing (WGS) analysis. Results: Large deletions were identified in macaque blastomeres at the on-target site that were not previously detected using PCR-based methods. De novo mutations were also identified at predicted CRISPR-Cas9 off-target sites. Discussion: This is the first report of WGS analysis of CRISPR-Cas9-targeted nonhuman primate embryonic cells, in which a high editing efficiency was coupled with the incidence of editing errors in cells from two embryos. These data demonstrate that comprehensive sequencing-based methods are warranted for evaluating editing outcomes in primate embryos, as well as any resultant offspring to ensure that the observed phenotype is due to the targeted edit and not due to unidentified off-target mutations.
Affiliation(s)
- Jenna Kropp Schmidt
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI, United States
- Yun Hee Kim
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI, United States
- Nick Strelchenko
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI, United States
- Sarah R. Gierczic
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI, United States
- Derek Pavelec
- University of Wisconsin Biotechnology Center, University of Wisconsin-Madison, Madison, WI, United States
- Thaddeus G. Golos
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI, United States
- Department of Comparative Biosciences, University of Wisconsin-Madison, Madison, WI, United States
- Department of Obstetrics and Gynecology, University of Wisconsin-Madison, Madison, WI, United States
- Igor I. Slukvin
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI, United States
- Department of Pathology and Laboratory Medicine, University of Wisconsin-Madison, Madison, WI, United States
- Department of Cell and Regenerative Biology, University of Wisconsin-Madison, Madison, WI, United States
17
Wheeler MM, Stilp AM, Rao S, Halldórsson BV, Beyter D, Wen J, Mihkaylova AV, McHugh CP, Lane J, Jiang MZ, Raffield LM, Jun G, Sedlazeck FJ, Metcalf G, Yao Y, Bis JB, Chami N, de Vries PS, Desai P, Floyd JS, Gao Y, Kammers K, Kim W, Moon JY, Ratan A, Yanek LR, Almasy L, Becker LC, Blangero J, Cho MH, Curran JE, Fornage M, Kaplan RC, Lewis JP, Loos RJF, Mitchell BD, Morrison AC, Preuss M, Psaty BM, Rich SS, Rotter JI, Tang H, Tracy RP, Boerwinkle E, Abecasis GR, Blackwell TW, Smith AV, Johnson AD, Mathias RA, Nickerson DA, Conomos MP, Li Y, Þorsteinsdóttir U, Magnússon MK, Stefansson K, Pankratz ND, Bauer DE, Auer PL, Reiner AP. Whole genome sequencing identifies structural variants contributing to hematologic traits in the NHLBI TOPMed program. Nat Commun 2022; 13:7592. [PMID: 36481753 PMCID: PMC9732337 DOI: 10.1038/s41467-022-35354-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/21/2021] [Accepted: 11/29/2022] [Indexed: 12/13/2022]
Abstract
Genome-wide association studies have identified thousands of single nucleotide variants and small indels that contribute to variation in hematologic traits. While structural variants are known to cause rare blood or hematopoietic disorders, the genome-wide contribution of structural variants to quantitative blood cell trait variation is unknown. Here we utilized whole genome sequencing data in ancestrally diverse participants of the NHLBI Trans Omics for Precision Medicine program (N = 50,675) to detect structural variants associated with hematologic traits. Using single variant tests, we assessed the association of common and rare structural variants with red cell-, white cell-, and platelet-related quantitative traits and observed 21 independent signals (12 common and 9 rare) reaching genome-wide significance. The majority of these associations (N = 18) replicated in independent datasets. In genome-editing experiments, we provide evidence that a deletion associated with lower monocyte counts leads to disruption of an S1PR3 monocyte enhancer and decreased S1PR3 expression.
Affiliation(s)
- Marsha M Wheeler
- Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
- Adrienne M Stilp
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
- Shuquan Rao
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, 02115, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Harvard Stem Cell Institute, Boston, MA, 02138, USA
- Broad Institute, Cambridge, MA, 02142, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology & Blood Diseases Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Tianjin, 300020, China
- Bjarni V Halldórsson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- School of Technology, Reykjavik University, Reykjavík, Iceland
- Jia Wen
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Anna V Mihkaylova
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
- Caitlin P McHugh
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
- John Lane
- Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN, 55455, USA
- Min-Zhi Jiang
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Laura M Raffield
- Department of Genetics, University of North Carolina, Chapel Hill, NC, 27599, USA
- Goo Jun
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Ginger Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yao Yao
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, 02115, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Harvard Stem Cell Institute, Boston, MA, 02138, USA
- Broad Institute, Cambridge, MA, 02142, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Joshua B Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, 98101, USA
- Nathalie Chami
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Paul S de Vries
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Pinkal Desai
- Division of Hematology and Oncology, Weill Cornell Medical College, New York, NY, 10065, USA
- James S Floyd
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, 98101, USA
- Yan Gao
- Jackson Heart Study, Department of Medicine, University of Mississippi, Jackson, MS, 39216, USA
- Kai Kammers
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Wonji Kim
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Jee-Young Moon
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Aakrosh Ratan
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
- Lisa R Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Laura Almasy
- Children's Hospital of Philadelphia and University of Pennsylvania School of Medicine, Philadelphia, PA, 19104, USA
- Lewis C Becker
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, 78520, USA
- Michael H Cho
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, 78520, USA
- Myriam Fornage
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Robert C Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, 10461, USA
- Joshua P Lewis
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, Baltimore, MD, USA
- Ruth J F Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Braxton D Mitchell
- Department of Medicine, Division of Endocrinology, Diabetes, and Nutrition, University of Maryland School of Medicine, Baltimore, MD, USA
- Alanna C Morrison
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Michael Preuss
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Bruce M Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, 98101, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, 90502, USA
- Hua Tang
- Department of Genetics, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Russell P Tracy
- Departments of Pathology & Laboratory Medicine and Biochemistry, Larner College of Medicine at the University of Vermont, Colchester, VT, 05446, USA
- Eric Boerwinkle
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Goncalo R Abecasis
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
- Thomas W Blackwell
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
- Albert V Smith
- TOPMed Informatics Research Center, University of Michigan, Department of Biostatistics, Ann Arbor, MI, 48109, USA
- Andrew D Johnson
- Population Sciences Branch, Division of Intramural Research, National Heart, Lung and Blood Institute, Framingham, MA, 01702, USA
- Rasika A Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
- Deborah A Nickerson
- Department of Genome Sciences, University of Washington, Seattle, WA, 98105, USA
- Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, 98105, USA
- Yun Li
- Departments of Biostatistics, Genetics, Computer Science, Applied Physical Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Unnur Þorsteinsdóttir
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101, Reykjavik, Iceland
- Magnús K Magnússon
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101, Reykjavik, Iceland
- Kari Stefansson
- deCODE genetics/Amgen Inc., Reykjavik, Iceland
- Faculty of Medicine, University of Iceland, 101, Reykjavik, Iceland
- Nathan D Pankratz
- Department of Laboratory Medicine and Pathology, University of Minnesota Medical School, Minneapolis, MN, 55455, USA
- Daniel E Bauer
- Division of Hematology/Oncology, Boston Children's Hospital, Boston, MA, 02115, USA
- Department of Pediatric Oncology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Harvard Stem Cell Institute, Boston, MA, 02138, USA
- Broad Institute, Cambridge, MA, 02142, USA
- Department of Pediatrics, Harvard Medical School, Boston, MA, 02115, USA
- Paul L Auer
- Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- Alex P Reiner
- Department of Epidemiology, University of Washington, Seattle, WA, 98105, USA
18
Piernik M, Brzezinski D, Sztromwasser P, Pacewicz K, Majer-Burman W, Gniot M, Sielski D, Bryzghalov O, Wozna A, Zawadzki P. DBFE: distribution-based feature extraction from structural variants in whole-genome data. Bioinformatics 2022; 38:4466-4473. [PMID: 35929780 DOI: 10.1093/bioinformatics/btac513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 07/12/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: Whole-genome sequencing has revolutionized biosciences by providing tools for constructing complete DNA sequences of individuals. With entire genomes at hand, scientists can pinpoint DNA fragments responsible for oncogenesis and predict patient responses to cancer treatments. Machine learning plays a paramount role in this process. However, the sheer volume of whole-genome data makes it difficult to encode the characteristics of genomic variants as features for learning algorithms. RESULTS: In this article, we propose three feature extraction methods that facilitate classifier learning from sets of genomic variants. The core contributions of this work include: (i) strategies for determining features using variant length binning, clustering and density estimation; (ii) a programming library for automating distribution-based feature extraction in machine learning pipelines. The proposed methods have been validated on five real-world datasets using four different classification algorithms and a clustering approach. Experiments on genomes of 219 ovarian, 61 lung and 929 breast cancer patients show that the proposed approaches automatically identify genomic biomarkers associated with cancer subtypes and clinical response to oncological treatment. Finally, we show that the extracted features can be used alongside unsupervised learning methods to analyze genomic samples. AVAILABILITY AND IMPLEMENTATION: The source code of the presented algorithms and reproducible experimental scripts are available on GitHub at https://github.com/MNMdiagnostics/dbfe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
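As a rough illustration of the variant length binning idea mentioned in the abstract, the sketch below turns one sample's list of structural variant lengths into a fixed-size feature vector. It is not the DBFE library's API: the function name `length_bin_features` and the bin edges in `BIN_EDGES` are hypothetical choices made only for this example.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical bin edges in base pairs (an assumption, not taken from the paper).
BIN_EDGES: List[int] = [50, 500, 5_000, 50_000, 500_000]


def length_bin_features(
    lengths: Sequence[int],
    edges: Sequence[int] = BIN_EDGES,
    normalize: bool = True,
) -> List[float]:
    """Count variants per length bin and return one feature per bin.

    Bin 0 holds lengths below edges[0], bin i holds lengths with
    edges[i-1] <= length < edges[i], and the last bin holds lengths
    at or above edges[-1].
    """
    counts = [0] * (len(edges) + 1)
    for length in lengths:
        counts[bisect_right(edges, length)] += 1
    total = sum(counts)
    if normalize and total > 0:
        # Fractions make samples with different variant counts comparable.
        return [c / total for c in counts]
    return [float(c) for c in counts]
```

Each sample then contributes one row of such features to a classifier's input matrix; whether raw counts or fractions work better is a modelling choice this sketch does not settle.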
Affiliation(s)
- Maciej Piernik
- Institute of Computing Science, Faculty of Computing and Telecommunications, Poznan University of Technology, 60-965 Poznan, Poland; MNM Bioscience Inc., Cambridge, MA 02142, USA
- Dariusz Brzezinski
- Institute of Computing Science, Faculty of Computing and Telecommunications, Poznan University of Technology, 60-965 Poznan, Poland; MNM Bioscience Inc., Cambridge, MA 02142, USA; Institute of Bioorganic Chemistry of the Polish Academy of Sciences, 61-704 Poznan, Poland
- Michal Gniot
- MNM Bioscience Inc., Cambridge, MA 02142, USA; Department of Hematology and Bone Marrow Transplantation, Poznan University of Medical Sciences, 60-569 Poznan, Poland
- Alicja Wozna
- MNM Bioscience Inc., Cambridge, MA 02142, USA; Faculty of Physics, Adam Mickiewicz University, 61-614 Poznan, Poland
- Pawel Zawadzki
- MNM Bioscience Inc., Cambridge, MA 02142, USA; Faculty of Physics, Adam Mickiewicz University, 61-614 Poznan, Poland
19
Vali-Pour M, Park S, Espinosa-Carrasco J, Ortiz-Martínez D, Lehner B, Supek F. The impact of rare germline variants on human somatic mutation processes. Nat Commun 2022; 13:3724. [PMID: 35764656 PMCID: PMC9240060 DOI: 10.1038/s41467-022-31483-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/05/2021] [Accepted: 06/17/2022] [Indexed: 02/07/2023]
Abstract
Somatic mutations are an inevitable component of ageing and the most important cause of cancer. The rates and types of somatic mutation vary across individuals, but relatively few inherited influences on mutation processes are known. We perform a gene-based rare variant association study with diverse mutational processes, using human cancer genomes from over 11,000 individuals of European ancestry. By combining burden and variance tests, we identify 207 associations involving 15 somatic mutational phenotypes and 42 genes that replicated in an independent data set at a false discovery rate of 1%. We associate rare inherited deleterious variants in genes such as MSH3, EXO1, SETD2, and MTOR with two phenotypically different forms of DNA mismatch repair deficiency, and variants in genes such as EXO1, PAXIP1, RIF1, and WRN with deficiency in homologous recombination repair. In addition, we identify associations with other mutational processes, such as APEX1 with APOBEC-signature mutagenesis. Many of the genes interact with each other and with known mutator genes within cellular sub-networks. Considered collectively, damaging variants in the identified genes are prevalent in the population. We suggest that rare germline variation in diverse genes commonly impacts mutational processes in somatic cells.
Affiliation(s)
- Mischan Vali-Pour
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Solip Park
- Centro Nacional de Investigaciones Oncológicas (CNIO), Madrid, Spain
- Jose Espinosa-Carrasco
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Daniel Ortiz-Martínez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Ben Lehner
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Fran Supek
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
20
Sarwal V, Niehus S, Ayyala R, Kim M, Sarkar A, Chang S, Lu A, Rajkumar N, Darfci-Maher N, Littman R, Chhugani K, Soylev A, Comarova Z, Wesel E, Castellanos J, Chikka R, Distler MG, Eskin E, Flint J, Mangul S. A comprehensive benchmarking of WGS-based deletion structural variant callers. Brief Bioinform 2022; 23:6618239. [PMID: 35753701 DOI: 10.1093/bib/bbac221] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/08/2021] [Revised: 04/30/2022] [Accepted: 05/11/2022] [Indexed: 01/10/2023]
Abstract
Advances in whole-genome sequencing (WGS) promise to enable accurate and comprehensive structural variant (SV) discovery. Dissecting SVs from WGS data presents a substantial number of challenges, and a plethora of SV detection methods have been developed. Currently, evidence that investigators can use to select appropriate SV detection tools is lacking. In this article, we have evaluated the performance of SV detection tools on mouse and human WGS data using a comprehensive polymerase chain reaction-confirmed gold standard set of SVs and the genome-in-a-bottle variant set, respectively. In contrast to previous benchmarking studies, our gold standard dataset included a complete set of SVs, allowing us to report both precision and sensitivity rates of the SV detection methods. Our study investigates the ability of the methods to detect deletions, thus providing an optimistic estimate of SV detection performance, as the SV detection methods that fail to detect deletions are likely to miss more complex SVs. We found that SV detection tools varied widely in their performance, with several methods providing a good balance between sensitivity and precision. Additionally, we have determined the SV callers best suited for low- and ultralow-pass sequencing data as well as for different deletion length categories.
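To make the two reported metrics concrete, the sketch below scores a set of called deletions against a gold standard set. The matching rule (same chromosome, both breakpoints within a tolerance `tol` in base pairs) and the names `matches` and `precision_sensitivity` are assumptions made for illustration; the abstract does not state the criterion the study actually used.

```python
from typing import List, Set, Tuple

Deletion = Tuple[str, int, int]  # (chromosome, start, end)


def matches(call: Deletion, truth: Deletion, tol: int = 100) -> bool:
    """Assumed rule: same chromosome and both breakpoints within tol bp."""
    return (
        call[0] == truth[0]
        and abs(call[1] - truth[1]) <= tol
        and abs(call[2] - truth[2]) <= tol
    )


def precision_sensitivity(
    calls: List[Deletion], truths: List[Deletion], tol: int = 100
) -> Tuple[float, float]:
    """Return (precision, sensitivity); each truth matches at most one call."""
    used: Set[int] = set()
    true_positives = 0
    for call in calls:
        for i, truth in enumerate(truths):
            if i not in used and matches(call, truth, tol):
                used.add(i)
                true_positives += 1
                break
    # Precision: fraction of calls that are real; sensitivity: fraction of
    # real deletions that were found.
    precision = true_positives / len(calls) if calls else 0.0
    sensitivity = true_positives / len(truths) if truths else 0.0
    return precision, sensitivity
```

Sensitivity can only be computed when the gold standard is complete, which is why the abstract stresses that its reference set contains every SV in the sample.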
Affiliation(s)
- Varuni Sarwal
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA; Indian Institute of Technology Delhi, Hauz Khas, New Delhi, Delhi 110016, India
| | - Sebastian Niehus
- Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany; Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, 10117 Berlin, Germany
| | - Ram Ayyala
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Minyoung Kim
- Department of Quantitative and Computational Biology, University of Southern California, 1050 Childs Way, Los Angeles, CA 90089
| | - Aditya Sarkar
- School of Computing and Electrical Engineering, Indian Institute of Technology Mandi, Kamand, Mandi, Himachal Pradesh 175001, India
| | - Sei Chang
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Angela Lu
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Neha Rajkumar
- Department of Bioengineering, University of California Los Angeles, Los Angeles, CA, 90095
| | - Nicholas Darfci-Maher
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Russell Littman
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Karishma Chhugani
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California 1985 Zonal Avenue Los Angeles, CA 90089-9121
| | - Arda Soylev
- Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
| | - Zoia Comarova
- Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA, United States
| | - Emily Wesel
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Jacqueline Castellanos
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Rahul Chikka
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Margaret G Distler
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA
| | - Eleazar Eskin
- Department of Computer Science, University of California Los Angeles, 580 Portola Plaza, Los Angeles, CA 90095, USA; Department of Human Genetics, David Geffen School of Medicine at UCLA, 695 Charles E. Young Drive South, Box 708822, Los Angeles, CA, 90095, USA; Department of Computational Medicine, David Geffen School of Medicine at UCLA, 73-235 CHS, Los Angeles, CA, 90095, USA
| | - Jonathan Flint
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, 760 Westwood Plaza, Los Angeles, CA 90095, USA
| | - Serghei Mangul
- Department of Clinical Pharmacy, School of Pharmacy, University of Southern California 1985 Zonal Avenue Los Angeles, CA 90089-9121
|
21
|
Ruigrok M, Xue B, Catanach A, Zhang M, Jesson L, Davy M, Wellenreuther M. The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus. Genes (Basel) 2022; 13:genes13071129. [PMID: 35885912 PMCID: PMC9320665 DOI: 10.3390/genes13071129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 06/08/2022] [Accepted: 06/20/2022] [Indexed: 02/04/2023] Open
Abstract
Background: Genetic diversity provides the basic substrate for evolution. Genetic variation consists of changes ranging from single base pairs (single-nucleotide polymorphisms, or SNPs) to larger-scale structural variants, such as inversions, deletions, and duplications. SNPs have long been used as the general currency for investigations into how genetic diversity fuels evolution. However, structural variants can affect more base pairs in the genome than SNPs and can be responsible for adaptive phenotypes due to their impact on linkage and recombination. In this study, we investigate the first steps needed to explore the genetic basis of an economically important growth trait in the marine teleost finfish Chrysophrys auratus using both SNP and structural variant data. Specifically, we use feature selection methods in machine learning to explore the relative predictive power of both types of genetic variants in explaining growth and discuss the feature selection results of the evaluated methods. Methods: SNP and structural variant callers were used to generate catalogues of variant data from 32 individual fish at ages 1 and 3 years. Three feature selection algorithms (ReliefF, Chi-square, and a mutual-information-based method) were used to reduce the dataset by selecting the most informative features. Following this selection process, the subset of variants was used as features to classify fish into small, medium, or large size categories using KNN, naïve Bayes, random forest, and logistic regression. The top-scoring features in each feature selection method were subsequently mapped to annotated genomic regions in the zebrafish genome, and a permutation test was conducted to see if the number of mapped regions was greater than when random sampling was applied. Results: Without feature selection, the prediction accuracies ranged from 0 to 0.5 for both structural variants and SNPs. 
Following feature selection, the prediction accuracy increased only slightly to between 0 and 0.65 for structural variants and between 0 and 0.75 for SNPs. The highest prediction accuracy for the logistic regression was achieved for age 3 fish using SNPs, although generally predictions for age 1 and 3 fish were very similar (ranging from 0–0.65 for both SNPs and structural variants). The Chi-square feature selection of SNP data was the only method that had a significantly higher number of matches to annotated genomic regions of zebrafish than would be explained by chance alone. Conclusions: Predicting a complex polygenic trait such as growth using data collected from a low number of individuals remains challenging. While we demonstrate that both SNPs and structural variants provide important information to help understand the genetic basis of phenotypic traits such as fish growth, the full complexities that exist within a genome cannot be easily captured by classical machine learning techniques. When using high-dimensional data, feature selection shows some increase in the prediction accuracy of classification models and provides the potential to identify unknown genomic correlates with growth. Our results show that both SNPs and structural variants significantly impact growth, and we therefore recommend that researchers interested in the genotype–phenotype map should strive to go beyond SNPs and incorporate structural variants in their studies as well. We discuss how our machine learning models can be further expanded to serve as a test bed to inform evolutionary studies and the applied management of species.
Affiliation(s)
- Mike Ruigrok
- The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
- Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand; (B.X.); (M.Z.)
| | - Bing Xue
- Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand; (B.X.); (M.Z.)
| | - Andrew Catanach
- The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
| | - Mengjie Zhang
- Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand; (B.X.); (M.Z.)
| | - Linley Jesson
- The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
| | - Marcus Davy
- The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
| | - Maren Wellenreuther
- The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
- School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
- Correspondence:
|
22
|
Mc Cartney AM, Shafin K, Alonge M, Bzikadze AV, Formenti G, Fungtammasan A, Howe K, Jain C, Koren S, Logsdon GA, Miga KH, Mikheenko A, Paten B, Shumate A, Soto DC, Sović I, Wood JMD, Zook JM, Phillippy AM, Rhie A. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods 2022; 19:687-695. [PMID: 35361931 PMCID: PMC9812399 DOI: 10.1038/s41592-022-01440-3] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/13/2021] [Accepted: 03/04/2022] [Indexed: 01/07/2023]
Abstract
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
Affiliation(s)
- Ann M. Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | | | | | - Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH; Department of Computational and Data Sciences, Indian Institute of Science, Bangalore KA, India
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis, CA, USA
| | - Ivan Sović
- Pacific Biosciences, Menlo Park, CA, USA; Digital BioLogic d.o.o., Ivanić-Grad, Croatia
| | | | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH (corresponding author)
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH (corresponding author)
|
23
|
Polishing high-quality genome assemblies. Nat Methods 2022; 19:649-650. [PMID: 35610477 DOI: 10.1038/s41592-022-01515-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
24
|
Cleal K, Baird DM. Dysgu: efficient structural variant calling using short or long reads. Nucleic Acids Res 2022; 50:e53. [PMID: 35100420 PMCID: PMC9122538 DOI: 10.1093/nar/gkac039] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/03/2021] [Revised: 12/20/2021] [Accepted: 01/24/2022] [Indexed: 12/27/2022] Open
Abstract
Structural variation (SV) plays a fundamental role in genome evolution and can underlie inherited or acquired diseases such as cancer. Long-read sequencing technologies have led to improvements in the characterization of structural variants (SVs), although paired-end sequencing offers better scalability. Here, we present dysgu, which calls SVs or indels using paired-end or long reads. Dysgu detects signals from alignment gaps, discordant and supplementary mappings, and generates consensus contigs, before classifying events using machine learning. Additional SVs are identified by remapping of anomalous sequences. Dysgu outperforms existing state-of-the-art tools using paired-end or long-reads, offering high sensitivity and precision whilst being among the fastest tools to run. We find that combining low coverage paired-end and long-reads is competitive in terms of performance with long-reads at higher coverage values.
Affiliation(s)
- Kez Cleal
- Division of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
| | - Duncan M Baird
- Division of Cancer and Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
|
25
|
Liu Z, Roberts R, Mercer TR, Xu J, Sedlazeck FJ, Tong W. Towards accurate and reliable resolution of structural variants for clinical diagnosis. Genome Biol 2022; 23:68. [PMID: 35241127 PMCID: PMC8892125 DOI: 10.1186/s13059-022-02636-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 08/27/2021] [Accepted: 02/15/2022] [Indexed: 12/17/2022] Open
Abstract
Structural variants (SVs) are a major source of human genetic diversity and have been associated with different diseases and phenotypes. The detection of SVs is difficult, and a diverse range of detection methods and data analysis protocols has been developed. This difficulty and diversity make the detection of SVs for clinical applications challenging and requires a framework to ensure accuracy and reproducibility. Here, we discuss current developments in the diagnosis of SVs and propose a roadmap for the accurate and reproducible detection of SVs that includes case studies provided from the FDA-led SEquencing Quality Control Phase II (SEQC-II) and other consortium efforts.
Affiliation(s)
- Zhichao Liu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Ruth Roberts
- ApconiX, BioHub at Alderley Park, Alderley Edge, SK10 4TG, UK; University of Birmingham, Edgbaston, Birmingham, B15 2TT, UK
| | - Timothy R Mercer
- Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia; Garvan Institute of Medical Research, Sydney, NSW, Australia; St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia
| | - Joshua Xu
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| | - Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, 72079, USA.
|
26
|
Wang M, Song WM, Ming C, Wang Q, Zhou X, Xu P, Krek A, Yoon Y, Ho L, Orr ME, Yuan GC, Zhang B. Guidelines for bioinformatics of single-cell sequencing data analysis in Alzheimer's disease: review, recommendation, implementation and application. Mol Neurodegener 2022; 17:17. [PMID: 35236372 PMCID: PMC8889402 DOI: 10.1186/s13024-022-00517-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 07/04/2021] [Accepted: 01/18/2022] [Indexed: 12/13/2022] Open
Abstract
Alzheimer's disease (AD) is the most common form of dementia, characterized by progressive cognitive impairment and neurodegeneration. Extensive clinical and genomic studies have revealed biomarkers, risk factors, pathways, and targets of AD in the past decade. However, the exact molecular basis of AD development and progression remains elusive. The emerging single-cell sequencing technology can potentially provide cell-level insights into the disease. Here we systematically review the state-of-the-art bioinformatics approaches to analyze single-cell sequencing data and their applications to AD in 14 major directions, including 1) quality control and normalization, 2) dimension reduction and feature extraction, 3) cell clustering analysis, 4) cell type inference and annotation, 5) differential expression, 6) trajectory inference, 7) copy number variation analysis, 8) integration of single-cell multi-omics, 9) epigenomic analysis, 10) gene network inference, 11) prioritization of cell subpopulations, 12) integrative analysis of human and mouse sc-RNA-seq data, 13) spatial transcriptomics, and 14) comparison of single cell AD mouse model studies and single cell human AD studies. We also address challenges in using human postmortem and mouse tissues and outline future developments in single cell sequencing data analysis. Importantly, we have implemented our recommended workflow for each major analytic direction and applied them to a large single nucleus RNA-sequencing (snRNA-seq) dataset in AD. Key analytic results are reported while the scripts and the data are shared with the research community through GitHub. In summary, this comprehensive review provides insights into various approaches to analyze single cell sequencing data and offers specific guidelines for study design and a variety of analytic directions. 
The review and the accompanied software tools will serve as a valuable resource for studying cellular and molecular mechanisms of AD, other diseases, or biological systems at the single cell level.
Affiliation(s)
- Minghui Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Won-min Song
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Chen Ming
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Qian Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Xianxiao Zhou
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Peng Xu
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Azra Krek
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
| | - Yonejung Yoon
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Lap Ho
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
| | - Miranda E. Orr
- Department of Internal Medicine, Section of Gerontology and Geriatric Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
- Sticht Center for Healthy Aging and Alzheimer’s Prevention, Wake Forest School of Medicine, Winston-Salem, North Carolina USA
| | - Guo-Cheng Yuan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029 USA
| | - Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, 1470 Madison Avenue, Room S8-111, New York, NY 10029 USA
|
27
|
Wang T, Sun J, Zhang X, Wang WJ, Zhou Q. CNV-P: a machine-learning framework for predicting high confident copy number variations. PeerJ 2021; 9:e12564. [PMID: 34917425 PMCID: PMC8645205 DOI: 10.7717/peerj.12564] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2021] [Accepted: 11/08/2021] [Indexed: 12/27/2022] Open
Abstract
Background: Copy-number variants (CNVs) have been recognized as one of the major causes of genetic disorders. Reliable detection of CNVs from genome sequencing data has been a strong demand for disease research. However, current software for detecting CNVs has high false-positive rates, which needs further improvement. Methods: Here, we proposed a novel and post-processing approach for CNVs prediction (CNV-P), a machine-learning framework that could efficiently remove false-positive fragments from results of CNVs detecting tools. A series of CNVs signals such as read depth (RD), split reads (SR) and read pair (RP) around the putative CNV fragments were defined as features to train a classifier. Results: The prediction results on several real biological datasets showed that our models could accurately classify the CNVs at over 90% precision rate and 85% recall rate, which greatly improves the performance of state-of-the-art algorithms. Furthermore, our results indicate that CNV-P is robust to different sizes of CNVs and the platforms of sequencing. Conclusions: Our framework for classifying high-confident CNVs could improve both basic research and clinical diagnosis of genetic diseases.
Affiliation(s)
| | - Jinghua Sun
- BGI-Shenzhen, Shenzhen, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
| | - Xiuqing Zhang
- BGI-Shenzhen, Shenzhen, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China; Guangdong Enterprise Key Laboratory of Human Disease Genomics, Beishan Industrial Zone, Shenzhen, China
|
28
|
Zverinova S, Guryev V. Variant calling: Considerations, practices, and developments. Hum Mutat 2021; 43:976-985. [PMID: 34882898 PMCID: PMC9545713 DOI: 10.1002/humu.24311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Revised: 11/02/2021] [Accepted: 12/03/2021] [Indexed: 11/10/2022]
Abstract
The success of many clinical, association, or population genetics studies critically relies on properly performed variant calling step. The variety of modern genomics protocols, techniques, and platforms makes our choices of methods and algorithms difficult and there is no "one size fits all" solution for study design and data analysis. In this review, we discuss considerations that need to be taken into account while designing the study and preparing for the experiments. We outline the variety of variant types that can be detected using sequencing approaches and highlight some specific requirements and basic principles of their detection. Finally, we cover interesting developments that enable variant calling for a broad range of applications in the genomics field. We conclude by discussing technological and algorithmic advances that have the potential to change the ways of calling DNA variants in the nearest future.
Affiliation(s)
- Stepanka Zverinova
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
|
29
|
Mitani T, Isikay S, Gezdirici A, Gulec EY, Punetha J, Fatih JM, Herman I, Akay G, Du H, Calame DG, Ayaz A, Tos T, Yesil G, Aydin H, Geckinli B, Elcioglu N, Candan S, Sezer O, Erdem HB, Gul D, Demiral E, Elmas M, Yesilbas O, Kilic B, Gungor S, Ceylan AC, Bozdogan S, Ozalp O, Cicek S, Aslan H, Yalcintepe S, Topcu V, Bayram Y, Grochowski CM, Jolly A, Dawood M, Duan R, Jhangiani SN, Doddapaneni H, Hu J, Muzny DM, Marafi D, Akdemir ZC, Karaca E, Carvalho CMB, Gibbs RA, Posey JE, Lupski JR, Pehlivan D. High prevalence of multilocus pathogenic variation in neurodevelopmental disorders in the Turkish population. Am J Hum Genet 2021; 108:1981-2005. [PMID: 34582790 PMCID: PMC8546040 DOI: 10.1016/j.ajhg.2021.08.009] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 03/09/2021] [Accepted: 08/20/2021] [Indexed: 02/06/2023] Open
Abstract
Neurodevelopmental disorders (NDDs) are clinically and genetically heterogenous; many such disorders are secondary to perturbation in brain development and/or function. The prevalence of NDDs is > 3%, resulting in significant sociocultural and economic challenges to society. With recent advances in family-based genomics, rare-variant analyses, and further exploration of the Clan Genomics hypothesis, there has been a logarithmic explosion in neurogenetic "disease-associated genes" molecular etiology and biology of NDDs; however, the majority of NDDs remain molecularly undiagnosed. We applied genome-wide screening technologies, including exome sequencing (ES) and whole-genome sequencing (WGS), to identify the molecular etiology of 234 newly enrolled subjects and 20 previously unsolved Turkish NDD families. In 176 of the 234 studied families (75.2%), a plausible and genetically parsimonious molecular etiology was identified. Out of 176 solved families, deleterious variants were identified in 218 distinct genes, further documenting the enormous genetic heterogeneity and diverse perturbations in human biology underlying NDDs. We propose 86 candidate disease-trait-associated genes for an NDD phenotype. Importantly, on the basis of objective and internally established variant prioritization criteria, we identified 51 families (51/176 = 28.9%) with multilocus pathogenic variation (MPV), mostly driven by runs of homozygosity (ROHs) - reflecting genomic segments/haplotypes that are identical-by-descent. Furthermore, with the use of additional bioinformatic tools and expansion of ES to additional family members, we established a molecular diagnosis in 5 out of 20 families (25%) who remained undiagnosed in our previously studied NDD cohort emanating from Turkey.
Affiliation(s)
- Tadahiro Mitani
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Sedat Isikay
- Department of Pediatric Neurology, Faculty of Medicine, University of Gaziantep, Gaziantep 27310, Turkey
| | - Alper Gezdirici
- Department of Medical Genetics, Basaksehir Cam and Sakura City Hospital, Istanbul 34480, Turkey
| | - Elif Yilmaz Gulec
- Department of Medical Genetics, Kanuni Sultan Suleyman Training and Research Hospital, 34303 Istanbul, Turkey
| | - Jaya Punetha
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jawid M Fatih
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Isabella Herman
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Gulsen Akay
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Haowei Du
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Daniel G Calame
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Akif Ayaz
- Department of Medical Genetics, Adana City Training and Research Hospital, Adana 01170, Turkey; Departments of Medical Genetics, School of Medicine, Istanbul Medipol University, Istanbul 34810, Turkey
| | - Tulay Tos
- University of Health Sciences Zubeyde Hanim Research and Training Hospital of Women's Health and Diseases, Department of Medical Genetics, Ankara 06080, Turkey
| | - Gozde Yesil
- Istanbul Faculty of Medicine, Department of Medical Genetics, Istanbul University, Istanbul 34093, Turkey
| | - Hatip Aydin
- Centre of Genetics Diagnosis, Zeynep Kamil Maternity and Children's Training and Research Hospital, Istanbul, Turkey; Private Reyap Istanbul Hospital, Istanbul 34515, Turkey
| | - Bilgen Geckinli
- Centre of Genetics Diagnosis, Zeynep Kamil Maternity and Children's Training and Research Hospital, Istanbul, Turkey; Department of Medical Genetics, School of Medicine, Marmara University, Istanbul 34722, Turkey
| | - Nursel Elcioglu
- Department of Pediatric Genetics, School of Medicine, Marmara University, Istanbul 34722, Turkey; Eastern Mediterranean University Medical School, Magosa, Mersin 10, Turkey
| | - Sukru Candan
- Medical Genetics Section, Balikesir Ataturk Public Hospital, Balikesir 10100, Turkey
| | - Ozlem Sezer
- Department of Medical Genetics, Samsun Education and Research Hospital, Samsun 55100, Turkey
| | - Haktan Bagis Erdem
- Department of Medical Genetics, University of Health Sciences, Diskapi Yildirim Beyazit Training and Research Hospital, Ankara 06110, Turkey
| | - Davut Gul
- Department of Medical Genetics, Gulhane Military Medical School, Ankara 06010, Turkey
| | - Emine Demiral
- Department of Medical Genetics, School of Medicine, University of Inonu, Malatya 44280, Turkey
| | - Muhsin Elmas
- Department of Medical Genetics, Afyon Kocatepe University, School of Medicine, Afyon 03218, Turkey
| | - Osman Yesilbas
- Division of Critical Care Medicine, Department of Pediatrics, School of Medicine, Bezmialem Foundation University, Istanbul 34093, Turkey; Department of Pediatrics, Division of Pediatric Critical Care Medicine, Faculty of Medicine, Karadeniz Technical University, Trabzon, Turkey
| | - Betul Kilic
- Department of Pediatrics and Pediatric Neurology, Faculty of Medicine, Inonu University, Malatya 34218, Turkey
| | - Serdal Gungor
- Department of Pediatrics and Pediatric Neurology, Faculty of Medicine, Inonu University, Malatya 34218, Turkey
| | - Ahmet C Ceylan
- Department of Medical Genetics, University of Health Sciences, Ankara Training and Research Hospital, Ankara 06110, Turkey
| | - Sevcan Bozdogan
- Department of Medical Genetics, Cukurova University Faculty of Medicine, Adana 01330, Turkey
| | - Ozge Ozalp
- Department of Medical Genetics, Adana City Training and Research Hospital, Adana 01170, Turkey
| | - Salih Cicek
- Department of Medical Genetics, Konya Training and Research Hospital, Konya 42250, Turkey
| | - Huseyin Aslan
- Department of Medical Genetics, Adana City Training and Research Hospital, Adana 01170, Turkey
| | - Sinem Yalcintepe
- Department of Medical Genetics, School of Medicine, Trakya University, Edirne 22130, Turkey
| | - Vehap Topcu
- Department of Medical Genetics, Ankara City Hospital, Ankara 06800, Turkey
| | - Yavuz Bayram
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | | | - Angad Jolly
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA
| | - Moez Dawood
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Medical Scientist Training Program, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ruizhi Duan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Shalini N Jhangiani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Harsha Doddapaneni
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jianhong Hu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Dana Marafi
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zeynep Coban Akdemir
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ender Karaca
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Claudia M B Carvalho
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Richard A Gibbs
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Jennifer E Posey
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - James R Lupski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX 77030, USA.
| | - Davut Pehlivan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Section of Pediatric Neurology and Developmental Neuroscience, Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX 77030, USA.
|
30
|
Cisarova K, Garavelli L, Caraffi SG, Peluso F, Valeri L, Gargano G, Gavioli S, Trimarchi G, Neri A, Campos-Xavier B, Superti-Furga A. A monoallelic SEC23A variant E599K associated with cranio-lenticulo-sutural dysplasia. Am J Med Genet A 2021; 188:319-325. [PMID: 34580982 PMCID: PMC9291540 DOI: 10.1002/ajmg.a.62506] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/28/2021] [Revised: 08/25/2021] [Accepted: 08/26/2021] [Indexed: 11/06/2022]
Abstract
Cranio-lenticulo-sutural dysplasia (CLSD; MIM 607812) is a rare or underdiagnosed condition, as only two families have been reported. The original family (Boyadjiev et al., Human Genetics, 2003, 113, 1-9 and Boyadjiev et al., Nature Genetics, 2006, 38, 1192-1197) showed recessive inheritance of the condition with a biallelic SEC23A missense variant in affected individuals. In contrast, another child with sporadic CLSD had a monoallelic SEC23A variant inherited from the reportedly unaffected father (Boyadjiev et al., Clinical Genetics, 2011, 80, 169-176), raising questions about possible digenism. Here, we report a 2-month-old boy seen because of large fontanels with wide cranial sutures, a large forehead, hypertelorism, a thin nose, a high arched palate, and micrognathia. His mother was clinically unremarkable, while his father had a history of large fontanels in infancy that had closed only around age 10 years; he also had a large forehead, hypertelorism, and a thin, beaked nose, and had undergone surgery for bilateral glaucoma with exfoliation of the lens capsule. Trio genome sequencing and familial segregation revealed a monoallelic c.1795G>A transition in SEC23A that was de novo in the father and transmitted to the proband. The variant predicts a nonconservative substitution (p.E599K) in an ultra-conserved residue that is seen in 3D models of yeast SEC23 to be involved in direct binding between the SEC23 and SAR1 subunits of the coat protein complex II (COPII) coat. This observation confirms the link between SEC23A variants and CLSD but suggests that, in addition to the recessive inheritance described in the original family, SEC23A variants may result in dominant inheritance of CLSD, possibly by a dominant-negative disruptive effect on the SEC23 multimer.
Affiliation(s)
- Katarina Cisarova
- Division of Genetic Medicine, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
| | - Livia Garavelli
- Clinical Genetics Unit, Azienda USL-IRCCS of Reggio Emilia, Reggio Emilia, Italy
| | | | - Francesca Peluso
- Clinical Genetics Unit, Azienda USL-IRCCS of Reggio Emilia, Reggio Emilia, Italy
| | - Lara Valeri
- Clinical Genetics Unit, Azienda USL-IRCCS of Reggio Emilia, Reggio Emilia, Italy
| | - Giancarlo Gargano
- Neonatal Intensive Care Unit, Azienda USL-IRCCS of Reggio Emilia, Reggio Emilia, Italy
| | - Sara Gavioli
- Neonatal Intensive Care Unit, Azienda USL-IRCCS of Reggio Emilia, Reggio Emilia, Italy
| | - Gabriele Trimarchi
- Clinical Genetics Unit, Azienda USL-IRCCS of Reggio Emilia, Reggio Emilia, Italy
| | - Alberto Neri
- Ophthalmology Unit, Department of Surgery, Azienda USL-IRCCS of Reggio Emilia, Reggio Emilia, Italy
| | - Belinda Campos-Xavier
- Division of Genetic Medicine, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
| | - Andrea Superti-Furga
- Division of Genetic Medicine, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
|
31
|
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/23/2021] [Indexed: 11/20/2022] Open
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Affiliation(s)
| | | | | | | | | | | | - Fawaz Dabbaghie
- Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | | | | | | | | | | | | | - Ahmed Arslan
- Stanford University School of Medicine, California, USA
| | | | | | | | - Daniel L Cameron
- Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
| | - Joyjit Daw
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | - Haowei Du
- Baylor College of Medicine, Houston, USA
| | | | | | | | | | | | | | | | | | | | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, USA
| | | | | | | | | | | | | | | | - Arda Soylev
- Konya Food and Agriculture University, Konya, Turkey
| | | | | | | | | | - Pankaj Vats
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | | | - Qiandong Zeng
- Laboratory Corporation of America Holdings, Westborough, USA
|
32
|
Mc Cartney AM, Mahmoud M, Jochum M, Agustinho DP, Zorman B, Al Khleifat A, Dabbaghie F, K Kesharwani R, Smolka M, Dawood M, Albin D, Aliyev E, Almabrazi H, Arslan A, Balaji A, Behera S, Billingsley K, L Cameron D, Daw J, T. Dawson E, De Coster W, Du H, Dunn C, Esteban R, Jolly A, Kalra D, Liao C, Liu Y, Lu TY, M Havrilla J, M Khayat M, Marin M, Monlong J, Price S, Rafael Gener A, Ren J, Sagayaradj S, Sapoval N, Sinner C, C. Soto D, Soylev A, Subramaniyan A, Syed N, Tadimeti N, Tater P, Vats P, Vaughn J, Walker K, Wang G, Zeng Q, Zhang S, Zhao T, Kille B, Biederstedt E, Chaisson M, English A, Kronenberg Z, J. Treangen T, Hefferon T, Chin CS, Busby B, J Sedlazeck F. An international virtual hackathon to build tools for the analysis of structural variants within species ranging from coronaviruses to vertebrates. F1000Res 2021; 10:246. [PMID: 34621504 PMCID: PMC8479851 DOI: 10.12688/f1000research.51477.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 03/04/2021] [Indexed: 11/08/2023] Open
Abstract
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on related topics in structural variation, pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation, and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
Affiliation(s)
| | | | | | | | | | | | - Fawaz Dabbaghie
- Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | | | | | | | | | | | | | - Ahmed Arslan
- Stanford University School of Medicine, California, USA
| | | | | | | | - Daniel L Cameron
- Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
| | - Joyjit Daw
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | - Haowei Du
- Baylor College of Medicine, Houston, USA
| | | | | | | | | | | | | | | | | | | | | | - Jean Monlong
- UC Santa Cruz Genomics Institute, Santa Cruz, USA
| | | | | | | | | | | | | | | | - Arda Soylev
- Konya Food and Agriculture University, Konya, Turkey
| | | | | | | | | | - Pankaj Vats
- NVIDIA Corporation, Santa Clara, California, USA
| | | | | | | | - Qiandong Zeng
- Laboratory Corporation of America Holdings, Westborough, USA
|
33
|
Jun G, Sedlazeck F, Zhu Q, English A, Metcalf G, Kang HM, Lee C, Gibbs R, Boerwinkle E. muCNV: Genotyping Structural Variants for Population-level Sequencing. Bioinformatics 2021; 37:btab199. [PMID: 33760063 PMCID: PMC8496513 DOI: 10.1093/bioinformatics/btab199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/10/2020] [Revised: 01/31/2021] [Accepted: 03/13/2021] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: There are high demands for joint genotyping of structural variations with short-read sequencing, but efficient and accurate genotyping at population scale is a challenging task. RESULTS: We developed muCNV, which aggregates per-sample summary pileups for joint genotyping of > 100,000 samples. Pilot results show very low Mendelian inconsistencies. Applications to large-scale projects in the cloud show the computational efficiency of the muCNV genotyping pipeline. AVAILABILITY: muCNV is publicly available for download at https://github.com/gjun/muCNV. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
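The "Mendelian inconsistencies" metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes biallelic autosomal sites with genotypes coded as alternate-allele counts (0, 1, 2); the function names `is_mendelian_consistent` and `mendelian_error_rate` are hypothetical and are not taken from muCNV.

```python
from typing import Iterable, Tuple

# Alleles (0 = reference, 1 = alternate) a parent can transmit,
# keyed by the parent's alternate-allele count.
_GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """Return True if the child genotype can arise from one allele per parent."""
    return any(
        a + b == child
        for a in _GAMETES[father]
        for b in _GAMETES[mother]
    )


def mendelian_error_rate(trios: Iterable[Tuple[int, int, int]]) -> float:
    """Fraction of (father, mother, child) genotype triples that are inconsistent."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)
    return errors / len(trios)
```

A low error rate across many trios suggests, but does not prove, that genotypes are accurate; sex chromosomes and true de novo events would need separate handling.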
Affiliation(s)
- Goo Jun
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Fritz Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Qihui Zhu
- Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Adam English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ginger Metcalf
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Hyun Min Kang
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA
| | | | - Charles Lee
- Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Richard Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Eric Boerwinkle
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
34
|
van Belzen IAEM, Schönhuth A, Kemmeren P, Hehir-Kwa JY. Structural variant detection in cancer genomes: computational challenges and perspectives for precision oncology. NPJ Precis Oncol 2021; 5:15. [PMID: 33654267 PMCID: PMC7925608 DOI: 10.1038/s41698-021-00155-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/17/2020] [Accepted: 01/12/2021] [Indexed: 01/31/2023] Open
Abstract
Cancer is generally characterized by acquired genomic aberrations in a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited due to difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and distinguishing tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints, derived variant types, and the biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integration of multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve through individual methods. Here, we explore current strategies for integrating SV callsets and for enabling the use of tumor-specific SVs in precision oncology.
Affiliation(s)
| | - Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Patrick Kemmeren
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands
| | - Jayne Y Hehir-Kwa
- Princess Máxima Center for Pediatric Oncology, Utrecht, The Netherlands.
|
35
|
Niehus S, Jónsson H, Schönberger J, Björnsson E, Beyter D, Eggertsson HP, Sulem P, Stefánsson K, Halldórsson BV, Kehr B. PopDel identifies medium-size deletions simultaneously in tens of thousands of genomes. Nat Commun 2021; 12:730. [PMID: 33526789 PMCID: PMC7851401 DOI: 10.1038/s41467-020-20850-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/22/2019] [Accepted: 12/14/2020] [Indexed: 12/14/2022] Open
Abstract
Thousands of genomic structural variants (SVs) segregate in the human population and can impact phenotypic traits and diseases. Their identification in whole-genome sequence data of large cohorts is a major computational challenge. Most current approaches identify SVs in single genomes and afterwards merge the identified variants into a joint call set across many genomes. We describe the approach PopDel, which directly identifies deletions of about 500 to at least 10,000 bp in length in data of many genomes jointly, eliminating the need for subsequent variant merging. PopDel scales to tens of thousands of genomes, as we demonstrate in evaluations on up to 49,962 genomes. We show that PopDel reliably reports common, rare, and de novo deletions. On genomes with available high-confidence reference call sets, PopDel shows excellent recall and precision. Genotype inheritance patterns in up to 6794 trios indicate that genotypes predicted by PopDel are more reliable than those of previous SV callers. Furthermore, PopDel's running time is competitive with the fastest tested previous tools. The demonstrated scalability and accuracy of PopDel enable routine scans for deletions in large-scale sequencing studies.
Affiliation(s)
- Sebastian Niehus
- Regensburg Center for Interventional Immunology (RCI), Regensburg, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
- Charité-Universitätsmedizin Berlin, Berlin, Germany
| | | | - Janina Schönberger
- Regensburg Center for Interventional Immunology (RCI), Regensburg, Germany
- Berlin Institute of Health (BIH), Berlin, Germany
| | - Eythór Björnsson
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavík, Iceland
- Department of Internal Medicine, Landspítali-The National University Hospital of Iceland, Reykjavík, Iceland
| | | | | | | | - Kári Stefánsson
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- Faculty of Medicine, School of Health Sciences, University of Iceland, Reykjavík, Iceland
| | - Bjarni V Halldórsson
- deCODE genetics/Amgen Inc., Reykjavík, Iceland
- School of Science and Engineering, Reykjavik University, Reykjavík, Iceland
| | - Birte Kehr
- Regensburg Center for Interventional Immunology (RCI), Regensburg, Germany.
- Berlin Institute of Health (BIH), Berlin, Germany.
- Charité-Universitätsmedizin Berlin, Berlin, Germany.
- Universität Regensburg, Regensburg, Germany.
|
36
|
Zarate S, Carroll A, Mahmoud M, Krasheninina O, Jun G, Salerno WJ, Schatz MC, Boerwinkle E, Gibbs RA, Sedlazeck FJ. Parliament2: Accurate structural variant calling at scale. Gigascience 2020; 9:giaa145. [PMID: 33347570 PMCID: PMC7751401 DOI: 10.1093/gigascience/giaa145] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 04/03/2020] [Revised: 09/17/2020] [Accepted: 11/18/2020] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND: Structural variants (SVs) are critical contributors to genetic diversity and genomic disease. To predict the phenotypic impact of SVs, there is a need for better estimates of both the occurrence and frequency of SVs, preferably from large, ethnically diverse cohorts. Thus, the current standard approach requires the use of short paired-end reads, from which SVs remain challenging to detect, especially at the scale of hundreds to thousands of samples. FINDINGS: We present Parliament2, a consensus SV framework that leverages multiple best-in-class methods to identify high-quality SVs from short-read DNA sequence data at scale. Parliament2 incorporates pre-installed SV callers that are optimized for efficient execution in parallel to reduce the overall runtime and costs. We demonstrate the accuracy of Parliament2 when applied to data from NovaSeq and HiSeq X platforms with the Genome in a Bottle (GIAB) SV call set across all size classes. The reported quality score per SV is calibrated across different SV types and size classes. Parliament2 has the highest F1 score (74.27%) measured across the independent gold standard from GIAB. We illustrate the compute performance by processing all 1000 Genomes samples (2,691 samples) in <1 day on GRCh38. Parliament2 improves the runtime performance of individual methods and is open source (https://github.com/slzarate/parliament2), and a Docker image, as well as a WDL implementation, is available. CONCLUSION: Parliament2 provides a highly accurate single-sample SV call set from short-read DNA sequence data and enables cost-efficient application over cloud or cluster environments, processing thousands of samples.
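For readers unfamiliar with the F1 score quoted in this abstract, the sketch below shows the standard definition as the harmonic mean of precision and recall. The counts are illustrative only and the function name `f1_score` is an assumption; this is not Parliament2's benchmarking code, and it does not address how SV calls are matched to a truth set.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall computed from call counts.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    called = true_pos + false_pos   # all SVs reported by the caller
    truth = true_pos + false_neg    # all SVs in the reference call set
    precision = true_pos / called if called else 0.0
    recall = true_pos / truth if truth else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 6 matched calls, 2 spurious calls, 6 missed SVs.
example = f1_score(true_pos=6, false_pos=2, false_neg=6)
```

Because F1 weights precision and recall equally, a caller that reports few but accurate SVs can score the same as one that reports many with more errors, so the two components are usually reported alongside it.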
Affiliation(s)
- Samantha Zarate
- DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
| | - Andrew Carroll
- DNAnexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
| | - Medhat Mahmoud
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Olga Krasheninina
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Goo Jun
- Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
| | - William J Salerno
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Michael C Schatz
- Department of Computer Science, 3400 N. Charles St. Johns Hopkins University, Baltimore, MD 21218, USA
| | - Eric Boerwinkle
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
- Human Genetics Center, 1200 Pressler Street, University of Texas Health Science Center at Houston, Houston, TX 77040, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, One Baylor Plaza, Baylor College of Medicine, Houston, TX 77030, USA
|