1
Plender EG, Prodanov T, Hsieh P, Nizamis E, Harvey WT, Sulovari A, Munson KM, Kaufman EJ, O'Neal WK, Valdmanis PN, Marschall T, Bloom JD, Eichler EE. Structural and genetic diversity in the secreted mucins, MUC5AC and MUC5B. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.18.585560. [PMID: 38562829 PMCID: PMC10983947 DOI: 10.1101/2024.03.18.585560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The secreted mucins MUC5AC and MUC5B play critical defensive roles in airway pathogen entrapment and mucociliary clearance by encoding large glycoproteins with variable number tandem repeats (VNTRs). These polymorphic and degenerate protein coding VNTRs make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5761-5762aa); however, seven haplotypes have expanded VNTRs (6291-7019aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5249-6325aa) with cysteine-rich domain and VNTR copy number variation. We grouped MUC5AC alleles into three phylogenetic clades: H1 (46%, ~5654aa), H2 (33%, ~5742aa), and H3 (7%, ~6325aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium (LD) and Tajima's D analyses reveal that East Asians carry exceptionally large MUC5AC LD blocks with an excess of rare variation (p<0.05). To validate this result, we used Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observed signatures of positive selection in H1 and H2 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Africans and Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium, consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein coding VNTRs for improved disease associations.
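The abstract reports that H3 alleles deviate from Hardy-Weinberg equilibrium in a direction consistent with heterozygote advantage. As a minimal illustration of what such a deviation means, the sketch below compares observed genotype counts at a biallelic site with the counts expected under Hardy-Weinberg proportions. The function name `hwe_chi_square` and the example counts are hypothetical; they are not taken from the study, and this is not necessarily the test the authors used.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Returns (expected_counts, chi_square, heterozygote_excess) for a
    biallelic site with genotype counts AA, AB, and BB.
    """
    n = n_aa + n_ab + n_bb
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Expected counts under Hardy-Weinberg proportions p^2 : 2pq : q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Pearson chi-square statistic (1 degree of freedom for a biallelic site).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # A positive value means more heterozygotes than expected.
    het_excess = n_ab - expected[1]
    return expected, chi2, het_excess


# Hypothetical counts: 10 AA, 80 AB, 10 BB (not data from the paper).
expected, chi2, het_excess = hwe_chi_square(10, 80, 10)
print(expected, chi2, het_excess)
```

A large statistic together with a positive heterozygote excess is the pattern that would be compatible with heterozygote advantage, although other causes (for example genotyping error or population structure) can also produce departures from equilibrium.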
Affiliation(s)
- Elizabeth G Plender
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Timofey Prodanov
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
- PingHsun Hsieh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Evangelos Nizamis
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Eli J Kaufman
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Wanda K O'Neal
  - Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, 27599, North Carolina, USA
- Paul N Valdmanis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA 98195, USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University, Moorenstr. 5, 40225 Düsseldorf, Germany
- Jesse D Bloom
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
  - Howard Hughes Medical Institute, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
2
Chaisson MJP, Sulovari A, Valdmanis PN, Miller DE, Eichler EE. Advances in the discovery and analyses of human tandem repeats. Emerg Top Life Sci 2023; 7:361-381. [PMID: 37905568 PMCID: PMC10806765 DOI: 10.1042/etls20230074] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2023] [Revised: 10/18/2023] [Accepted: 10/18/2023] [Indexed: 11/02/2023]
Abstract
Long-read sequencing platforms provide unparalleled access to the structure and composition of all classes of tandemly repeated DNA from STRs to satellite arrays. This review summarizes our current understanding of their organization within the human genome, their importance with respect to disease, as well as the advances and challenges in understanding their genetic diversity and functional effects. Novel computational methods are being developed to visualize and associate these complex patterns of human variation with disease, expression, and epigenetic differences. We predict accurate characterization of this repeat-rich form of human variation will become increasingly relevant to both basic and clinical human genetics.
Affiliation(s)
- Mark J P Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, U.S.A
  - The Genomic and Epigenomic Regulation Program, USC Norris Cancer Center, University of Southern California, Los Angeles, CA 90089, U.S.A
- Arvis Sulovari
  - Computational Biology, Cajal Neuroscience Inc, Seattle, WA 98102, U.S.A
- Paul N Valdmanis
  - Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
- Danny E Miller
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, U.S.A
  - Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, U.S.A
  - Department of Pediatrics, University of Washington, Seattle, WA 98195, U.S.A
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, U.S.A
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, U.S.A
3
Noyes MD, Harvey WT, Porubsky D, Sulovari A, Li R, Rose NR, Audano PA, Munson KM, Lewis AP, Hoekzema K, Mantere T, Graves-Lindsay TA, Sanders AD, Goodwin S, Kramer M, Mokrab Y, Zody MC, Hoischen A, Korbel JO, McCombie WR, Eichler EE. Familial long-read sequencing increases yield of de novo mutations. Am J Hum Genet 2022; 109:631-646. [PMID: 35290762 DOI: 10.1016/j.ajhg.2022.02.014] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 09/29/2021] [Accepted: 02/16/2022] [Indexed: 12/11/2022]
Abstract
Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increases the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM compared to short-read data alone.
Affiliation(s)
- Michelle D Noyes
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Ruiyang Li
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Nicholas R Rose
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Peter A Audano
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Tuomo Mantere
  - Department of Human Genetics, Radboud University Medical Center, 6500 Nijmegen, the Netherlands; Laboratory of Cancer Genetics and Tumor Biology, Cancer and Translational Medicine Research Unit and Biocenter Oulu, University of Oulu, 90220 Oulu, Finland
- Ashley D Sanders
  - European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany
- Sara Goodwin
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Melissa Kramer
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Younes Mokrab
  - Department of Human Genetics, Sidra Medicine, PO Box 26999, Doha, Qatar; Weill Cornell Medicine, PO Box 24144, Doha, Qatar; College of Health and Life Sciences, Hamad Bin Khalifa University, PO Box 34110, Doha, Qatar
- Alexander Hoischen
  - Department of Human Genetics, Radboud University Medical Center, 6500 Nijmegen, the Netherlands; Radboud Institute of Medical Life Sciences and Department of Internal Medicine and Radboud Center for Infectious Diseases, Radboud University Medical Center, 6500 Nijmegen, the Netherlands
- Jan O Korbel
  - European Molecular Biology Laboratory, Genome Biology Unit, 69117 Heidelberg, Germany
- W Richard McCombie
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
4
Vollger MR, Guitart X, Dishuck PC, Mercuri L, Harvey WT, Gershman A, Diekhans M, Sulovari A, Munson KM, Lewis AP, Hoekzema K, Porubsky D, Li R, Nurk S, Koren S, Miga KH, Phillippy AM, Timp W, Ventura M, Eichler EE. Segmental duplications and their variation in a complete human genome. Science 2022; 376:eabj6965. [PMID: 35357917 PMCID: PMC8979283 DOI: 10.1126/science.abj6965] [Citation(s) in RCA: 91] [Impact Index Per Article: 45.5] [Indexed: 12/11/2022]
Abstract
Despite their importance in disease and evolution, highly identical segmental duplications (SDs) are among the last regions of the human reference genome (GRCh38) to be fully sequenced. Using a complete telomere-to-telomere human genome (T2T-CHM13), we present a comprehensive view of human SD organization. SDs account for nearly one-third of the additional sequence, increasing the genome-wide estimate from 5.4 to 7.0% [218 million base pairs (Mbp)]. An analysis of 268 human genomes shows that 91% of the previously unresolved T2T-CHM13 SD sequence (68.3 Mbp) better represents human copy number variation. Comparing long-read assemblies from human (n = 12) and nonhuman primate (n = 5) genomes, we systematically reconstruct the evolution and structural haplotype diversity of biomedically relevant and duplicated genes. This analysis reveals patterns of structural heterozygosity and evolutionary differences in SD organization between humans and other primates.
Affiliation(s)
- Mitchell R Vollger
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Xavi Guitart
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Philip C Dishuck
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Ludovica Mercuri
  - Department of Biology, University of Bari, Aldo Moro, Bari 70125, Italy
- William T Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Ariel Gershman
  - Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Ruiyang Li
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Winston Timp
  - Department of Molecular Biology and Genetics, Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Mario Ventura
  - Department of Biology, University of Bari, Aldo Moro, Bari 70125, Italy
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
5
Miller DE, Sulovari A, Wang T, Loucks H, Hoekzema K, Munson KM, Lewis AP, Fuerte EPA, Paschal CR, Walsh T, Thies J, Bennett JT, Glass I, Dipple KM, Patterson K, Bonkowski ES, Nelson Z, Squire A, Sikes M, Beckman E, Bennett RL, Earl D, Lee W, Allikmets R, Perlman SJ, Chow P, Hing AV, Wenger TL, Adam MP, Sun A, Lam C, Chang I, Zou X, Austin SL, Huggins E, Safi A, Iyengar AK, Reddy TE, Majoros WH, Allen AS, Crawford GE, Kishnani PS, King MC, Cherry T, Chong JX, Bamshad MJ, Nickerson DA, Mefford HC, Doherty D, Eichler EE. Targeted long-read sequencing identifies missing disease-causing variation. Am J Hum Genet 2021; 108:1436-1449. [PMID: 34216551 PMCID: PMC8387463 DOI: 10.1016/j.ajhg.2021.06.006] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Received: 04/06/2021] [Accepted: 06/07/2021] [Indexed: 12/28/2022]
Abstract
Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations (including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences) identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.
Affiliation(s)
- Danny E Miller
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Arvis Sulovari
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Tianyun Wang
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Hailey Loucks
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Alexandra P Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Edith P Almanza Fuerte
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Catherine R Paschal
  - Department of Laboratories, Seattle Children's Hospital, Seattle, WA 98105, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
- Tom Walsh
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Jenny Thies
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- James T Bennett
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA; Department of Laboratories, Seattle Children's Hospital, Seattle, WA 98105, USA; Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA 98101, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Ian Glass
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Katrina M Dipple
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA; Center for Clinical and Translational Research, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Karynne Patterson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Emily S Bonkowski
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Zoe Nelson
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Audrey Squire
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Megan Sikes
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Erika Beckman
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Robin L Bennett
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Dawn Earl
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Winston Lee
  - Department of Genetics and Development, Columbia University, New York, NY 10032, USA; Department of Ophthalmology, Columbia University, New York, NY 10032, USA
- Rando Allikmets
  - Department of Ophthalmology, Columbia University, New York, NY 10032, USA; Department of Pathology and Cell Biology, Columbia University, New York, NY 10032, USA
- Seth J Perlman
  - Department of Neurology, Seattle Children's Hospital, University of Washington, Seattle, WA 98105, USA
- Penny Chow
  - Department of Pediatrics, Division of Craniofacial Medicine, University of Washington, Seattle, WA 98195, USA
- Anne V Hing
  - Department of Pediatrics, Division of Craniofacial Medicine, University of Washington, Seattle, WA 98195, USA
- Tara L Wenger
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Margaret P Adam
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Angela Sun
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA; Center for Clinical and Translational Research, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Christina Lam
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA; Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Irene Chang
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Xue Zou
  - Program in Computational Biology & Bioinformatics, Duke University, Durham, NC 27710, USA
- Stephanie L Austin
  - Department of Pediatrics, Division of Medical Genetics, Duke University, Durham, NC 27708, USA
- Erin Huggins
  - Department of Pediatrics, Division of Medical Genetics, Duke University, Durham, NC 27708, USA
- Alexias Safi
  - Department of Pediatrics, Division of Medical Genetics, Duke University, Durham, NC 27708, USA
- Apoorva K Iyengar
  - Department of Biostatistics and Bioinformatics, Duke University; Durham, NC 27708, USA; University Program in Genetics and Genomics, Duke University; Durham, NC 27708, USA
- Timothy E Reddy
  - Department of Biostatistics and Bioinformatics, Duke University; Durham, NC 27708, USA
- William H Majoros
  - Department of Biostatistics and Bioinformatics, Duke University; Durham, NC 27708, USA
- Andrew S Allen
  - Department of Biostatistics and Bioinformatics, Duke University; Durham, NC 27708, USA
- Gregory E Crawford
  - Department of Pediatrics, Division of Medical Genetics, Duke University, Durham, NC 27708, USA
- Priya S Kishnani
  - Department of Pediatrics, Division of Medical Genetics, Duke University, Durham, NC 27708, USA
- Mary-Claire King
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Tim Cherry
  - Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Jessica X Chong
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Michael J Bamshad
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Deborah A Nickerson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Heather C Mefford
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
- Dan Doherty
  - Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA; Department of Pediatrics, Division of Developmental Medicine, University of Washington and Seattle Children's Hospital, Seattle, WA 98105, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
6
Course MM, Sulovari A, Gudsnuk K, Eichler EE, Valdmanis PN. Characterizing nucleotide variation and expansion dynamics in human-specific variable number tandem repeats. Genome Res 2021; 31:1313-1324. [PMID: 34244228 PMCID: PMC8327921 DOI: 10.1101/gr.275560.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/25/2021] [Accepted: 06/25/2021] [Indexed: 12/14/2022]
Abstract
There are more than 55,000 variable number tandem repeats (VNTRs) in the human genome, notable for both their striking polymorphism and mutability. Despite their role in human evolution and genomic variation, they have yet to be studied collectively and in detail, partially owing to their large size, variability, and predominant location in noncoding regions. Here, we examine 467 VNTRs that are human-specific expansions, unique to one location in the genome, and not associated with retrotransposons. We leverage publicly available long-read genomes, including from the Human Genome Structural Variant Consortium, to ascertain the exact nucleotide composition of these VNTRs and compare their composition of alleles. We then confirm repeat unit composition in more than 3000 short-read samples from the 1000 Genomes Project. Our analysis reveals that these VNTRs contain highly structured repeat motif organization, modified by frequent deletion and duplication events. Although overall VNTR compositions tend to remain similar between 1000 Genomes Project superpopulations, we describe a notable exception with substantial differences in repeat composition (in PCBP3), as well as several VNTRs that are significantly different in length between superpopulations (in ART1, PROP1, DYNC2I1, and LOC102723906). We also observe that most of these VNTRs are expanded in archaic human genomes, yet remain stable in length between single generations. Collectively, our findings indicate that repeat motif variability, repeat composition, and repeat length are all informative modalities to consider when characterizing VNTRs and their contribution to genomic variation.
Affiliation(s)
- Meredith M Course
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Kathryn Gudsnuk
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, Washington 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
| | - Paul N Valdmanis
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, Washington 98195, USA
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
7
|
Mao Y, Catacchio CR, Hillier LW, Porubsky D, Li R, Sulovari A, Fernandes JD, Montinaro F, Gordon DS, Storer JM, Haukness M, Fiddes IT, Murali SC, Dishuck PC, Hsieh P, Harvey WT, Audano PA, Mercuri L, Piccolo I, Antonacci F, Munson KM, Lewis AP, Baker C, Underwood JG, Hoekzema K, Huang TH, Sorensen M, Walker JA, Hoffman J, Thibaud-Nissen F, Salama SR, Pang AWC, Lee J, Hastie AR, Paten B, Batzer MA, Diekhans M, Ventura M, Eichler EE. A high-quality bonobo genome refines the analysis of hominid evolution. Nature 2021; 594:77-81. [PMID: 33953399 PMCID: PMC8172381 DOI: 10.1038/s41586-021-03519-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/21/2020] [Accepted: 04/07/2021] [Indexed: 12/17/2022]
Abstract
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3–5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome. A high-quality bonobo genome assembly provides insights into incomplete lineage sorting in hominids and its relevance to gene evolution and the genetic relationship among living hominids.
Affiliation(s)
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - LaDeana W Hillier
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ruiyang Li
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jason D Fernandes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Francesco Montinaro
- Department of Biology, University of Bari, Bari, Italy
- Estonian Biocentre, Institute of Genomics, Tartu, Estonia
| | - David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | | | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ian T Fiddes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Shwetha Canchi Murali
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Carl Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tzu-Hsueh Huang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jerilyn A Walker
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Jinna Hoffman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Sofie R Salama
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | | | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mark A Batzer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mario Ventura
- Department of Biology, University of Bari, Bari, Italy.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
|
8
|
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R, Yilmaz F, Zhao X, Hsieh P, Lee J, Kumar S, Lin J, Rausch T, Chen Y, Ren J, Santamarina M, Höps W, Ashraf H, Chuang NT, Yang X, Munson KM, Lewis AP, Fairley S, Tallon LJ, Clarke WE, Basile AO, Byrska-Bishop M, Corvelo A, Evani US, Lu TY, Chaisson MJP, Chen J, Li C, Brand H, Wenger AM, Ghareghani M, Harvey WT, Raeder B, Hasenfeld P, Regier AA, Abel HJ, Hall IM, Flicek P, Stegle O, Gerstein MB, Tubio JMC, Mu Z, Li YI, Shi X, Hastie AR, Ye K, Chong Z, Sanders AD, Zody MC, Talkowski ME, Mills RE, Devine SE, Lee C, Korbel JO, Marschall T, Eichler EE. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021; 372:eabf7117. [PMID: 33632895 PMCID: PMC8026704 DOI: 10.1126/science.abf7117] [Citation(s) in RCA: 270] [Impact Index Per Article: 90.0] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 12/14/2022]
Abstract
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Affiliation(s)
- Peter Ebert
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Bernardo Rodriguez-Martin
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Marc Jan Bonder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Jana Ebler
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
| | - Rebecca Serra Mari
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA
| | - Xuefang Zhao
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Joyce Lee
- Bionano Genomics, San Diego, CA 92121, USA
| | - Sushant Kumar
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Jiadong Lin
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
| | - Tobias Rausch
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Yu Chen
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Jingwen Ren
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Martin Santamarina
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Wolfram Höps
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Hufsah Ashraf
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - Nelson T Chuang
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Xiaofei Yang
- School of Computer Science and Technology, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | | | | | | | | | | | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Junjie Chen
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | - Harrison Brand
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Aaron M Wenger
- Pacific Biosciences of California, Menlo Park, CA 94025, USA
| | - Maryam Ghareghani
- Max Planck Institute for Informatics, Saarland Informatics Campus E1.4, 66123 Saarbrücken, Germany
- Saarbrücken Graduate School of Computer Science, Saarland University, Saarland Informatics Campus E1.3, 66123 Saarbrücken, Germany
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA
| | - Benjamin Raeder
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Patrick Hasenfeld
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Allison A Regier
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
| | - Haley J Abel
- Department of Medicine, Washington University, St. Louis, MO 63108, USA
| | - Ira M Hall
- Department of Genetics, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Oliver Stegle
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Mark B Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, BASS 432 and 437, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Jose M C Tubio
- Genomes and Disease, Centre for Research in Molecular Medicine and Chronic Diseases (CIMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain
- Department of Zoology, Genetics, and Physical Anthropology, Universidade de Santiago de Compostela, Santiago de Compostela, Spain
| | - Zepeng Mu
- Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
| | - Yang I Li
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA
| | | | - Kai Ye
- School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, Shaanxi, 710049, China
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
| | - Zechen Chong
- Department of Genetics and Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL 35294, USA
| | - Ashley D Sanders
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | | | - Michael E Talkowski
- Center for Genomic Medicine, Massachusetts General Hospital, Department of Neurology, Harvard Medical School, Boston, MA 02114, USA
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Ryan E Mills
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
- Department of Human Genetics, University of Michigan, 1241 E. Catherine Street, Ann Arbor, MI 48109, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, 670 W Baltimore Street, Baltimore, MD 21201, USA
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA.
- Precision Medicine Center, The First Affiliated Hospital of Xi'an Jiaotong University, 277 West Yanta Road, Xi'an, 710061, Shaanxi, China
- Department of Graduate Studies-Life Sciences, Ewha Womans University, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, South Korea
| | - Jan O Korbel
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany.
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Tobias Marschall
- Heinrich Heine University, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Moorenstraße 20, 40225 Düsseldorf, Germany.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, 3720 15th Avenue NE, Seattle, WA 98195-5065, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
9
|
Porubsky D, Ebert P, Audano PA, Vollger MR, Harvey WT, Marijon P, Ebler J, Munson KM, Sorensen M, Sulovari A, Haukness M, Ghareghani M, Lansdorp PM, Paten B, Devine SE, Sanders AD, Lee C, Chaisson MJP, Korbel JO, Eichler EE, Marschall T. Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads. Nat Biotechnol 2021; 39:302-308. [PMID: 33288906 PMCID: PMC7954704 DOI: 10.1038/s41587-020-0719-5] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Received: 11/22/2019] [Accepted: 09/16/2020] [Indexed: 12/18/2022]
Abstract
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
Affiliation(s)
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter Ebert
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pierre Marijon
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Jana Ebler
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Maryam Ghareghani
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany
- Center for Bioinformatics, Saarland University, and Max Planck Institute for Informatics, Saarbrücken, Germany
| | - Peter M Lansdorp
- Terry Fox Laboratory, BC Cancer Agency, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Scott E Devine
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Ashley D Sanders
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China
- Department of Life Science, Ewha Womans University, Seoul, Republic of Korea
| | - Mark J P Chaisson
- Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| | - Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany.
|
10
|
Mathkar PP, Chen X, Sulovari A, Li D. Characterization of Hepatitis B Virus Integrations Identified in Hepatocellular Carcinoma Genomes. Viruses 2021; 13:v13020245. [PMID: 33557409 PMCID: PMC7915589 DOI: 10.3390/v13020245] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/31/2020] [Revised: 01/31/2021] [Accepted: 02/02/2021] [Indexed: 12/19/2022] Open
Abstract
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality. Almost half of HCC cases are associated with hepatitis B virus (HBV) infections, which often lead to HBV sequence integrations in the human genome. Accurate identification of HBV integration sites at a single nucleotide resolution is critical for developing a better understanding of the cancer genome landscape and of the disease itself. Here, we performed further analyses and characterization of HBV integrations identified by our recently reported VIcaller platform in recurrent or known HCC genes (such as TERT, MLL4, and CCNE1) as well as non-recurrent cancer-related genes (such as CSMD2, NKD2, and RHOU). Our pathway enrichment analysis revealed multiple pathways involving the alcohol dehydrogenase 4 gene, such as the metabolism pathways of retinol, tyrosine, and fatty acid. Further analysis of the HBV integration sites revealed distinct patterns involving the integration upper breakpoints, integrated genome lengths, and integration allele fractions between tumor and normal tissues. Our analysis also implies that the VIcaller method has diagnostic potential through discovering novel clonal integrations in cancer-related genes. In conclusion, although VIcaller is a hypothesis-free virome-wide approach, it can still be applied to accurately identify genome-wide integration events of a specific candidate virus and their integration allele fractions.
Affiliation(s)
- Pranav P. Mathkar
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA
| | - Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA
- Institute for the Advanced Study of Human Biology, Kyoto University, Kyoto 606-8501, Japan
- Correspondence: (X.C.); (D.L.)
| | - Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA
- Cajal Neuroscience Inc., Seattle, WA 98102, USA
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA
- Department of Biomedical Science, Charles E. Schmidt College of Medicine, Florida Atlantic University, Boca Raton, FL 33431, USA
- Correspondence: (X.C.); (D.L.)
|
11
|
Wang T, Hoekzema K, Vecchio D, Wu H, Sulovari A, Coe BP, Gillentine MA, Wilfert AB, Perez-Jurado LA, Kvarnung M, Sleyp Y, Earl RK, Rosenfeld JA, Geisheker MR, Han L, Du B, Barnett C, Thompson E, Shaw M, Carroll R, Friend K, Catford R, Palmer EE, Zou X, Ou J, Li H, Guo H, Gerdts J, Avola E, Calabrese G, Elia M, Greco D, Lindstrand A, Nordgren A, Anderlid BM, Vandeweyer G, Van Dijck A, Van der Aa N, McKenna B, Hancarova M, Bendova S, Havlovicova M, Malerba G, Bernardina BD, Muglia P, van Haeringen A, Hoffer MJV, Franke B, Cappuccio G, Delatycki M, Lockhart PJ, Manning MA, Liu P, Scheffer IE, Brunetti-Pierri N, Rommelse N, Amaral DG, Santen GWE, Trabetti E, Sedláček Z, Michaelson JJ, Pierce K, Courchesne E, Kooy RF, Nordenskjöld M, Romano C, Peeters H, Bernier RA, Gecz J, Xia K, Eichler EE. Large-scale targeted sequencing identifies risk genes for neurodevelopmental disorders. Nat Commun 2020; 11:4932. [PMID: 33004838 PMCID: PMC7530681 DOI: 10.1038/s41467-020-18723-y] [Citation(s) in RCA: 83] [Impact Index Per Article: 20.8] [Received: 04/09/2020] [Accepted: 09/04/2020] [Indexed: 02/08/2023] Open
Abstract
Most genes associated with neurodevelopmental disorders (NDDs) were identified with an excess of de novo mutations (DNMs) but the significance in case-control mutation burden analysis is unestablished. Here, we sequence 63 genes in 16,294 NDD cases and an additional 62 genes in 6,211 NDD cases. By combining these with published data, we assess a total of 125 genes in over 16,000 NDD cases and compare the mutation burden to nonpsychiatric controls from ExAC. We identify 48 genes (25 newly reported) showing significant burden of ultra-rare (MAF < 0.01%) gene-disruptive mutations (FDR 5%), six of which reach family-wise error rate (FWER) significance (p < 1.25E-06). Among these 125 targeted genes, we also reevaluate DNM excess in 17,426 NDD trios with 6,499 new autism trios. We identify 90 genes enriched for DNMs (FDR 5%; e.g., GABRG2 and UIMC1); of which, 61 reach FWER significance (p < 3.64E-07; e.g., CASZ1). In addition to doubling the number of patients for many NDD risk genes, we present phenotype-genotype correlations for seven risk genes (CTCF, HNRNPU, KCNQ3, ZBTB18, TCF12, SPEN, and LEO1) based on this large-scale targeted sequencing effort.
Affiliation(s)
- Tianyun Wang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Davide Vecchio
- Rare Disease and Medical Genetics, Academic Department of Pediatrics, Bambino Gesù Children's Hospital, Rome, Italy
- Genetics and Rare Diseases Research Division, Bambino Gesù Children's Hospital, Rome, Italy
| | - Huidan Wu
- Center for Medical Genetics & Hunan Provincial Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Bradley P Coe
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | | | - Amy B Wilfert
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
| | - Luis A Perez-Jurado
- Paediatric and Reproductive Genetics unit, Women's and Children's Hospital, Adelaide, SA, Australia
- South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Genetics Unit, Universitat Pompeu Fabra, Hospital del Mar Research Institute (IMIM) and CIBERER, Barcelona, Spain
| | - Malin Kvarnung
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Yoeri Sleyp
- Centre for Human Genetics, KU Leuven and Leuven Autism Research (LAuRes), Leuven, Belgium
| | - Rachel K Earl
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
| | - Jill A Rosenfeld
- Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Baylor Genetics, Houston, TX, USA
| | | | - Lin Han
- Center for Medical Genetics & Hunan Provincial Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
| | - Bing Du
- Center for Medical Genetics & Hunan Provincial Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
| | - Chris Barnett
- Paediatric and Reproductive Genetics unit, Women's and Children's Hospital, Adelaide, SA, Australia
- Adelaide Medical School and the Robinson Research Institute, the University of Adelaide, Adelaide, SA, Australia
| | - Elizabeth Thompson
- Paediatric and Reproductive Genetics unit, Women's and Children's Hospital, Adelaide, SA, Australia
| | - Marie Shaw
- Adelaide Medical School and the Robinson Research Institute, the University of Adelaide, Adelaide, SA, Australia
| | - Renee Carroll
- Adelaide Medical School and the Robinson Research Institute, the University of Adelaide, Adelaide, SA, Australia
| | - Kathryn Friend
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA, Australia
| | - Rachael Catford
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA, Australia
| | - Elizabeth E Palmer
- Genetics of Learning Disability Service, Hunter New England Health Service, Waratah, NSW, Australia
- School of Women's and Children's Health, University of New South Wales, Randwick, NSW, Australia
| | - Xiaobing Zou
- Children Development Behavior Center, The Third Affiliated Hospital, Sun Yat-Sen University, Guangzhou, Guangdong, China
| | - Jianjun Ou
- Mental Health Institute of the Second Xiangya Hospital, Central South University, Changsha, China
| | - Honghui Li
- Key Laboratory of Developmental Disorders in Children, Liuzhou Maternity and Child Healthcare Hospital, Liuzhou, China
| | - Hui Guo
- Center for Medical Genetics & Hunan Provincial Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
| | - Jennifer Gerdts
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
| | | | | | | | | | - Anna Lindstrand
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Ann Nordgren
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Britt-Marie Anderlid
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | - Geert Vandeweyer
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
| | - Anke Van Dijck
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
| | | | - Brooke McKenna
- Department of Psychology, Emory University, Atlanta, GA, USA
| | - Miroslava Hancarova
- Department of Biology and Medical Genetics, Charles University 2nd Faculty of Medicine and University Hospital Motol, Prague, Czech Republic
| | - Sarka Bendova
- Department of Biology and Medical Genetics, Charles University 2nd Faculty of Medicine and University Hospital Motol, Prague, Czech Republic
| | - Marketa Havlovicova
- Department of Biology and Medical Genetics, Charles University 2nd Faculty of Medicine and University Hospital Motol, Prague, Czech Republic
| | - Giovanni Malerba
- Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
| | | | | | - Arie van Haeringen
- Department of Clinical Genetics, Leiden University Medical Center (LUMC), Leiden, Netherlands
| | - Mariette J V Hoffer
- Department of Clinical Genetics, Leiden University Medical Center (LUMC), Leiden, Netherlands
| | - Barbara Franke
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Department of Psychiatry, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
| | - Gerarda Cappuccio
- Department of Translational Medicine, Federico II University, Naples, Italy
- Telethon Institute of Genetics and Medicine, Pozzuoli, Naples, Italy
| | | | - Paul J Lockhart
- Murdoch Children's Research Institute, Melbourne, Australia
- Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia
| | - Melanie A Manning
- Division of Medical Genetics, Department of Pediatrics, Stanford University, Stanford, CA, USA
- Department of Pathology, Stanford University, Stanford, CA, USA
| | - Pengfei Liu
- Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Baylor Genetics, Houston, TX, USA
| | - Ingrid E Scheffer
- Murdoch Children's Research Institute, Melbourne, Australia
- Department of Paediatrics, University of Melbourne, Royal Children's Hospital, Melbourne, VIC, Australia
- Department of Medicine, University of Melbourne, Austin Health, Melbourne, Australia
- The Florey Institute of Neuroscience and Mental Health, Parkville, VIC, Australia
| | - Nicola Brunetti-Pierri
- Department of Translational Medicine, Federico II University, Naples, Italy
- Telethon Institute of Genetics and Medicine, Pozzuoli, Naples, Italy
| | - Nanda Rommelse
- Department of Psychiatry, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen, Netherlands
- Karakter Child and Adolescent Psychiatry Center, Nijmegen, Netherlands
| | - David G Amaral
- Department of Psychiatry and Behavioral Sciences and the MIND Institute, University of California, Davis, Sacramento, CA, USA
| | - Gijs W E Santen
- Department of Clinical Genetics, Leiden University Medical Center (LUMC), Leiden, Netherlands
| | - Elisabetta Trabetti
- Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
| | - Zdeněk Sedláček
- Department of Biology and Medical Genetics, Charles University 2nd Faculty of Medicine and University Hospital Motol, Prague, Czech Republic
| | - Jacob J Michaelson
- Department of Psychiatry, University of Iowa Carver College of Medicine, Iowa City, IA, USA
| | - Karen Pierce
- Department of Neurosciences, UC San Diego Autism Center, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - Eric Courchesne
- Department of Neurosciences, UC San Diego Autism Center, School of Medicine, University of California San Diego, La Jolla, CA, USA
| | - R Frank Kooy
- Department of Medical Genetics, University of Antwerp, Antwerp, Belgium
| | - Magnus Nordenskjöld
- Department of Molecular Medicine and Surgery, Center for Molecular Medicine, Karolinska Institutet, Stockholm, Sweden
- Department of Clinical Genetics, Karolinska University Hospital, Stockholm, Sweden
| | | | - Hilde Peeters
- Centre for Human Genetics, KU Leuven and Leuven Autism Research (LAuRes), Leuven, Belgium
| | - Raphael A Bernier
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, USA
| | - Jozef Gecz
- South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Adelaide Medical School and the Robinson Research Institute, the University of Adelaide, Adelaide, SA, Australia
- Genetics and Molecular Pathology, SA Pathology, Adelaide, SA, Australia
| | - Kun Xia
- Center for Medical Genetics & Hunan Provincial Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan, China
- CAS Center for Excellence in Brain Science and Intelligences Technology (CEBSIT), Chinese Academy of Sciences, Shanghai, China
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
|
12
|
Course MM, Gudsnuk K, Smukowski SN, Winston K, Desai N, Ross JP, Sulovari A, Bourassa CV, Spiegelman D, Couthouis J, Yu CE, Tsuang DW, Jayadev S, Kay MA, Gitler AD, Dupre N, Eichler EE, Dion PA, Rouleau GA, Valdmanis PN. Evolution of a Human-Specific Tandem Repeat Associated with ALS. Am J Hum Genet 2020; 107:445-460. [PMID: 32750315 DOI: 10.1016/j.ajhg.2020.07.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 04/23/2020] [Accepted: 07/08/2020] [Indexed: 12/12/2022]
Abstract
Tandem repeats are proposed to contribute to human-specific traits, and more than 40 tandem repeat expansions are known to cause neurological disease. Here, we characterize a human-specific 69 bp variable number tandem repeat (VNTR) in the last intron of WDR7, which exhibits striking variability in both copy number and nucleotide composition, as revealed by long-read sequencing. In addition, greater repeat copy number is significantly enriched in three independent cohorts of individuals with sporadic amyotrophic lateral sclerosis (ALS). Each unit of the repeat forms a stem-loop structure with the potential to produce microRNAs, and the repeat RNA can aggregate when expressed in cells. We leveraged its remarkable sequence variability to align the repeat in 288 samples and uncover its mechanism of expansion. We found that the repeat expands in the 3'-5' direction, in groups of repeat units divisible by two. The expansion patterns we observed were consistent with duplication events, and a replication error called template switching. We also observed that the VNTR is expanded in both Denisovan and Neanderthal genomes but is fixed at one copy or fewer in non-human primates. Evaluating the repeat in 1000 Genomes Project samples reveals that some repeat segments are solely present or absent in certain geographic populations. The large size of the repeat unit in this VNTR, along with our multiplexed sequencing strategy, provides an unprecedented opportunity to study mechanisms of repeat expansion, and a framework for evaluating the roles of VNTRs in human evolution and disease.
|
13
|
Cantsilieris S, Sunkin SM, Johnson ME, Anaclerio F, Huddleston J, Baker C, Dougherty ML, Underwood JG, Sulovari A, Hsieh P, Mao Y, Catacchio CR, Malig M, Welch AE, Sorensen M, Munson KM, Jiang W, Girirajan S, Ventura M, Lamb BT, Conlon RA, Eichler EE. An evolutionary driver of interspersed segmental duplications in primates. Genome Biol 2020; 21:202. [PMID: 32778141 PMCID: PMC7419210 DOI: 10.1186/s13059-020-02074-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/03/2019] [Accepted: 06/08/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP).
RESULTS: Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis.
CONCLUSIONS: LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.
Affiliation(s)
- Stuart Cantsilieris
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Present Address: Centre for Eye Research Australia, Department of Surgery (Ophthalmology), University of Melbourne, Royal Victorian Eye and Ear Hospital, East Melbourne, VIC, 3002, Australia
| | | | - Matthew E Johnson
- Center for Spatial and Functional Genomics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Fabio Anaclerio
- Department of Biology-Genetics, University of Bari, Bari, Italy
| | - John Huddleston
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, 98195, USA
| | - Carl Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Max L Dougherty
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Jason G Underwood
- Pacific Biosciences (PacBio) of California, Incorporated, Menlo Park, CA, 94025, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | | | - Maika Malig
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Present Address: Department of Molecular and Cellular Biology, University of California, Davis, CA, 95616, USA
- Present Address: Integrative Genetics and Genomics Graduate Group, University of California, Davis, CA, 95616, USA
| | - AnneMarie E Welch
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
- Present Address: Brain and Mitochondrial Research, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, VIC, Australia
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Weihong Jiang
- Case Transgenic and Targeting Facility, Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| | - Santhosh Girirajan
- Department of Biochemistry and Molecular Biology, Department of Anthropology, Pennsylvania State University, University Park, PA, 16802, USA
| | - Mario Ventura
- Department of Biology-Genetics, University of Bari, Bari, Italy
| | - Bruce T Lamb
- Stark Neurosciences Research Institute, Indiana University School of Medicine, Indianapolis, IN, 46202, USA
| | - Ronald A Conlon
- Case Transgenic and Targeting Facility, Department of Genetics and Genome Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, 44106, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
- Howard Hughes Medical Institute, University of Washington School of Medicine, 3720 15th Ave NE, S413C, Box 355065, Seattle, WA, 98195-5065, USA.
|
14
|
Sulovari A, Li R, Audano PA, Porubsky D, Vollger MR, Logsdon GA, Warren WC, Pollen AA, Chaisson MJP, Eichler EE. Human-specific tandem repeat expansion and differential gene expression during primate evolution. Proc Natl Acad Sci U S A 2019; 116:23243-23253. [PMID: 31659027 PMCID: PMC6859368 DOI: 10.1073/pnas.1912175116] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Indexed: 01/14/2023]
Abstract
Short tandem repeats (STRs) and variable number tandem repeats (VNTRs) are important sources of natural and disease-causing variation, yet they have been problematic to resolve in reference genomes and genotype with short-read technology. We created a framework to model the evolution and instability of STRs and VNTRs in apes. We phased and assembled 3 ape genomes (chimpanzee, gorilla, and orangutan) using long-read and 10x Genomics linked-read sequence data for 21,442 human tandem repeats discovered in 6 haplotype-resolved assemblies of Yoruban, Chinese, and Puerto Rican origin. We define a set of 1,584 STRs/VNTRs expanded specifically in humans, including large tandem repeats affecting coding and noncoding portions of genes (e.g., MUC3A, CACNA1C). We show that short interspersed nuclear element-VNTR-Alu (SVA) retrotransposition is the main mechanism for distributing GC-rich human-specific tandem repeat expansions throughout the genome but with a bias against genes. In contrast, we observe that VNTRs not originating from retrotransposons have a propensity to cluster near genes, especially in the subtelomere. Using tissue-specific expression from human and chimpanzee brains, we identify genes where transcript isoform usage differs significantly, likely caused by cryptic splicing variation within VNTRs. Using single-cell expression from cerebral organoids, we observe a strong effect for genes associated with transcription profiles analogous to intermediate progenitor cells. Finally, we compare the sequence composition of some of the largest human-specific repeat expansions and identify 52 STRs/VNTRs with at least 40 uninterrupted pure tracts as candidates for genetically unstable regions associated with disease.
Affiliation(s)
- Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195
| | - Ruiyang Li
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195
| | - Wesley C Warren
- Bond Life Sciences Center, University of Missouri, Columbia, MO 65201
| | - Alex A Pollen
- Department of Neurology, University of California, San Francisco, CA 94143
| | - Mark J P Chaisson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195
- Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195
|
15
|
Chen X, Kost J, Sulovari A, Wong N, Liang WS, Cao J, Li D. A virome-wide clonal integration analysis platform for discovering cancer viral etiology. Genome Res 2019; 29:819-830. [PMID: 30872350 PMCID: PMC6499315 DOI: 10.1101/gr.242529.118] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 07/31/2018] [Accepted: 03/11/2019] [Indexed: 12/31/2022]
Abstract
Oncoviral infection is responsible for 12%–15% of cancer in humans. Convergent evidence from epidemiology, pathology, and oncology suggests that new viral etiologies for cancers remain to be discovered. Oncoviral profiles can be obtained from cancer genome sequencing data; however, widespread viral sequence contamination and noncausal viruses complicate the process of identifying genuine oncoviruses. Here, we propose a novel strategy to address these challenges by performing virome-wide screening of early-stage clonal viral integrations. To implement this strategy, we developed VIcaller, a novel platform for identifying viral integrations that are derived from any characterized viruses and shared by a large proportion of tumor cells using whole-genome sequencing (WGS) data. The sensitivity and precision were confirmed with simulated and benchmark cancer data sets. By applying this platform to cancer WGS data sets with proven or speculated viral etiology, we newly identified or confirmed clonal integrations of hepatitis B virus (HBV), human papillomavirus (HPV), Epstein-Barr virus (EBV), and BK Virus (BKV), suggesting the involvement of these viruses in early stages of tumorigenesis in affected tumors, such as HBV in TERT and KMT2B (also known as MLL4) gene loci in liver cancer, HPV and BKV in bladder cancer, and EBV in non-Hodgkin's lymphoma. We also showed the capacity of VIcaller to identify integrations from some uncharacterized viruses. This is the first study to systematically investigate the strategy and method of virome-wide screening of clonal integrations to identify oncoviruses. Searching clonal viral integrations with our platform has the capacity to identify virus-caused cancers and discover cancer viral etiologies.
Affiliation(s)
- Xun Chen
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
| | - Jason Kost
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
| | - Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA
| | - Nathalie Wong
- Department of Anatomical and Cellular Pathology, Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, NT, Hong Kong 999077, P.R. China
| | - Winnie S Liang
- Translational Genomics Research Institute, Phoenix, Arizona 85004, USA
| | - Jian Cao
- Division of Medical Oncology, Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08903, USA; Department of Medicine, Rutgers Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08903, USA
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont 05405, USA; Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA
|
16
|
Sulovari A, Li D. VIpower: Simulation-based tool for estimating power of viral integration detection via high-throughput sequencing. Genomics 2019; 112:207-211. [PMID: 30710609 DOI: 10.1016/j.ygeno.2019.01.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/10/2018] [Revised: 12/31/2018] [Accepted: 01/22/2019] [Indexed: 12/12/2022]
Abstract
Viral sequence integrations in the human genome have been implicated in various human diseases. Viral integrations remain among the most challenging-to-detect structural changes of the human genome. No studies have systematically analyzed how molecular and bioinformatics factors affect the power (sensitivity) to detect viral integrations using high-throughput sequencing (HTS). We selected a wide range of molecular and bioinformatics factors covering genome sequence characteristics, HTS features, and viral integration detection. We designed a fast simulation-based framework to model the process of detecting variable viral integration events in the human genome. We then examined the associations of selected factors with viral integration detection power. We identified six factors that significantly affected viral integration detection power (P < 2 × 10⁻¹⁶). The strongest factors associated with detection power included proportion of sample cells with clonal viral integrations (Pearson's ρ = 0.64), sequencing depth (ρ = 0.37), length of viral integration (ρ = 0.37), paired-end read insert size (ρ = 0.23), user-defined threshold (number of supporting reads) to claim successful identification of integrations (ρ = -0.19), and read length (when sequence volume was fixed) (ρ = -0.09). As the first tool of its kind, VIpower incorporates all these factors, which can be manipulated in concert with each other to optimize the detection power. This tool may be used to estimate viral integration detection power for various combinations of sequencing or analytic parameters. It may also be used to estimate the parameters required to achieve a specific power when designing new sequencing experiments.
Affiliation(s)
- Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT 05405, USA; Department of Computer Science, University of Vermont, Burlington, VT 05405, USA; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, VT 05405, USA.
|
17
|
Audano PA, Sulovari A, Graves-Lindsay TA, Cantsilieris S, Sorensen M, Welch AE, Dougherty ML, Nelson BJ, Shah A, Dutcher SK, Warren WC, Magrini V, McGrath SD, Li YI, Wilson RK, Eichler EE. Characterizing the Major Structural Variant Alleles of the Human Genome. Cell 2019; 176:663-675.e19. [PMID: 30661756 PMCID: PMC6438697 DOI: 10.1016/j.cell.2018.12.019] [Citation(s) in RCA: 271] [Impact Index Per Article: 54.2] [Received: 03/27/2018] [Revised: 09/01/2018] [Accepted: 12/12/2018] [Indexed: 12/17/2022]
Abstract
In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.
Affiliation(s)
- Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Tina A Graves-Lindsay
- McDonnell Genome Institute, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Stuart Cantsilieris
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - AnneMarie E Welch
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Max L Dougherty
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Bradley J Nelson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
| | - Ankeeta Shah
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
| | - Susan K Dutcher
- McDonnell Genome Institute, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Wesley C Warren
- McDonnell Genome Institute, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
| | - Vincent Magrini
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH 43205, USA; The Ohio State University College of Medicine, Columbus, OH 43210, USA
| | - Sean D McGrath
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH 43205, USA
| | - Yang I Li
- Section of Genetic Medicine, University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Richard K Wilson
- Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH 43205, USA; The Ohio State University College of Medicine, Columbus, OH 43210, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
|
18
|
Abstract
Next-generation sequencing (NGS) is now more accessible to clinicians and researchers. As a result, our understanding of the genetics of neurodevelopmental disorders (NDDs) has rapidly advanced over the past few years. NGS has led to the discovery of new NDD genes with an excess of recurrent de novo mutations (DNMs) when compared to controls. Development of large-scale databases of normal and disease variation has given rise to metrics exploring the relative tolerance of individual genes to human mutation. Genetic etiology and diagnosis rates have improved, which have led to the discovery of new pathways and tissue types relevant to NDDs. In this review, we highlight several key findings based on the discovery of recurrent DNMs ranging from copy number variants to point mutations. We explore biases and patterns of DNM enrichment and the role of mosaicism and secondary mutations in variable expressivity. We discuss the benefit of whole-genome sequencing (WGS) over whole-exome sequencing (WES) to understand more complex, multifactorial cases of NDD and explain how this improved understanding aids diagnosis and management of these disorders. Comprehensive assessment of the DNM landscape across the genome using WGS and other technologies will lead to the development of novel functional and bioinformatics approaches to interpret DNMs and drive new insights into NDD biology.
Affiliation(s)
- Amy B Wilfert
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Tychele N Turner
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Bradley P Coe
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, 98195, USA.
|
19
|
Little AC, Sulovari A, Danyal K, Heppner DE, Seward DJ, van der Vliet A. Paradoxical roles of dual oxidases in cancer biology. Free Radic Biol Med 2017; 110:117-132. [PMID: 28578013 PMCID: PMC5535817 DOI: 10.1016/j.freeradbiomed.2017.05.024] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 03/15/2017] [Revised: 05/26/2017] [Accepted: 05/30/2017] [Indexed: 02/06/2023]
Abstract
Dysregulated oxidative metabolism is a well-recognized aspect of cancer biology, and many therapeutic strategies are based on targeting cancers by altering cellular redox pathways. The NADPH oxidases (NOXes) present an important enzymatic source of biological oxidants, and the expression and activation of several NOX isoforms are frequently dysregulated in many cancers. Cell-based studies have demonstrated a role for several NOX isozymes in controlling cell proliferation and/or cell migration, further supporting a potential contributing role for NOX in promoting cancer. While various NOX isoforms are often upregulated in cancers, paradoxical recent findings indicate that dual oxidases (DUOXes), normally prominently expressed in epithelial lineages, are frequently suppressed in epithelial-derived cancers by epigenetic mechanisms, although the functional relevance of such DUOX silencing has remained unclear. This review will briefly summarize our current understanding regarding the importance of reactive oxygen species (ROS) and NOXes in cancer biology, and focus on recent observations indicating the unique and seemingly opposing roles of DUOX enzymes in cancer biology. We will discuss current knowledge regarding the functional properties of DUOX, and recent studies highlighting mechanistic consequences of DUOX1 loss in lung cancer, and its consequences for tumor invasiveness and current anticancer therapy. Finally, we will also discuss potentially unique roles for the DUOX maturation factors. Overall, a better understanding of mechanisms that regulate DUOX and the functional consequences of DUOX silencing in cancer may offer valuable new diagnostic insights and novel therapeutic opportunities.
Affiliation(s)
- Andrew C Little
- Department of Pathology and Laboratory Medicine, Robert Larner, M.D. College of Medicine, University of Vermont, Burlington, VT 05405, United States; Cellular, Molecular, and Biomedical Sciences Graduate Program, University of Vermont, Burlington, VT 05405, United States
| | - Arvis Sulovari
- Cellular, Molecular, and Biomedical Sciences Graduate Program, University of Vermont, Burlington, VT 05405, United States; Department of Microbiology and Molecular Genetics, Robert Larner, M.D. College of Medicine, University of Vermont, Burlington, VT 05405, United States
| | - Karamatullah Danyal
- Department of Pathology and Laboratory Medicine, Robert Larner, M.D. College of Medicine, University of Vermont, Burlington, VT 05405, United States
| | - David E Heppner
- Department of Pathology and Laboratory Medicine, Robert Larner, M.D. College of Medicine, University of Vermont, Burlington, VT 05405, United States
| | - David J Seward
- Department of Pathology and Laboratory Medicine, Robert Larner, M.D. College of Medicine, University of Vermont, Burlington, VT 05405, United States
| | - Albert van der Vliet
- Department of Pathology and Laboratory Medicine, Robert Larner, M.D. College of Medicine, University of Vermont, Burlington, VT 05405, United States; Cellular, Molecular, and Biomedical Sciences Graduate Program, University of Vermont, Burlington, VT 05405, United States.
|
20
|
Sulovari A, Liu Z, Zhu Z, Li D. Genome-wide meta-analysis of copy number variations with alcohol dependence. Pharmacogenomics J 2017; 18:398-405. [PMID: 28696413 DOI: 10.1038/tpj.2017.35] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/03/2017] [Revised: 05/10/2017] [Accepted: 06/07/2017] [Indexed: 12/26/2022]
Abstract
Genetic association studies and meta-analyses of alcohol dependence (AD) have reported AD-associated single nucleotide polymorphisms (SNPs). These SNPs collectively account for a small portion of estimated heritability in AD. Recent genome-wide copy number variation (CNV) studies have identified CNVs associated with AD and substance dependence, suggesting that a portion of the missing heritability is explained by CNV. We applied PennCNV and QuantiSNP CNV calling algorithms to identify consensus CNVs in five AD cohorts of European and African origins. After rigorous quality control, genome-wide meta-analyses of CNVs were carried out in 3243 well-diagnosed AD cases and 2802 controls. We identified nine CNV regions, including a deletion in chromosome 5q21.3 with a suggestive association with AD (OR=2.15 (1.41-3.29) and P=3.8 × 10⁻⁴) and eight nominally significant CNV regions. All regions were replicated with consistent effect sizes across studies and populations. Pathway and gene-drug interaction enrichment analyses based on the resulting genes indicated the mitogen-activated protein kinase signaling pathway and the recombinant insulin and hyaluronidase drugs, which were relevant to AD biology or treatment. To our knowledge, this is the first genome-wide meta-analysis of CNVs with addiction. Further investigation of the AD-associated CNV regions will provide better understanding of the AD genetic mechanism.
Affiliation(s)
- A Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA
| | - Z Liu
- Spine Surgery, Drum Tower Hospital, Nanjing University Medical School, Nanjing, China
| | - Z Zhu
- Spine Surgery, Drum Tower Hospital, Nanjing University Medical School, Nanjing, China
| | - D Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA; Department of Computer Science, University of Vermont, Burlington, VT, USA; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, VT, USA
|
21
|
Sulovari A, Chen YH, Hudziak JJ, Li D. Atlas of human diseases influenced by genetic variants with extreme allele frequency differences. Hum Genet 2016; 136:39-54. [PMID: 27699474 DOI: 10.1007/s00439-016-1734-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/18/2016] [Accepted: 09/27/2016] [Indexed: 12/22/2022]
Abstract
Genetic variants with extreme allele frequency differences (EAFD) may underlie some human health disparities across populations. To identify EAFD loci, we systematically analyzed and characterized 81 million genomic variants from 2504 unrelated individuals of 26 world populations (phase III of the 1000 Genomes Project). Our analyses revealed a total of 434 genes, 15 pathways, and 18 diseases and traits influenced by EAFD variants from five continental populations. They included known EAFD genes, such as LCT (lactose tolerance), SLC24A5 (skin pigmentation), and EDAR (hair morphology). We found many novel EAFD genes, including TBC1D2B (autophagy mediator), TRIM40 (gastrointestinal inflammatory regulator), KRT71, KRT75, KRT83, and KRTAP10-1 (hair and epithelial keratin synthesis), PIK3R3 (insulin receptor interaction), DARS (neurological disorders), and NACA2 (skin inflammatory response). Our results also showed four complex diseases significantly associated with EAFD loci, including asthma (adjusted enrichment P = 4 × 10⁻⁸), type I diabetes (P = 6 × 10⁻⁹), alcohol consumption (P = 0.0002), and attention deficit/hyperactivity disorder (P = 0.003). This study provides a comprehensive atlas of genes, pathways, and human diseases significantly influenced by EAFD variants.
Affiliation(s)
- Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, 05405, USA
| | - Yolanda H Chen
- Department of Plant and Soil Science, University of Vermont, Burlington, VT, 05405, USA
| | - James J Hudziak
- Vermont Center for Children, Youth, and Families, Department of Psychiatry, University of Vermont, Burlington, VT, 05405, USA
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, 05405, USA; Department of Computer Science, University of Vermont, Burlington, VT, 05405, USA; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, VT, 05405, USA.
|
22
|
Sulovari A, Kranzler HR, Farrer LA, Gelernter J, Li D. Further analyses support the association between light eye color and alcohol dependence. Am J Med Genet B Neuropsychiatr Genet 2015; 168:757-60. [PMID: 26290254 DOI: 10.1002/ajmg.b.32357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/24/2015] [Accepted: 07/27/2015] [Indexed: 01/11/2023]
Affiliation(s)
- Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont; Cell, Molecular and Biomedical Sciences Graduate Program, University of Vermont, Burlington, Vermont
| | - Henry R Kranzler
- Department of Psychiatry, University of Pennsylvania School of Medicine and VISN 4 MIRECC, Philadelphia VAMC, Philadelphia, Pennsylvania
| | - Lindsay A Farrer
- Departments of Medicine (Biomedical Genetics), Neurology, Ophthalmology, Genetics & Genomics, Biostatistics, and Epidemiology, Boston University Schools of Medicine and Public Health, Boston, Massachusetts
| | - Joel Gelernter
- Department of Psychiatry, School of Medicine, Yale University, New Haven, Connecticut; Department of Genetics, School of Medicine, Yale University, New Haven, Connecticut; VA Connecticut Healthcare Center, West Haven, Connecticut; Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont; Department of Computer Science, University of Vermont, Burlington, Vermont; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont
|
23
|
Sulovari A, Kranzler HR, Farrer LA, Gelernter J, Li D. Eye color: A potential indicator of alcohol dependence risk in European Americans. Am J Med Genet B Neuropsychiatr Genet 2015; 168B:347-53. [PMID: 25921801 DOI: 10.1002/ajmg.b.32316] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/06/2015] [Accepted: 04/02/2015] [Indexed: 12/20/2022]
Abstract
In archival samples of European-ancestry subjects, light-eyed individuals have been found to consume more alcohol than dark-eyed individuals. No published population-based studies have directly tested the association between alcohol dependence (AD) and eye color. We hypothesized that light-eyed individuals have a higher prevalence of AD than dark-eyed individuals. A mixture model was used to select a homogeneous sample of 1,263 European-Americans and control for population stratification. After quality control, we conducted an association study using logistic regression, adjusting for confounders (age, sex, and genetic ancestry). We found evidence of association between AD and blue eye color (P = 0.0005 and odds ratio = 1.83 (1.31-2.57)), supporting light eye color as a risk factor relative to brown eye color. Network-based analyses revealed a statistically significant (P = 0.02) number of genetic interactions between eye color genes and AD-associated genes. We found evidence of linkage disequilibrium between an AD-associated GABA receptor gene cluster, GABRB3/GABRG3, and eye color genes, OCA2/HERC2, as well as between AD-associated GRM5 and pigmentation-associated TYR. Our population-phenotype, network, and linkage disequilibrium analyses support association between blue eye color and AD. Although we controlled for stratification we cannot exclude underlying occult stratification as a contributor to this observation. Although replication is needed, our findings suggest that eye pigmentation information may be useful in research on AD. Further characterization of this association may unravel new AD etiological factors. © 2015 Wiley Periodicals, Inc.
Affiliation(s)
- Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont; Cell, Molecular, and Biomedical Sciences Graduate Program, University of Vermont, Burlington, Vermont
| | - Henry R Kranzler
- Department of Psychiatry, University of Pennsylvania School of Medicine and VAMC 4 MIRECC, Philadelphia VAMC, Philadelphia, Pennsylvania
| | - Lindsay A Farrer
- Departments of Medicine (Biomedical Genetics), Neurology, Ophthalmology, Genetics & Genomics, Biostatistics, and Epidemiology, Boston University Schools of Medicine and Public Health, Boston, Massachusetts
| | - Joel Gelernter
- Department of Psychiatry, School of Medicine, Yale University, New Haven, Connecticut; Department of Genetics, School of Medicine, Yale University, New Haven, Connecticut; VA Connecticut Healthcare Center, West Haven, Connecticut; Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont; Department of Computer Science, University of Vermont, Burlington, Vermont; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, Vermont
|
24
|
Sulovari A, Li D. GACT: a Genome build and Allele definition Conversion Tool for SNP imputation and meta-analysis in genetic association studies. BMC Genomics 2014; 15:610. [PMID: 25038819 PMCID: PMC4223508 DOI: 10.1186/1471-2164-15-610] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/14/2014] [Accepted: 07/10/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Genome-wide association studies (GWAS) have successfully identified genes associated with complex human diseases. Although much of the heritability remains unexplained, combining single nucleotide polymorphism (SNP) genotypes from multiple studies for meta-analysis will increase the statistical power to identify new disease-associated variants. Meta-analysis requires the same allele definition (nomenclature) and genome build among individual studies. Similarly, imputation, commonly used prior to meta-analysis, requires the same consistency. However, the genotypes from various GWAS are generated using different genotyping platforms, arrays or SNP-calling approaches, resulting in the use of different genome builds and allele definitions. Incorrect assumptions of identical allele definition among combined GWAS lead to a large portion of discarded genotypes or incorrect association findings. There is no published tool that predicts and converts among all major allele definitions.
RESULTS: In this study, we have developed a tool, GACT, which stands for Genome build and Allele definition Conversion Tool, that predicts and inter-converts between any of the common SNP allele definitions and between the major genome builds. In addition, we assessed several factors that may affect imputation quality, and our results indicated that inclusion of singletons in the reference had detrimental effects while ambiguous SNPs had no measurable effect. Unexpectedly, exclusion of genotypes with missing rate > 0.001 (40% of study SNPs) showed no significant decrease of imputation quality (even significantly higher when compared to the imputation with singletons in the reference), especially for rare SNPs.
CONCLUSION: GACT is a new, powerful, and user-friendly tool with both command-line and interactive online versions that can accurately predict, and convert between, any of the common allele definitions and between genome builds for genome-wide meta-analysis and imputation of genotypes from SNP-arrays or deep-sequencing, particularly for data from the dbGaP and other public databases.
GACT SOFTWARE: http://www.uvm.edu/genomics/software/gact.
Affiliation(s)
| | - Dawei Li
- Department of Microbiology and Molecular Genetics, University of Vermont, 05405 Burlington, VT, USA.
|
25
|
Li D, Sulovari A, Cheng C, Zhao H, Kranzler HR, Gelernter J. Association of gamma-aminobutyric acid A receptor α2 gene (GABRA2) with alcohol use disorder. Neuropsychopharmacology 2014; 39:907-18. [PMID: 24136292 PMCID: PMC3924525 DOI: 10.1038/npp.2013.291] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Received: 08/27/2013] [Revised: 10/09/2013] [Accepted: 10/10/2013] [Indexed: 12/26/2022]
Abstract
Gamma-aminobutyric acid (GABA) is a major inhibitory neurotransmitter in mammalian brain. GABA receptors are involved in a number of complex disorders, including substance abuse. No variants of the commonly studied GABA receptor genes that have been associated with substance dependence have been determined to be functional or pathogenic. To reconcile the conflicting associations with substance dependence traits, we performed a meta-analysis of variants in the GABA-A receptor genes (GABRB2, GABRA6, GABRA1, and GABRG2 on chromosome 5q and GABRA2 on chromosome 4p12) using genotype data from 4739 cases of alcohol, opioid, or methamphetamine dependence and 4924 controls. Then, we combined the data from candidate gene association studies in the literature with two alcohol dependence (AD) samples, including 1691 cases and 1712 controls from the Study of Addiction: Genetics and Environment (SAGE), and 2644 cases and 494 controls from our own study. Using a Bonferroni-corrected threshold of 0.007, we found strong associations between GABRA2 and AD (P=9 × 10⁻⁶ and odds ratio (OR) 95% confidence interval (CI)=1.27 (1.15, 1.4) for rs567926, P=4 × 10⁻⁵ and OR=1.21 (1.1, 1.32) for rs279858), and between GABRG2 and both dependence on alcohol and dependence on heroin (P=0.0005 and OR=1.22 (1.09, 1.37) for rs211014). Significant association was also observed between GABRA6 rs3219151 and AD. The GABRA2 rs279858 association was observed in the SAGE data sets with a combined P of 9 × 10⁻⁶ (OR=1.17 (1.09, 1.26)). When all of these data sets, including our samples, were meta-analyzed, associations of both GABRA2 single-nucleotide polymorphisms remained (for rs567926, P=7 × 10⁻⁵ (OR=1.18 (1.09, 1.29)) in all the studies, and P=8 × 10⁻⁶ (OR=1.25 (1.13, 1.38)) in subjects of European ancestry and for rs279858, P=5 × 10⁻⁶ (OR=1.18 (1.1, 1.26)) in subjects of European ancestry).
Findings from this extensive meta-analysis of five GABA-A receptor genes and substance abuse support their involvement (with the best evidence for GABRA2) in the pathogenesis of AD. Further replications with larger samples are warranted.
Affiliation(s)
- Dawei Li
- Department of Psychiatry, School of Medicine, Yale University, New Haven, CT, USA; Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA; Department of Computer Science, University of Vermont, Burlington, VT, USA; Neuroscience, Behavior, and Health Initiative, University of Vermont, Burlington, VT, USA; Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, Vermont 05405, USA, Tel: 802-656-9838
| | - Arvis Sulovari
- Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA
| | - Chao Cheng
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Department of Genetics, School of Medicine, Yale University, New Haven, CT, USA
| | - Henry R Kranzler
- Department of Psychiatry, Perelman School of Medicine of the University of Pennsylvania and Philadelphia VAMC, Philadelphia, PA, USA
| | - Joel Gelernter
- Department of Psychiatry, School of Medicine, Yale University, New Haven, CT, USA; Department of Genetics, School of Medicine, Yale University, New Haven, CT, USA; VA Connecticut Healthcare Center, West Haven, CT, USA
|
26
|
Moore JH, Hill DP, Sulovari A, Kidd LC. Genetic Analysis of Prostate Cancer Using Computational Evolution, Pareto-Optimization and Post-processing. Genetic and Evolutionary Computation 2013. [DOI: 10.1007/978-1-4614-6846-2_7] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]
|