1
Soto DC, Uribe-Salazar JM, Shew CJ, Sekar A, McGinty SP, Dennis MY. Genomic structural variation: A complex but important driver of human evolution. AMERICAN JOURNAL OF BIOLOGICAL ANTHROPOLOGY 2023. [PMID: 36794631 DOI: 10.1002/ajpa.24713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Revised: 01/21/2023] [Accepted: 02/05/2023] [Indexed: 02/17/2023]
Abstract
Structural variants (SVs), including duplications, deletions, and inversions of DNA, can have significant genomic and functional impacts but are technically difficult to identify and assay compared with single-nucleotide variants. With the aid of new genomic technologies, it has become clear that SVs account for significant differences across and within species. This phenomenon is particularly well documented for humans and other primates due to the wealth of sequence data available. In great apes, SVs affect a larger number of nucleotides than single-nucleotide variants, and many identified SVs exhibit population and species specificity. In this review, we highlight the importance of SVs in human evolution by discussing (1) how they have shaped great ape genomes, resulting in sensitized regions associated with traits and diseases, (2) their impact on gene functions and regulation, which has subsequently played a role in natural selection, and (3) the role of gene duplications in human brain evolution. We further discuss how to incorporate SVs in research, including the strengths and limitations of various genomic approaches. Finally, we propose future considerations for integrating existing data and biospecimens with the ever-expanding SV compendium propelled by biotechnology advancements.
Affiliation(s)
- Daniela C Soto
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- José M Uribe-Salazar
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- Colin J Shew
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- Aarthi Sekar
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- Sean P McGinty
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
- Megan Y Dennis
- Genome Center, MIND Institute, Department of Biochemistry & Molecular Medicine, University of California, Davis, California, USA; Integrative Genetics and Genomics Graduate Group, University of California, Davis, California, USA
2
Caceres M, Mumey B, Husic E, Rizzi R, Cairo M, Sahlin K, Tomescu AI. Safety in Multi-Assembly via Paths Appearing in All Path Covers of a DAG. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3673-3684. [PMID: 34847041 DOI: 10.1109/tcbb.2021.3131203] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
A multi-assembly problem asks to reconstruct multiple genomic sequences from mixed reads sequenced from all of them. Standard formulations of such problems model a solution as a path cover in a directed acyclic graph (DAG), namely a set of paths that together cover all vertices of the graph. Since multi-assembly problems admit multiple solutions in practice, we consider an approach commonly used in standard genome assembly: output only partial solutions (contigs, or safe paths) that appear in all path cover solutions. We study constrained path covers, a restriction on the path cover solution that incorporates practical constraints arising in multi-assembly problems. We give efficient algorithms for finding all maximal safe paths for constrained path covers. We compute the safe paths of splicing graphs constructed from transcript annotations of different species. Our algorithms run in less than 15 seconds per species and report RNA contigs that are over 99% precise and up to 8 times longer than unitigs. Moreover, RNA contigs cover over 70% of the transcripts and their coding sequences in most cases. With their greater length relative to unitigs, high precision, and fast construction time, maximal safe paths can provide a better base set of sequences for transcript assembly programs.
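The idea of a safe path can be made concrete on a toy DAG. The sketch below is a brute-force check, not the efficient algorithm of Caceres et al.: it assumes unconstrained, minimum-size path covers whose paths run from a source to a sink, enumerates every such cover, and keeps the subpaths that lie inside some path of every cover. The function names and the example graph are hypothetical, and the enumeration is exponential, so it only suits very small graphs.

```python
from itertools import combinations


def source_to_sink_paths(edges):
    """Enumerate every source-to-sink path of a small DAG given as (u, v) edges."""
    succ, indeg, nodes = {}, {}, set()
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        indeg[v] = indeg.get(v, 0) + 1
        nodes.update((u, v))
    paths = []

    def extend(path):
        nexts = succ.get(path[-1], [])
        if not nexts:  # path[-1] is a sink
            paths.append(tuple(path))
        for v in nexts:
            extend(path + [v])

    for s in sorted(nodes):
        if indeg.get(s, 0) == 0:  # s is a source
            extend([s])
    return paths, nodes


def is_subpath(p, q):
    """True if path p occurs as a contiguous run of vertices inside path q."""
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def maximal_safe_paths(edges):
    """Brute force: subpaths contained in some path of every minimum path cover."""
    paths, nodes = source_to_sink_paths(edges)
    covers = []
    # Smallest k for which some k source-to-sink paths cover every vertex.
    for k in range(1, len(paths) + 1):
        covers = [c for c in combinations(paths, k) if set().union(*c) == nodes]
        if covers:
            break
    candidates = {p[i:j] for p in paths
                  for i in range(len(p)) for j in range(i + 1, len(p) + 1)}
    safe = {c for c in candidates
            if all(any(is_subpath(c, q) for q in cover) for cover in covers)}
    # Report only safe paths that are not contained in a longer safe path.
    return {p for p in safe if not any(p != q and is_subpath(p, q) for q in safe)}


toy = [("s1", "a"), ("s2", "a"), ("a", "b"), ("a", "c"), ("b", "t1"), ("c", "t2")]
print(sorted(maximal_safe_paths(toy)))
# [('a', 'b', 't1'), ('a', 'c', 't2'), ('s1', 'a'), ('s2', 'a')]
```

On this graph the compacted unitigs are (s1), (s2), (a), (b, t1), and (c, t2), whereas each maximal safe path extends across the branching vertex a, which mirrors the abstract's observation that safe paths can be longer than unitigs.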
3
Dias FH, Williams L, Mumey B, Tomescu AI. Efficient Minimum Flow Decomposition via Integer Linear Programming. J Comput Biol 2022; 29:1252-1267. [PMID: 36260412 PMCID: PMC9700332 DOI: 10.1089/cmb.2022.0257] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]
Abstract
Minimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial-time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of all the exponentially many solution paths using only a quadratic number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves every instance in under 13 seconds. We hope that our formulations can lie at the core of future practical RNA assembly tools. Our implementations are freely available on GitHub.
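To show what the heuristics mentioned in the abstract trade away, the sketch below implements greedy-width, a common flow-decomposition heuristic chosen here for illustration; it is not the ILP solver the authors describe. It repeatedly removes the source-to-sink path with the largest bottleneck flow. The function names and the example network are hypothetical, and the input is assumed to be an acyclic flow that satisfies flow conservation.

```python
from collections import defaultdict, deque


def widest_path(flow, source, sink):
    """Max-bottleneck source->sink path in a DAG, by DP over a topological order."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for (u, v), f in flow.items():
        if f > 0:  # only edges that still carry flow
            succ[u].append(v)
            indeg[v] += 1
            nodes.update((u, v))
    # Kahn's algorithm: topological order of the positive-flow subgraph.
    order, queue = [], deque(n for n in nodes if indeg[n] == 0)
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    width, pred = {source: float("inf")}, {}
    for u in order:
        if u not in width:  # not reachable from the source
            continue
        for v in succ[u]:
            w = min(width[u], flow[(u, v)])
            if w > width.get(v, 0):
                width[v], pred[v] = w, u
    path = [sink]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return width[sink], path[::-1]


def greedy_width(flow, source, sink):
    """Peel off widest paths until no flow leaves the source."""
    flow = dict(flow)  # work on a copy so the caller's flow is untouched
    paths = []
    while any(f > 0 for (u, _), f in flow.items() if u == source):
        w, path = widest_path(flow, source, sink)
        for u, v in zip(path, path[1:]):
            flow[(u, v)] -= w
        paths.append((w, path))
    return paths


example = {("s", "a"): 5, ("s", "b"): 3, ("a", "t"): 4, ("a", "b"): 1, ("b", "t"): 4}
for weight, path in greedy_width(example, "s", "t"):
    print(weight, "->".join(path))
# 4 s->a->t
# 3 s->b->t
# 1 s->a->b->t
```

On this small network greedy-width happens to return an optimal decomposition of three paths; on larger splicing graphs it can return more paths than necessary, which is the gap an exact formulation is meant to close.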
Affiliation(s)
- Fernando H.C. Dias
- Department of Computer Science, University of Helsinki, Helsinki, Finland
- Lucia Williams
- School of Computing, Montana State University, Bozeman, Montana, USA
- Brendan Mumey
- School of Computing, Montana State University, Bozeman, Montana, USA
4
Hertzberg J, Mundlos S, Vingron M, Gallone G. TADA - a machine learning tool for functional annotation-based prioritisation of pathogenic CNVs. Genome Biol 2022; 23:67. [PMID: 35232478 PMCID: PMC8886976 DOI: 10.1186/s13059-022-02631-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/27/2021] [Accepted: 02/11/2022] [Indexed: 01/08/2023]
Abstract
Few methods have been developed to investigate copy number variants (CNVs) based on their predicted pathogenicity. We introduce TADA, a method to prioritise pathogenic CNVs through assisted manual filtering and automated classification, based on an extensive catalogue of functional annotation supported by rigorous enrichment analysis. We demonstrate that our classifiers are able to accurately predict pathogenic CNVs, outperforming current alternative methods, and produce a well-calibrated pathogenicity score. Our results suggest that functional annotation-based prioritisation of pathogenic CNVs is a promising approach to support clinical diagnostics and to further the understanding of mechanisms controlling the disease impact of larger genomic alterations.
Affiliation(s)
- Jakob Hertzberg
- Max Planck Institute for Molecular Genetics, Ihnestraße 63, Berlin, 14195, Germany; Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, 10117, Germany
- Stefan Mundlos
- Max Planck Institute for Molecular Genetics, Ihnestraße 63, Berlin, 14195, Germany; Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin, 10117, Germany
- Martin Vingron
- Max Planck Institute for Molecular Genetics, Ihnestraße 63, Berlin, 14195, Germany
- Giuseppe Gallone
- Max Planck Institute for Molecular Genetics, Ihnestraße 63, Berlin, 14195, Germany
5
Wang Z, Guo Y, Liu S, Meng Q. Genome-Wide Assessment Characteristics of Genes Overlapping Copy Number Variation Regions in Duroc Purebred Population. Front Genet 2021; 12:753748. [PMID: 34721540 PMCID: PMC8552909 DOI: 10.3389/fgene.2021.753748] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/05/2021] [Accepted: 09/23/2021] [Indexed: 11/13/2022]
Abstract
Copy number variations (CNVs) are important structural variations that can cause significant phenotypic diversity. Reliable CNV mapping can be achieved by identifying CNVs across different genetic backgrounds. Investigating the overlap between CNV regions (CNVRs) and protein-coding genes (CNV-genes) or miRNAs (CNV-miRNAs) can reveal potential mechanisms of their regulation. In this study, we used 50K SNP arrays to detect CNVs in Duroc purebred pigs. A total of 211 CNVRs were detected, with a total length of 118.48 Mb, accounting for 5.23% of the autosomal genome sequence. Of these CNVRs, 32 were gains, 175 were losses, and four contained both types (loss and gain within the same region). The CNVRs we detected were non-randomly distributed in the swine genome and were significantly enriched in segmental duplication and gene-dense regions. Additionally, these CNVRs overlapped with 1,096 protein-coding genes (CNV-genes) and 39 miRNAs (CNV-miRNAs). The CNV-genes were enriched in the dosage-sensitive gene list. The expression of CNV-genes was significantly higher than that of non-CNV genes in the adult Duroc prostate. Of all detected CNV-genes, 22.99% were tissue-specific (TSI > 0.9). Strong negative selection had been acting on the CNV-genes, as those located entirely within loss CNVRs appeared to be evolving rapidly, as determined by the median dN plus dS values. Non-CNV genes were more likely than CNV-genes to be miRNA targets. Furthermore, CNV-miRNAs tended to target more genes than non-CNV-miRNAs, and pairs of CNV-miRNAs preferentially regulated the same target genes synergistically. We also examined the functions of CNV-genes and CNV-miRNAs involved in lipid metabolism, including DGAT1, DGAT2, MOGAT2, miR143, miR335, and miRLET7.
Further molecular experiments and independent large studies are needed to confirm our findings.
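CNV regions of the kind counted above (gains, losses, and regions containing both) are commonly built by merging overlapping CNV calls across individuals. The sketch below shows that merging step for half-open intervals and labels each region by the call types it contains. The merging rule (any overlap on the same chromosome), the tuple layout, and the function name are assumptions for illustration, not necessarily the criteria used in this study.

```python
def merge_cnvrs(calls):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    calls: iterable of (chrom, start, end, kind), kind in {"gain", "loss"},
    with half-open coordinates [start, end).
    Returns (chrom, start, end, label) with label "gain", "loss" or "both".
    """
    regions = []
    for chrom, start, end, kind in sorted(calls):
        if regions and regions[-1][0] == chrom and start < regions[-1][2]:
            # Overlaps the current region: extend it and record the call type.
            regions[-1][2] = max(regions[-1][2], end)
            regions[-1][3].add(kind)
        else:
            regions.append([chrom, start, end, {kind}])
    return [(c, s, e, kinds.pop() if len(kinds) == 1 else "both")
            for c, s, e, kinds in regions]


calls = [("1", 100, 500, "loss"), ("1", 300, 800, "loss"),
         ("1", 2000, 2500, "gain"), ("1", 2400, 3000, "loss"),
         ("2", 50, 90, "gain")]
regions = merge_cnvrs(calls)
total_length = sum(e - s for _, s, e, _ in regions)
print(regions, total_length)
```

Dividing the summed region length by the length of the genome (or of the autosomes) gives a coverage fraction of the kind reported in CNV studies such as this one.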
Affiliation(s)
- Zhipeng Wang
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Bioinformatics Center, Northeast Agricultural University, Harbin, China
- Yuanyuan Guo
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Bioinformatics Center, Northeast Agricultural University, Harbin, China
- Shengwei Liu
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Bioinformatics Center, Northeast Agricultural University, Harbin, China
- Qingli Meng
- Beijing Breeding Swine Center, Beijing, China
6
Abdullaev ET, Umarova IR, Arndt PF. Modelling segmental duplications in the human genome. BMC Genomics 2021; 22:496. [PMID: 34215180 PMCID: PMC8254307 DOI: 10.1186/s12864-021-07789-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/03/2020] [Accepted: 06/10/2021] [Indexed: 11/22/2022]
Abstract
Background Segmental duplications (SDs) are long DNA sequences that are repeated in a genome with high sequence identity. In contrast to repetitive elements, they are often unique and only sometimes have multiple copies in a genome. Several well-studied mechanisms are responsible for segmental duplications: non-allelic homologous recombination, non-homologous end joining, and replication slippage. Such duplications play an important role in evolution; however, we do not have a full understanding of the dynamic properties of the duplication process. Results We study segmental duplications through a graph representation in which nodes represent genomic regions and edges represent duplications between them. The resulting network (the SD network) is quite complex and has distinct features that allow us to make inferences about the evolution of segmental duplications. We propose a network growth model that explains features of the SD network, giving us insights into the dynamics of segmental duplications in the human genome. Based on our analysis of the genomes of other species, the network growth model seems to be applicable to multiple mammalian genomes. Conclusions Our analysis suggests that the duplication rates of genomic loci grow linearly with the number of copies of a duplicated region. Several scenarios explaining such preferential duplication rates were suggested. Supplementary Information The online version contains supplementary material available at (10.1186/s12864-021-07789-7).
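A duplication rate that grows linearly with copy number is a preferential-attachment rule. The toy simulation below illustrates that rule only at the level of duplicate-family sizes; it is not the SD network growth model of Abdullaev et al. At each step, with probability `p_new` a previously unique locus is duplicated (founding a two-copy family); otherwise a uniformly chosen existing copy is duplicated, so a family is picked with probability proportional to its copy number. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def simulate_sd_families(steps, p_new, seed=0):
    """Toy growth model for duplicate families with copy-number-proportional rates.

    Returns family sizes (copy numbers), largest first.
    """
    rng = random.Random(seed)
    copies = []  # one entry per duplicated copy, holding its family id
    n_families = 0
    for _ in range(steps):
        if not copies or rng.random() < p_new:
            # A previously unique locus is duplicated: new family with 2 copies.
            copies += [n_families, n_families]
            n_families += 1
        else:
            # Choosing a copy uniformly picks a family with probability
            # proportional to its current number of copies.
            copies.append(rng.choice(copies))
    return sorted(Counter(copies).values(), reverse=True)


sizes = simulate_sd_families(steps=1000, p_new=0.2, seed=1)
print(len(sizes), sizes[:5])
```

Under such a rule, families that are already large keep attracting new copies, so the size distribution tends to be heavy-tailed: many two-copy families and a few very large ones.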
Affiliation(s)
- Eldar T Abdullaev
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63/73, Berlin, 14195, Germany.
- Iren R Umarova
- Faculty of Computational Mathematics and Cybernetics, Moscow State University, Leninskiye Gory 1-52, Moscow, 119991, Russia
- Peter F Arndt
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63/73, Berlin, 14195, Germany
7
Long X, Xue H. Genetic-variant hotspots and hotspot clusters in the human genome facilitating adaptation while increasing instability. Hum Genomics 2021; 15:19. [PMID: 33741065 PMCID: PMC7976700 DOI: 10.1186/s40246-021-00318-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/03/2020] [Accepted: 03/04/2021] [Indexed: 12/25/2022]
Abstract
Background Genetic variants, which underlie phenotypic diversity, are known to be distributed unevenly in the human genome. A comprehensive understanding of the distributions of different genetic variants is important for insights into genetic functions and disorders. Methods Herein, a sliding-window scan of the regional densities of eight kinds of germline genetic variants in the human genome, including single-nucleotide polymorphisms (SNPs) and four size classes of copy-number variations (CNVs), was performed. Results The study identified 44,379 hotspots with high genetic-variant densities and 1135 hotspot clusters comprising more than one type of hotspot, accounting for 3.1% and 0.2% of the genome, respectively. The hotspots and clusters are found to co-localize with different functional genomic features (as exemplified by the association of hotspots of middle-size CNVs with histone-modification sites), to work with balancing and positive selection to meet the need for diversity in immune proteins, and to facilitate the development of sensory-perception and neuroactive ligand-receptor interaction pathways in the function-sparse, late-replicating genomic sequences. Genetic variants of different lengths co-localize with retrotransposons of different ages on a “long-with-young” and “short-with-all” basis. Hotspots and clusters are highly associated with tumor suppressor genes and oncogenes (p < 10⁻¹⁰) and are enriched with somatic tumor CNVs and with the trait- and disease-associated SNPs identified by genome-wide association studies, exceeding tenfold enrichment in clusters comprising SNPs and extra-long CNVs. Conclusions In conclusion, the genetic-variant hotspots and clusters represent two-edged swords that spearhead both positive and negative genomic changes. Their strong associations with complex traits and diseases also open up a potential “Common Disease-Hotspot Variant” approach to the missing heritability problem.
Supplementary Information The online version contains supplementary material available at 10.1186/s40246-021-00318-3.
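A sliding-window density scan like the one described in the Methods can be sketched in a few lines. Here a window is called a hotspot when its variant count exceeds the mean across windows by more than `k` standard deviations; that cutoff, the window and step sizes, and the function names are illustrative assumptions, since the abstract does not state the exact hotspot criterion.

```python
import bisect
import statistics


def window_counts(positions, chrom_len, window, step):
    """Count variants in each window [start, start + window) along one chromosome."""
    pos = sorted(positions)
    counts = []
    for start in range(0, max(chrom_len - window, 0) + 1, step):
        n = bisect.bisect_left(pos, start + window) - bisect.bisect_left(pos, start)
        counts.append((start, n))
    return counts


def hotspots(positions, chrom_len, window, step, k=2.0):
    """Windows whose count exceeds mean + k * SD of all window counts."""
    counts = window_counts(positions, chrom_len, window, step)
    values = [n for _, n in counts]
    cutoff = statistics.mean(values) + k * statistics.pstdev(values)
    return [(start, n) for start, n in counts if n > cutoff]


variants = [1, 2, 3, 4, 5, 6, 7, 8, 25, 55, 75, 95]
print(hotspots(variants, chrom_len=100, window=10, step=10))  # [(0, 8)]
```

Running the same scan separately for each variant class (SNPs and each CNV size class) and then looking for windows flagged in more than one class would give a rough analogue of the hotspot clusters described above.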
Affiliation(s)
- Xi Long
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China; HKUST Shenzhen Research Institute, 9 Yuexing First Road, Nanshan, Shenzhen, China
- Hong Xue
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China; HKUST Shenzhen Research Institute, 9 Yuexing First Road, Nanshan, Shenzhen, China; Centre for Cancer Genomics, School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, Jiangsu, China.
8
Bretani G, Rossini L, Ferrandi C, Russell J, Waugh R, Kilian B, Bagnaresi P, Cattivelli L, Fricano A. Segmental duplications are hot spots of copy number variants affecting barley gene content. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2020; 103:1073-1088. [PMID: 32338390 PMCID: PMC7496488 DOI: 10.1111/tpj.14784] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/20/2019] [Revised: 04/10/2020] [Accepted: 04/14/2020] [Indexed: 05/31/2023]
Abstract
Copy number variants (CNVs) are pervasive in several animal and plant genomes and contribute to shaping genetic diversity. In barley, there is evidence that changes in gene copy number underlie important agronomic traits. The recently released reference sequence of barley represents a valuable genomic resource for unveiling the incidence of CNVs that affect gene content and for identifying sequence features associated with CNV formation. Using exome sequencing and read count data, we detected 16 605 deletions and duplications that affect barley gene content by surveying a diverse panel of 172 cultivars, 171 landraces, 22 wild relatives, and 32 other uncategorized domesticated accessions. The search for segmental duplications (SDs) in the reference sequence revealed many low-copy repeats, most of which overlap predicted coding sequences. Statistical analyses revealed that the incidence of CNVs increases significantly in SD-rich regions, indicating that these sequence elements act as hot spots for the formation of CNVs. The present study delivers a comprehensive genome-wide study of CNVs affecting barley gene content and implicates SDs in the molecular mechanisms that lead to the formation of this class of CNVs.
Affiliation(s)
- Gianluca Bretani
- Università degli Studi di Milano – DiSAA, Via Celoria 2, 20133 Milano, Italy
- Laura Rossini
- Università degli Studi di Milano – DiSAA, Via Celoria 2, 20133 Milano, Italy
- Chiara Ferrandi
- Parco Tecnologico Padano, Loc. C.na Codazza, Via Einstein, 26900 Lodi, Italy
- Robbie Waugh
- James Hutton Institute, Invergowrie, Dundee DD2 5DA, UK
- Benjamin Kilian
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstrasse 3, 06466 Gatersleben, Germany
- Global Crop Diversity Trust, Platz der Vereinten Nationen 7, 53113 Bonn, Germany
- Paolo Bagnaresi
- Council for Agricultural Research and Economics – Research Centre for Genomics & Bioinformatics, Via San Protaso 302, 29017 Fiorenzuola d'Arda (PC), Italy
- Luigi Cattivelli
- Council for Agricultural Research and Economics – Research Centre for Genomics & Bioinformatics, Via San Protaso 302, 29017 Fiorenzuola d'Arda (PC), Italy
- Agostino Fricano
- Council for Agricultural Research and Economics – Research Centre for Genomics & Bioinformatics, Via San Protaso 302, 29017 Fiorenzuola d'Arda (PC), Italy
9
Wang Z, Guo J, Guo Y, Yang Y, Teng T, Yu Q, Wang T, Zhou M, Zhu Q, Wang W, Zhang Q, Yang H. Genome-Wide Detection of CNVs and Association With Body Weight in Sheep Based on 600K SNP Arrays. Front Genet 2020; 11:558. [PMID: 32582291 PMCID: PMC7297042 DOI: 10.3389/fgene.2020.00558] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/07/2020] [Accepted: 05/07/2020] [Indexed: 01/30/2023]
Abstract
Copy number variations (CNVs) are important genomic structural variations and can give rise to significant phenotypic diversity. Herein, we used high-density 600K SNP arrays to detect CNVs in two synthetic lines of sheep (DS and SHH) and in Hu sheep (a local Chinese breed). A total of 919 CNV regions (CNVRs) were detected, with a total length of 48.17 Mb, accounting for 1.96% of the sheep genome. These CNVRs consisted of 730 gains, 102 losses, and 87 complex CNVRs. These CNVRs were significantly enriched in segmental duplication (SD) regions. A CNVR-based cluster analysis of the three breeds revealed that the DS and SHH breeds share a close genetic relationship. Functional analysis revealed that some genes in these CNVRs were also significantly enriched in the olfactory transduction pathway (oas04740), including members of the OR gene family such as OR6C76, OR4Q2, and OR4K14. Using association analyses and previous gene annotations, we determined that a subset of the identified genes was likely to be associated with body weight, including FOXF2, MAPK12, MAP3K11, STRBP, and C14orf132. Together, these results offer valuable information that will guide future efforts to explore the genetic basis of body weight in sheep.
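A claim that CNVRs are significantly enriched in SD regions rests on comparing the observed overlap with a null expectation. One common way to build that null, shown here as a sketch rather than the test used by Wang et al., is to move each CNVR to a random position (keeping its length) and count how often the shuffled set overlaps SDs at least as often as the real one. A single chromosome, uniform placement, and the function names are simplifying assumptions.

```python
import random


def overlaps_any(start, end, intervals):
    """True if half-open [start, end) overlaps any interval in intervals."""
    return any(start < e and s < end for s, e in intervals)


def sd_enrichment(cnvrs, sds, chrom_len, n_perm=1000, seed=0):
    """Permutation test: do CNVRs overlap SDs more often than randomly placed ones?"""
    rng = random.Random(seed)
    observed = sum(overlaps_any(s, e, sds) for s, e in cnvrs)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = 0
        for s, e in cnvrs:
            length = e - s
            new_start = rng.randrange(0, chrom_len - length + 1)  # keep the length
            shuffled += overlaps_any(new_start, new_start + length, sds)
        if shuffled >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


sds = [(0, 10_000), (500_000, 510_000)]
cnvrs = ([(i * 1000, i * 1000 + 1000) for i in range(5)]
         + [(500_000 + i * 1000, 501_000 + i * 1000) for i in range(5)])
observed, p = sd_enrichment(cnvrs, sds, chrom_len=1_000_000)
print(observed, p)
```

With real data the shuffling is usually restricted to the same chromosome and to assembled (non-gap) sequence, since placing CNVRs in regions where they could never be called would inflate the apparent enrichment.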
Affiliation(s)
- Zhipeng Wang
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province, Harbin, China
- Jing Guo
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province, Harbin, China
- Yuanyuan Guo
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province, Harbin, China
- Yonglin Yang
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Teng Teng
- Institute of Animal Nutrition, Northeast Agricultural University, Harbin, China
- Qian Yu
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
- Tao Wang
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province, Harbin, China
- Meng Zhou
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province, Harbin, China
- Qiusi Zhu
- College of Animal Science and Technology, Northeast Agricultural University, Harbin, China; Key Laboratory of Animal Genetics, Breeding and Reproduction, Education Department of Heilongjiang Province, Harbin, China
- Wenwen Wang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Shandong Agricultural University, Tai'an, China
- Qin Zhang
- Department of Animal Genetics and Breeding, College of Animal Science and Technology, Shandong Agricultural University, Tai'an, China
- Hua Yang
- State Key Laboratory of Sheep Genetic Improvement and Healthy Production, Xinjiang Academy of Agricultural and Reclamation Science, Shihezi, China
10
Brasó-Vives M, Povolotskaya IS, Hartasánchez DA, Farré X, Fernandez-Callejo M, Raveendran M, Harris RA, Rosene DL, Lorente-Galdos B, Navarro A, Marques-Bonet T, Rogers J, Juan D. Copy number variants and fixed duplications among 198 rhesus macaques (Macaca mulatta). PLoS Genet 2020; 16:e1008742. [PMID: 32392208 PMCID: PMC7241854 DOI: 10.1371/journal.pgen.1008742] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/16/2019] [Revised: 05/21/2020] [Accepted: 03/27/2020] [Indexed: 01/01/2023]
Abstract
The rhesus macaque is an abundant species of Old World monkeys and a valuable model organism for biomedical research due to its close phylogenetic relationship to humans. Copy number variation is one of the main sources of genomic diversity within and between species and a widely recognized cause of inter-individual differences in disease risk. However, copy number differences among rhesus macaques and between the human and macaque genomes, as well as the relevance of this diversity to research involving this nonhuman primate, remain understudied. Here we present a high-resolution map of sequence copy number for the rhesus macaque genome constructed from a dataset of 198 individuals. Our results show that about one-eighth of the rhesus macaque reference genome is composed of recently duplicated regions, either copy number variable regions or fixed duplications. Comparison with human genomic copy number maps based on previously published data shows that, despite overall similarities in the genome-wide distribution of these regions, there are specific differences at the chromosome level. Some of these create differences in the copy number profile between human disease genes and their rhesus macaque orthologs. Our results highlight the importance of addressing the number of copies of target genes in the design of experiments and caution against human-centered assumptions in research conducted with model organisms. Overall, we present a genome-wide copy number map from a large sample of rhesus macaque individuals, representing an important novel contribution concerning the evolution of copy number in primate genomes.
Affiliation(s)
- Marina Brasó-Vives
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- Laboratoire de Biométrie et Biologie Évolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Villeurbanne, France
- Inna S. Povolotskaya
- Veltischev Research and Clinical Institute for Pediatrics of the Pirogov Russian National Research Medical University, Moscow, Russia
- Diego A. Hartasánchez
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- Xavier Farré
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- Marcos Fernandez-Callejo
- National Centre for Genomic Analysis-Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Muthuswamy Raveendran
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- R. Alan Harris
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Douglas L. Rosene
- Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, Massachusetts, United States of America
- Belen Lorente-Galdos
- Department of Neuroscience, Yale School of Medicine, New Haven, Connecticut, United States of America
- Arcadi Navarro
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- National Institute for Bioinformatics (INB), Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Catalonia, Spain
- Tomas Marques-Bonet
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
- National Centre for Genomic Analysis-Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Catalonia, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Catalonia, Spain
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- David Juan
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Parc de Recerca Biomèdica de Barcelona, Barcelona, Catalonia, Spain
11
Variant Calling Using Whole Genome Resequencing and Sequence Capture for Population and Evolutionary Genomic Inferences in Norway Spruce (Picea Abies). COMPENDIUM OF PLANT GENOMES 2020. [DOI: 10.1007/978-3-030-21001-4_2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
12
An Evolutionary Perspective on the Impact of Genomic Copy Number Variation on Human Health. J Mol Evol 2019; 88:104-119. [PMID: 31522275 DOI: 10.1007/s00239-019-09911-6] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/13/2019] [Accepted: 08/27/2019] [Indexed: 02/06/2023]
Abstract
Copy number variants (CNVs), deletions and duplications of segments of DNA, account for at least five times more variable base pairs in humans than single-nucleotide variants. Several common CNVs have been shown to change coding and regulatory sequences and thus dramatically affect adaptive phenotypes involving immunity, perception, metabolism, and skin structure, among others. Some of these CNVs have also been associated with susceptibility to cancer, infection, and metabolic disorders. These observations raise the possibility that CNVs are a primary contributor to human phenotypic variation and consequently evolve under selective pressures. Indeed, locus-specific haplotype-level analyses revealed signatures of natural selection on several CNVs. However, more traditional tests of selection, which are typically applied to single-nucleotide variation, often have diminished statistical power when applied to CNVs because CNVs frequently do not show strong linkage disequilibrium with nearby variants. Recombination-based formation mechanisms of CNVs lead to frequent recurrence and gene conversion events, breaking the linkage disequilibrium involving CNVs. Similar methodological challenges also prevent routine genome-wide association studies from adequately investigating the impact of CNVs on heritable human disease. Thus, we argue that the full relevance of CNVs to human health and evolution is yet to be elucidated. We further argue that a holistic investigation of formation mechanisms within an evolutionary context would provide a powerful framework for understanding the functional and biomedical impact of CNVs. In this paper, we review several cases where studies reveal diverse evolutionary histories and unexpected functional consequences of CNVs. We hope that this review will encourage further work on CNVs by both evolutionary and medical geneticists.
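The linkage-disequilibrium argument can be made concrete with r², the statistic commonly used to judge whether a SNP tags a CNV. The sketch below computes r² as the squared Pearson correlation of per-individual dosages (SNP alternate-allele count and CNV allele count), a genotype-based approximation that avoids phasing. The genotypes are invented for illustration: in the second case the same CNV states occur on different SNP backgrounds, as recurrent formation could produce, and r² falls to zero.

```python
import statistics


def genotype_r2(x, y):
    """Squared Pearson correlation between two per-individual dosage vectors."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("r^2 is undefined for a monomorphic variant")
    return cov * cov / (vx * vy)


snp = [0, 1, 2, 0, 1, 2]
tagging_cnv = [0, 1, 2, 0, 1, 2]    # CNV tracks the SNP perfectly
recurrent_cnv = [0, 1, 2, 2, 1, 0]  # same CNV states on different SNP backgrounds
print(genotype_r2(snp, tagging_cnv))    # 1.0
print(genotype_r2(snp, recurrent_cnv))  # 0.0
```

A SNP-based association scan or selection test only "sees" a CNV through correlations like this one, which is why low r² translates into lost power.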
13
Joy N, Maimoonath Beevi YP, Soniya EV. A deeper view into the significance of simple sequence repeats in pre-miRNAs provides clues for its possible roles in determining the function of microRNAs. BMC Genet 2018; 19:29. [PMID: 29739315 PMCID: PMC5941480 DOI: 10.1186/s12863-018-0615-x] [Received: 12/11/2017] [Accepted: 04/30/2018] [Indexed: 02/06/2023]
Abstract
Background: The central tenet of ‘genome content’ has been that the ‘non-coding’ parts are highly enriched with ‘microsatellites’ or ‘Simple Sequence Repeats’ (SSRs). We presume that the presence of SSRs, and changes in their number of repeat units (n), in different genomic locations may or may not be beneficial, depending on the position of the SSRs in a gene. Very few studies have looked into the existence of SSRs in the hairpin precursors of miRNAs (pre-miRNAs), and the interplay between SSRs and miRNAs is not yet clearly understood. Results: Considering the potential significance of SSRs in pre-miRNAs, we analysed the miRNA hairpin precursors of 171 organisms, which revealed a noticeable presence of SSRs (29.8%) in their pre-miRNAs. The maintenance of SSRs in pre-miRNAs even in complex, highly evolved phyla such as Chordata and Magnoliophyta sheds light on their diverse functions. Computational and experimental analyses further underlined putative effects of SSRs on either the biogenesis or the function of miRNAs. A preliminary computational analysis exploring the relevance of such SSRs maintained in pre-miRNA sequences detected splicing regulatory elements (SREs) either in or near the SSRs; correspondingly, fewer SREs were detected in the absence of SSRs. Conclusion: The present study is the first to suggest the possible involvement of SSRs in shaping SREs that drive alternative splicing events, producing miRNA isoforms in response to different stress environments. This work demonstrates the importance of studying such consistently maintained SSRs residing in pre-miRNAs and may encourage further research toward deciphering the exact function of SSRs in the near future.
Affiliation(s)
- Nisha Joy
- Plant Disease Biology and Biotechnology, Rajiv Gandhi Center for Biotechnology, Poojappura, Thiruvananthapuram, Kerala, 695014, India.
- Y P Maimoonath Beevi
- Plant Disease Biology and Biotechnology, Rajiv Gandhi Center for Biotechnology, Poojappura, Thiruvananthapuram, Kerala, 695014, India
- E V Soniya
- Plant Disease Biology and Biotechnology, Rajiv Gandhi Center for Biotechnology, Poojappura, Thiruvananthapuram, Kerala, 695014, India.
14
Wang S, Zhu XQ, Cai X. Gene Duplication Analysis Reveals No Ancient Whole Genome Duplication but Extensive Small-Scale Duplications during Genome Evolution and Adaptation of Schistosoma mansoni. Front Cell Infect Microbiol 2017; 7:412. [PMID: 28983471 PMCID: PMC5613093 DOI: 10.3389/fcimb.2017.00412] [Received: 03/22/2017] [Accepted: 09/05/2017] [Indexed: 01/19/2023]
Abstract
Gene duplication (GD), thought to facilitate evolutionary innovation and adaptation, has been studied in many phylogenetic lineages. However, it remains poorly investigated in trematodes, a medically important parasite group that has become evolutionarily specialized through long-term host-parasite interaction. In this study, we conducted a genome-wide study of GD modes and contributions in Schistosoma mansoni, a pathogen causing human schistosomiasis. We combined several lines of evidence provided by duplicate age distributions, genomic sequence similarity, depth of coverage, and gene synteny to identify the dominant drivers of the origin of new genes in this parasite. The divergence of genes following duplication events (gene structure, expression, and function retention) was also analyzed. Our results reveal that the genome has lacked whole-genome duplication (WGD) over a long evolutionary period and has few large segmental duplications, but has been extensively shaped by continuous small-scale gene duplications (SSGDs; i.e., dispersed, tandem, and proximal GDs) that may derive from (retro-)transposition and unequal crossing over. Additionally, our study shows that genes generated by tandem duplication have diverged the least during evolution. Finally, we demonstrate that SSGDs, especially tandem duplications, contribute greatly to the expansion of some preferentially retained, pathogenesis-associated gene families that are linked to the parasite's survival during infection. This study is the first to systematically summarize the landscape of GDs in trematodes and provides new insights into adaptations to parasitism linked to GD events in these parasites.
Affiliation(s)
- Shuai Wang
- State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, China
- Xing-Quan Zhu
- State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, China
- Xuepeng Cai
- State Key Laboratory of Veterinary Etiological Biology, Key Laboratory of Veterinary Parasitology of Gansu Province, Lanzhou Veterinary Research Institute, Chinese Academy of Agricultural Sciences, Lanzhou, China
15
Van Schil K, Naessens S, Van de Sompele S, Carron M, Aslanidis A, Van Cauwenbergh C, Kathrin Mayer A, Van Heetvelde M, Bauwens M, Verdin H, Coppieters F, Greenberg ME, Yang MG, Karlstetter M, Langmann T, De Preter K, Kohl S, Cherry TJ, Leroy BP, De Baere E. Mapping the genomic landscape of inherited retinal disease genes prioritizes genes prone to coding and noncoding copy-number variations. Genet Med 2017; 20:202-213. [PMID: 28749477 PMCID: PMC5787040 DOI: 10.1038/gim.2017.97] [Received: 03/02/2017] [Accepted: 05/19/2017] [Indexed: 01/08/2023]
Abstract
Purpose: Part of the hidden genetic variation in heterogeneous genetic conditions such as inherited retinal diseases (IRDs) can be explained by copy-number variations (CNVs). Here, we explored the genomic landscape of IRD genes listed in RetNet to identify and prioritize those genes susceptible to CNV formation. Methods: RetNet genes underwent an assessment of genomic features and of CNV occurrence in the Database of Genomic Variants and the literature. CNVs identified in an IRD cohort were characterized using targeted locus amplification (TLA) on extracted genomic DNA. Results: Exhaustive literature mining revealed 1,345 reported CNVs in 81 different IRD genes. Correlation analysis between rankings of genomic features and CNV occurrence demonstrated the strongest correlation between gene size and CNV occurrence in IRD genes. Moreover, we identified and delineated 30 new CNVs in IRD cases, 13 of which are novel and three of which affect noncoding, putative cis-regulatory regions. Finally, the breakpoints of six complex CNVs were determined using TLA in a hypothesis-neutral manner. Conclusion: We propose a ranking of CNV-prone IRD genes and demonstrate the efficacy of TLA for the characterization of CNVs on extracted DNA. More broadly, this IRD-oriented CNV study can serve as a paradigm for other genetically heterogeneous Mendelian diseases with hidden genetic variation.
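The Results describe a correlation analysis between rankings of genomic features (such as gene size) and CNV occurrence. The abstract does not name the statistic; as a minimal sketch, assuming Spearman's rank correlation (a common choice when comparing rankings) and using hypothetical gene sizes and CNV counts rather than the study's data:

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based sorted positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values for five genes, not taken from the study.
gene_size_kb = [2100, 1500, 180, 60, 35]
cnv_count = [400, 150, 20, 3, 5]
print(round(spearman_rho(gene_size_kb, cnv_count), 3))  # 0.9
```

In practice `scipy.stats.spearmanr` computes the same statistic; the hand-written version only makes the tie-averaged ranking step explicit.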
Affiliation(s)
- Kristof Van Schil
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Sarah Naessens
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Stijn Van de Sompele
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Marjolein Carron
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Alexander Aslanidis
- Laboratory for Experimental Immunology of the Eye, Department of Ophthalmology, University of Cologne, Cologne, Germany
- Anja Kathrin Mayer
- Molecular Genetics Laboratory, Institute for Ophthalmic Research, Centre for Ophthalmology, University of Tuebingen, Tuebingen, Germany
- Mattias Van Heetvelde
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Miriam Bauwens
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Hannah Verdin
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Frauke Coppieters
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Michael E Greenberg
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, USA
- Marty G Yang
- Department of Neurobiology, Harvard Medical School, Boston, Massachusetts, USA
- Marcus Karlstetter
- Laboratory for Experimental Immunology of the Eye, Department of Ophthalmology, University of Cologne, Cologne, Germany
- Thomas Langmann
- Laboratory for Experimental Immunology of the Eye, Department of Ophthalmology, University of Cologne, Cologne, Germany
- Katleen De Preter
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
- Susanne Kohl
- Molecular Genetics Laboratory, Institute for Ophthalmic Research, Centre for Ophthalmology, University of Tuebingen, Tuebingen, Germany
- Timothy J Cherry
- Department of Pediatrics, University of Washington School of Medicine, Seattle, Washington, USA.,Center for Developmental Biology and Regenerative Medicine, Seattle Children's Research Institute, Seattle, Washington, USA
- Bart P Leroy
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium.,Department of Ophthalmology, Ghent University and Ghent University Hospital, Ghent, Belgium.,Division of Ophthalmology, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Elfride De Baere
- Center for Medical Genetics, Ghent University and Ghent University Hospital, Ghent, Belgium
16
Segmental duplications: evolution and impact among the current Lepidoptera genomes. BMC Evol Biol 2017; 17:161. [PMID: 28683762 PMCID: PMC5499213 DOI: 10.1186/s12862-017-1007-y] [Received: 02/21/2017] [Accepted: 06/23/2017] [Indexed: 11/10/2022]
Abstract
Background: Structural variation among genomes is now viewed as being as important as single-nucleotide polymorphisms in influencing the phenotype and evolution of a species. Segmental duplications (SDs) are segments of DNA that share homologous sequence with other locations in the genome. Results: Here, we performed a systematic analysis of SDs among five lepidopteran reference genomes (Plutella xylostella, Danaus plexippus, Bombyx mori, Manduca sexta and Heliconius melpomene) to understand their potential impact on the evolution of these species. We find that SD content differed substantially among species, ranging from 1.2% of the genome in B. mori to 15.2% in H. melpomene. Most SDs formed very high-identity blocks (similarity higher than 90%), but very few formed large blocks. Comparative analysis showed that most SDs arose after the divergence of each lineage, and that P. xylostella and H. melpomene showed more duplications than the other species, suggesting that they may be able to tolerate extensive levels of variation in their genomes. Conserved ancestral and species-specific SD events were assessed, revealing multiple examples of the gain, loss or maintenance of SDs over time. Analysis of SD content showed that most of the genes embedded in SD regions belonged to species-specific SDs (“Unique” SDs). Functional analysis of these genes suggested potential roles in lineage-specific evolution. SDs and their flanking regions often contained transposable elements (TEs), and this association suggests some involvement of TEs in SD formation. A further comparison of gene expression levels between SDs and non-SDs showed that the expression of genes embedded in SDs was significantly lower, suggesting that structural changes in the genomes are involved in gene expression differences among species. Conclusions: The results showed that most SDs were “unique SDs”, which originated after species formation. Functional analysis suggested that SDs might play different roles in different species. Our results provide a valuable resource, beyond genetic mutations, for exploring genome structure in future Lepidoptera research.
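Genome-wide SD content figures such as 1.2% in B. mori are, in general terms, the number of bases covered by at least one SD divided by genome size, so overlapping SDs must not be double-counted. A minimal sketch, assuming SDs arrive as hypothetical `(chrom, start, end, identity)` tuples with half-open coordinates, an identity cut-off of 90% (the abstract's figure), and a minimum length of 1 kb (a common convention in SD studies, but an assumption here); the function name and toy data are illustrative only:

```python
from collections import defaultdict


def sd_content(alignments, genome_size, min_identity=0.90, min_length=1000):
    """Fraction of the genome covered by SDs passing identity/length filters.

    alignments: iterable of (chrom, start, end, identity), half-open [start, end).
    Overlapping SD intervals are merged so shared bases are counted once.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, identity in alignments:
        if identity >= min_identity and end - start >= min_length:
            by_chrom[chrom].append((start, end))
    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or abutting: extend the block
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / genome_size


# Hypothetical toy data on a 100 kb "genome".
sds = [
    ("chr1", 0, 5000, 0.95),
    ("chr1", 4000, 8000, 0.92),    # overlaps the first SD by 1 kb
    ("chr2", 10000, 10500, 0.99),  # shorter than 1 kb: filtered out
    ("chr2", 20000, 23000, 0.85),  # identity below 90%: filtered out
]
print(f"{sd_content(sds, 100_000):.1%}")  # 8.0%
```

Merging before summing is the key design choice: adding the two chr1 SDs naively would report 9 kb instead of the 8 kb actually covered.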
17
Zhang Y, Li S, Abyzov A, Gerstein MB. Landscape and variation of novel retroduplications in 26 human populations. PLoS Comput Biol 2017; 13:e1005567. [PMID: 28662076 PMCID: PMC5510864 DOI: 10.1371/journal.pcbi.1005567] [Received: 10/26/2016] [Revised: 07/14/2017] [Accepted: 05/12/2017] [Indexed: 01/10/2023]
Abstract
Retroduplications arise from the reverse transcription of mRNAs and their insertion back into the genome. Here, we performed a comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications by combining high-coverage exome and low-coverage whole-genome sequencing data, using information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes with novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes that differentiate superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) insertion sites were correlated with regular nucleosome positioning; (2) they, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns; (3) retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity; and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact of retroduplications and their association with genomic elements. We anticipate that our approach and analytical methodology will find application in more clinical contexts, where exome sequencing data are abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.
Affiliation(s)
- Yan Zhang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, Ohio, United States of America
- Shantao Li
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Alexej Abyzov
- Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, United States of America
- Mark B. Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
18
Feng X, Jiang J, Padhi A, Ning C, Fu J, Wang A, Mrode R, Liu JF. Characterization of genome-wide segmental duplications reveals a common genomic feature of association with immunity among domestic animals. BMC Genomics 2017; 18:293. [PMID: 28403820 PMCID: PMC5389087 DOI: 10.1186/s12864-017-3690-x] [Received: 07/19/2016] [Accepted: 04/06/2017] [Indexed: 12/15/2022]
Abstract
BACKGROUND: Segmental duplications (SDs) commonly exist in plant and animal genomes, playing crucial roles in genomic rearrangement, gene innovation and the formation of copy number variants. However, they have received little attention in most livestock species. RESULTS: Aiming to characterize SDs across the genomes of diverse livestock species, we mapped genome-wide SDs of horse, rabbit, goat, sheep and chicken, and also enhanced the existing SD maps of the cattle and pig genomes based on the most up-to-date genome assemblies. We adopted two different detection strategies, whole-genome assembly comparison and whole-genome shotgun sequence detection, to obtain more convincing findings. Accordingly, we identified SDs spanning 21.7 Mb to 164.1 Mb per species, with 807 to 4,560 genes harboured within the SD regions across the different species. Interestingly, many of these SD-related genes were involved in immunity and response to external stimuli. We also found 59 genes that lie within SD regions in all studied species except goat. These common genes mainly belonged to the UDP glucuronosyltransferase and interferon alpha families, implying a connection between SDs and the evolution of these gene families. CONCLUSIONS: Our findings provide insights into livestock genome evolution and offer rich genomic resources for livestock genomic research.
Affiliation(s)
- Xiaotian Feng
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Jicai Jiang
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Abinash Padhi
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, 20740, USA
- Chao Ning
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Jinluan Fu
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Aiguo Wang
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China
- Raphael Mrode
- International Livestock Research Institute, Nairobi, Box 30709-00100, Kenya
- Jian-Feng Liu
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing, 100193, China.
19
Duplication of chicken defensin7 gene generated by gene conversion and homologous recombination. Proc Natl Acad Sci U S A 2016; 113:13815-13820. [PMID: 27849592 DOI: 10.1073/pnas.1616948113] [Indexed: 01/07/2023]
Abstract
Defensins constitute an evolutionarily conserved family of cationic antimicrobial peptides that play a key role in host innate immune responses to infection. Defensin genes generally reside in complex genomic regions that are prone to structural variation, and defensin genes exhibit extensive copy number variation in humans and in other species. Copy number variation of defensin genes was examined in inbred lines of Leghorn and Fayoumi chickens, and a duplication of defensin7 was discovered in the Fayoumi breed. Analysis of junction sequences confirmed the occurrence of a simple tandem duplication of defensin7 with sequence identity at the junction, suggesting nonallelic homologous recombination between defensin7 and defensin6. The duplication event generated two chimeric promoters that are best explained by gene conversion followed by homologous recombination. Expression of defensin7 was not elevated in animals with two copies, even though both genes were transcribed in the tissues examined. Computational prediction of promoter regions revealed the presence of several putative transcription factor binding sites generated by the duplication event. These data provide insight into the evolution and possible function of large gene families, specifically the defensins.
20
Janoušek V, Laukaitis CM, Yanchukov A, Karn RC. The Role of Retrotransposons in Gene Family Expansions in the Human and Mouse Genomes. Genome Biol Evol 2016; 8:2632-50. [PMID: 27503295 PMCID: PMC5631067 DOI: 10.1093/gbe/evw192] [Indexed: 12/27/2022]
Abstract
Retrotransposons comprise a large portion of mammalian genomes. They contribute to structural changes and more importantly to gene regulation. The expansion and diversification of gene families have been implicated as sources of evolutionary novelties. Given the roles retrotransposons play in genomes, their contribution to the evolution of gene families warrants further exploration. In this study, we found a significant association between two major retrotransposon classes, LINEs and LTRs, and lineage-specific gene family expansions in both the human and mouse genomes. The distribution and diversity differ between LINEs and LTRs, suggesting that each has a distinct involvement in gene family expansion. LTRs are associated with open chromatin sites surrounding the gene families, supporting their involvement in gene regulation, whereas LINEs may play a structural role promoting gene duplication. Our findings also suggest that gene family expansions, especially in the mouse genome, undergo two phases. The first phase is characterized by elevated deposition of LTRs and their utilization in reshaping gene regulatory networks. The second phase is characterized by rapid gene family expansion due to continuous accumulation of LINEs and it appears that, in some instances at least, this could become a runaway process. We provide an example in which this has happened and we present a simulation supporting the possibility of the runaway process. Altogether we provide evidence of the contribution of retrotransposons to the expansion and evolution of gene families. Our findings emphasize the putative importance of these elements in diversification and adaptation in the human and mouse lineages.
Affiliation(s)
- Václav Janoušek
- Department of Zoology, Faculty of Science, Charles University in Prague, Prague, Czech Republic.,Institute of Vertebrate Biology, ASCR, Brno, Czech Republic
- Alexey Yanchukov
- Institute of Vertebrate Biology, ASCR, Brno, Czech Republic.,Department of Biology, Faculty of Arts and Sciences, Bülent Ecevit University, Zonguldak, Turkey
- Robert C Karn
- Department of Medicine, College of Medicine, University of Arizona
21
Jenkins GM, Goddard ME, Black MA, Brauning R, Auvray B, Dodds KG, Kijas JW, Cockett N, McEwan JC. Copy number variants in the sheep genome detected using multiple approaches. BMC Genomics 2016; 17:441. [PMID: 27277319 PMCID: PMC4898393 DOI: 10.1186/s12864-016-2754-7] [Received: 11/29/2015] [Accepted: 05/19/2016] [Indexed: 02/07/2023]
Abstract
Background: Copy number variants (CNVs) are a type of polymorphism found to underlie phenotypic variation in both humans and livestock. Most surveys of CNV in livestock have been conducted in the cattle genome and often use only a single approach to detect copy number differences. Here we performed a study of CNV in sheep, using multiple methods to identify and characterise copy number changes. Comprehensive information from small pedigrees (trios) was collected using multiple platforms (array CGH, SNP chip and whole-genome sequence data), and these data were then analysed via multiple approaches to identify and verify CNVs. Results: In total, 3,488 autosomal CNV regions (CNVRs) were identified in this study, which substantially builds on an initial survey of the sheep genome that identified 135 CNVRs. The average length of the identified CNVRs was 19 kb (range 1 kb to 3.6 Mb), with shorter CNVRs being more frequent than longer ones. The total length of all CNVRs was 67.6 Mb, which equates to 2.7% of the sheep autosomes. For individual animals this value ranged from 0.24% to 0.55%, and the majority of CNVRs were identified in single animals. Rather than being uniformly distributed throughout the genome, CNVRs tended to be clustered. Applying three independent approaches for CNVR detection allowed validation rates to be compared. CNVs identified on the Roche-NimbleGen 2.1M CGH array generally had low validation rates when tested with lower-density arrays, while whole-genome sequence data had the highest validation rate (>60%). Conclusions: This study represents the first comprehensive survey of the distribution, prevalence and characteristics of CNVRs in sheep. Multiple approaches were used to detect CNV regions, and it appears that the best method for verifying CNVRs on a large scale involves a combination of detection methodologies. The characteristics of the 3,488 autosomal CNV regions identified in this study are comparable to those of other CNV regions reported in the literature, and they provide a valuable and sizeable addition to the small subset of published sheep CNVs.
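The sheep study collapses per-animal CNV calls into CNV regions (CNVRs), reports their total length, and notes that most CNVRs were seen in a single animal. The abstract does not give the exact merging rule, so the following is a minimal sketch, assuming each call is a hypothetical `(animal_id, chrom, start, end)` tuple with half-open coordinates and that any overlap between calls joins them into one region; function names and data are illustrative only:

```python
from collections import defaultdict


def build_cnvrs(calls):
    """Merge overlapping CNV calls (from any animal) into CNV regions.

    calls: iterable of (animal_id, chrom, start, end), half-open [start, end).
    Returns (chrom, start, end, carriers) tuples, where carriers is the set of
    animals with at least one call contributing to that region.
    """
    by_chrom = defaultdict(list)
    for animal, chrom, start, end in calls:
        by_chrom[chrom].append((start, end, animal))
    regions = []
    for chrom in sorted(by_chrom):
        calls_sorted = sorted(by_chrom[chrom])
        start, end, animal = calls_sorted[0]
        carriers = {animal}
        for s, e, a in calls_sorted[1:]:
            if s < end:  # strict overlap with the current region
                end = max(end, e)
                carriers.add(a)
            else:
                regions.append((chrom, start, end, carriers))
                start, end, carriers = s, e, {a}
        regions.append((chrom, start, end, carriers))
    return regions


# Hypothetical calls from three animals.
calls = [
    ("ewe1", "chr1", 1000, 5000),
    ("ewe2", "chr1", 4000, 9000),
    ("ram1", "chr1", 20000, 21000),
    ("ewe1", "chr3", 500, 3500),
]
regions = build_cnvrs(calls)
total_bp = sum(end - start for _, start, end, _ in regions)
singletons = sum(1 for *_, carriers in regions if len(carriers) == 1)
print(len(regions), total_bp, singletons)  # 3 12000 2
```

Because regions merge only on strict overlap (`s < end`), abutting calls stay separate; many pipelines use reciprocal-overlap rules instead, so region counts and singleton fractions depend on that choice.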
Affiliation(s)
- Gemma M Jenkins
- AbacusBio Limited, 442 Moray Place, PO Box 5585, Dunedin, 9058, New Zealand.
- Michael E Goddard
- Victorian Department of Economic Development, Jobs, Transport and Resources, Bundoora, VIC, 3083, Australia
- Michael A Black
- Department of Biochemistry, University of Otago, 710 Cumberland St, Dunedin, 9054, New Zealand
- Rudiger Brauning
- AgResearch, Invermay Agricultural Centre, PB 50034, Mosgiel, 9053, New Zealand
- Benoit Auvray
- Department of Biochemistry, University of Otago, 710 Cumberland St, Dunedin, 9054, New Zealand
- Ken G Dodds
- AgResearch, Invermay Agricultural Centre, PB 50034, Mosgiel, 9053, New Zealand
- James W Kijas
- CSIRO Animal, Food and Health Sciences, Queensland Bioscience Precinct, 306 Carmody Road, St Lucia, QLD 4067, Australia
- Noelle Cockett
- Utah State University, 1435 Old Main Hill, Logan, UT, 84322-1435, USA
- John C McEwan
- AgResearch, Invermay Agricultural Centre, PB 50034, Mosgiel, 9053, New Zealand
22
Urnikyte A, Domarkiene I, Stoma S, Ambrozaityte L, Uktveryte I, Meskiene R, Kasiulevičius V, Burokiene N, Kučinskas V. CNV analysis in the Lithuanian population. BMC Genet 2016; 17:64. [PMID: 27142071 PMCID: PMC4855864 DOI: 10.1186/s12863-016-0373-6] [Received: 07/02/2015] [Accepted: 04/22/2016] [Indexed: 12/13/2022]
Abstract
Background: Although copy number variation (CNV) has received much attention, knowledge about the characteristics of CNVs, such as their occurrence rate and genomic distribution between populations and within the same population, is still insufficient. In this study, Illumina 770K HumanOmniExpress-12 v1.0 (and v1.1) arrays were used to examine the diversity and distribution of CNVs in 286 unrelated individuals from the two main ethnolinguistic groups of the Lithuanian population (Aukštaičiai and Žemaičiai) (see Additional file 3). For primary data analysis, the Illumina GenomeStudio™ Genotyping Module v1.9 and two algorithms, cnvPartition 3.2.0 and QuantiSNP 2.0, were used to identify high-confidence CNVs. Results: A total of 478 autosomal CNVs were detected by both algorithms, and these were clustered into 87 copy number variation regions (CNVRs) spanning ~12.5 Mb of the genome (see Table 1). At least 8.6% of the CNVRs were unique and had not been reported in the Database of Genomic Variants. Most CNVRs (57.5%) were rare, with a frequency of <1%, whereas common CNVRs with a frequency of at least 5% made up only 1.1% of all CNVRs identified. About 49% of non-singleton CNVRs were shared between Aukštaičiai and Žemaičiai, and the remaining CNVRs were specific to each group. Many of the CNVs detected (66%) overlapped with known UCSC gene regions. Conclusions: The ethnolinguistic groups of the Lithuanian population could not be differentiated based on CNV profiles, which may reflect their geographical proximity and suggest the homogeneity of the Lithuanian population. In addition, putative novel CNVs unique to the Lithuanian population were identified. The results of our study enhance the CNV map of the Lithuanian population.
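The frequency classes quoted in the Lithuanian study (rare: below 1%; common: at least 5%) depend on how many of the 286 individuals carry a given CNVR. A minimal sketch, assuming frequency means carriers divided by individuals (the abstract does not say whether carrier or allele frequency was used) and introducing a hypothetical "low-frequency" label for the unnamed 1% to 5% band:

```python
def classify_cnvr(carriers, n_individuals=286):
    """Label a CNVR by carrier frequency using the abstract's cut-offs.

    Assumption: frequency = carriers / n_individuals. The "low-frequency"
    label for 1% <= freq < 5% is hypothetical; the abstract names only the
    rare (<1%) and common (>=5%) classes.
    """
    freq = carriers / n_individuals
    if freq < 0.01:
        return "rare"
    if freq >= 0.05:
        return "common"
    return "low-frequency"


for count in (1, 2, 3, 14, 15):
    print(count, f"{count / 286:.2%}", classify_cnvr(count))
```

With 286 individuals, this places CNVRs carried by one or two people in the rare class and those carried by 15 or more people in the common class.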
Affiliation(s)
- A Urnikyte
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania.
| | - I Domarkiene
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - S Stoma
- Master of Science (MSc), Bioinformatics student, VU University Amsterdam, Amsterdam, Netherlands
| | - L Ambrozaityte
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - I Uktveryte
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - R Meskiene
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - V Kasiulevičius
- Clinics of Internal Diseases, Family Medicine and Oncology, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - N Burokiene
- Clinics of Internal Diseases, Family Medicine and Oncology, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| | - V Kučinskas
- Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu St. 2, LT-08661, Vilnius, Lithuania
| |
Collapse
|
23
|
Bickhart DM, Xu L, Hutchison JL, Cole JB, Null DJ, Schroeder SG, Song J, Garcia JF, Sonstegard TS, Van Tassell CP, Schnabel RD, Taylor JF, Lewin HA, Liu GE. Diversity and population-genetic properties of copy number variations and multicopy genes in cattle. DNA Res 2016; 23:253-62. [PMID: 27085184 PMCID: PMC4909312 DOI: 10.1093/dnares/dsw013] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.8] [Received: 11/05/2015] [Accepted: 02/29/2016] [Indexed: 11/14/2022]
Abstract
The diversity and population genetics of copy number variation (CNV) in domesticated animals are not well understood. In this study, we analysed 75 genomes of major taurine and indicine cattle breeds (including Angus, Brahman, Gir, Holstein, Jersey, Limousin, Nelore, and Romagnola), sequenced to 11-fold coverage to identify 1,853 non-redundant CNV regions. Supported by high validation rates in array comparative genomic hybridization (CGH) and qPCR experiments, these CNV regions accounted for 3.1% (87.5 Mb) of the cattle reference genome, representing a significant increase over previous estimates of the area of the genome that is copy number variable (∼2%). Further population genetics and evolutionary genomics analyses based on these CNVs revealed the population structures of the cattle taurine and indicine breeds and uncovered potential diversely selected CNVs near important functional genes, including AOX1, ASZ1, GAT, GLYAT, and KRTAP9-1. Additionally, 121 CNV gene regions were found to be either breed specific or differentially variable across breeds, such as RICTOR in dairy breeds and PNPLA3 in beef breeds. In contrast, clusters of the PRP and PAG genes were found to be duplicated in all sequenced animals, suggesting that subfunctionalization, neofunctionalization, or overdominance play roles in diversifying those fertility-related genes. These CNV results provide a new glimpse into the diverse selection histories of cattle breeds and a basis for correlating structural variation with complex traits in the future.
Affiliation(s)
- Derek M Bickhart
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
- Lingyang Xu
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA; Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Jana L Hutchison
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
- John B Cole
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
- Daniel J Null
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
- Steven G Schroeder
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
- Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD 20742, USA
- Tad S Sonstegard
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA
- Robert D Schnabel
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA; Informatics Institute, University of Missouri, Columbia, MO, USA
- Jeremy F Taylor
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
- Harris A Lewin
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- George E Liu
- USDA-ARS, Animal Genomics and Improvement Laboratory, Beltsville, MD 20705, USA

24
Canzar S, Elbassioni K, Jones M, Mestre J. Resolving Conflicting Predictions from Multimapping Reads. J Comput Biol 2016; 23:203-17. [DOI: 10.1089/cmb.2015.0164] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]
Affiliation(s)
- Stefan Canzar
- Toyota Technological Institute at Chicago, Chicago, Illinois
- Khaled Elbassioni
- Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates
- Mitchell Jones
- School of Information Technologies, University of Sydney, Sydney, Australia
- Julián Mestre
- School of Information Technologies, University of Sydney, Sydney, Australia

25
Ng SK, Hu T, Long X, Chan CH, Tsang SY, Xue H. Feature co-localization landscape of the human genome. Sci Rep 2016; 6:20650. [PMID: 26854351 PMCID: PMC4745063 DOI: 10.1038/srep20650] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 06/26/2015] [Accepted: 01/07/2016] [Indexed: 12/11/2022]
Abstract
Although feature co-localizations could serve as useful guide-posts to genome architecture, a comprehensive and quantitative feature co-localization map of the human genome has been lacking. Herein we show that, in contrast to the conventional bipartite division of genomic sequences into genic and inter-genic regions, pairwise co-localizations of forty-two genomic features in the twenty-two autosomes based on 50-kb to 2,000-kb sequence windows indicate a tripartite zonal architecture comprising Genic zones enriched with gene-related features and Alu-elements; Proximal zones enriched with MIR- and L2-elements, transcription-factor-binding-sites (TFBSs), and conserved-indels (CIDs); and Distal zones enriched with L1-elements. Co-localizations between single-nucleotide-polymorphisms (SNPs) and copy-number-variations (CNVs) reveal a fraction of sequence windows displaying steeply enhanced levels of SNPs, CNVs and recombination rates that point to active adaptive evolution in such pathways as immune response, sensory perceptions, and cognition. The strongest positive co-localization observed between TFBSs and CIDs suggests a regulatory role of CIDs in cooperation with TFBSs. The positive co-localizations of cancer somatic CNVs (CNVT) with all Proximal zone and most Genic zone features, in contrast to the distinctly more restricted co-localizations exhibited by germline CNVs (CNVG), reveal disparate distributions of CNVTs and CNVGs indicative of dissimilarity in their underlying mechanisms.
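To make windowed co-localization concrete, the Python sketch below counts two features in fixed, non-overlapping windows and correlates the per-window counts. The abstract does not state which co-localization statistic was used, so Pearson correlation is an assumed stand-in; the feature positions, window size, and function names are hypothetical toy values.

```python
import math
from typing import List, Sequence


def window_counts(positions: Sequence[int], chrom_len: int, window: int) -> List[int]:
    """Count feature positions in consecutive, non-overlapping windows of one chromosome."""
    counts = [0] * math.ceil(chrom_len / window)
    for pos in positions:
        if 0 <= pos < chrom_len:
            counts[pos // window] += 1
    return counts


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Toy positions for two hypothetical features on a 200-kb sequence, 50-kb windows.
chrom_len, window = 200_000, 50_000
feature_a = window_counts([1_000, 2_000, 60_000, 61_000, 62_000], chrom_len, window)  # [2, 3, 0, 0]
feature_b = window_counts([110_000, 160_000, 170_000], chrom_len, window)  # [0, 0, 1, 2]
r = pearson(feature_a, feature_b)  # negative: the features occupy different windows
```

Repeating the calculation at several window sizes (the study used 50 kb to 2,000 kb) shows how strongly a co-localization estimate depends on scale.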
Affiliation(s)
- Siu-Kin Ng
- Division of Life Science, Applied Genomics Center and Center for Statistical Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Taobo Hu
- Division of Life Science, Applied Genomics Center and Center for Statistical Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Xi Long
- Division of Life Science, Applied Genomics Center and Center for Statistical Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Cheuk-Hin Chan
- Division of Life Science, Applied Genomics Center and Center for Statistical Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Shui-Ying Tsang
- Division of Life Science, Applied Genomics Center and Center for Statistical Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
- Hong Xue
- Division of Life Science, Applied Genomics Center and Center for Statistical Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong

26
Abstract
Chromosomal copy number changes are frequently associated with harmful consequences and are thought of as an underlying mechanism for the development of diseases. However, changes in copy number are observed during development and occur during normal biological processes. In this review, we highlight the causes and consequences of copy number changes in normal physiologic processes as well as cover their associations with cancer and acquired drug resistance. We discuss the permanent and transient nature of copy number gains and relate these observations to a new mechanism driving transient site-specific copy gains (TSSGs). Finally, we discuss implications of TSSGs in generating intratumoral heterogeneity and tumor evolution and how TSSGs can influence the therapeutic response in cancer.
Affiliation(s)
- Sweta Mishra
- Massachusetts General Hospital Cancer Center and Department of Medicine, Harvard Medical School, Charlestown, Massachusetts, USA
- Johnathan R Whetstine
- Massachusetts General Hospital Cancer Center and Department of Medicine, Harvard Medical School, Charlestown, Massachusetts, USA

27
Structural Variant Detection by Large-scale Sequencing Reveals New Evolutionary Evidence on Breed Divergence between Chinese and European Pigs. Sci Rep 2016; 6:18501. [PMID: 26729041 PMCID: PMC4700453 DOI: 10.1038/srep18501] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 06/26/2015] [Accepted: 11/19/2015] [Indexed: 01/28/2023]
Abstract
In this study, we performed genome-wide SV detection in the genomes of thirteen pigs from diverse breeds of Chinese and European origin using next-generation sequencing, and constructed a single-nucleotide-resolution map of 56,930 putative SVs. We first identified an SV hotspot spanning a 35-Mb region on the X chromosome that was specific to the genomes of individuals of Chinese origin. Further scrutiny of this region using large-scale sequencing data from an additional 111 individuals confirmed this initial finding. Moreover, thirty-five SV-related genes within the hotspot region, which are important for reproductive ability, showed significantly different evolutionary rates between breeds of Chinese and European origin. The SV hotspot identified here offers new evidence for assessing phylogenetic relationships among Chinese and European pig breeds, and likely explains genetic differences in the corresponding phenotypes and traits. Furthermore, we used various SVs to infer the genetic structure of the individuals surveyed and found that SVs clearly distinguish differences in genetic background among individuals. This suggests that genome-wide SVs can capture the majority of genetic variation and can be applied in cladistic analyses. Characterization of genome-wide SVs showed that SVs are significantly enriched or depleted in various genomic features.

28
Wang H, Wang C, Yang K, Liu J, Zhang Y, Wang Y, Xu X, Michal JJ, Jiang Z, Liu B. Genome Wide Distributions and Functional Characterization of Copy Number Variations between Chinese and Western Pigs. PLoS One 2015; 10:e0131522. [PMID: 26154170 PMCID: PMC4496047 DOI: 10.1371/journal.pone.0131522] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/29/2015] [Accepted: 06/03/2015] [Indexed: 01/02/2023]
Abstract
Copy number variations (CNVs) are large insertions, deletions and duplications in genomic structure, ranging from one thousand to several million bases in size. Since the development of next-generation sequencing technology, several well-established methods have become available for detecting copy number variations with high reliability and accuracy. Evidence has shown that CNVs occurring in gene regions can lead to phenotypic changes through alterations in gene structure and dosage. However, whether CNVs underlie the phenotypic differences between Chinese and Western domestic pigs remains unexplored. Using read-depth methods, we investigated copy number variations in 49 individuals from both Chinese and Western pig breeds. A total of 3,131 copy number variation regions (CNVRs), with an average size of 13.4 kb and harboring 1,363 genes, were identified across all individuals. Of these, 129 and 147 CNVRs were specific to Chinese and Western pigs, respectively. Gene functional enrichment analysis indicated that these CNVRs contribute to strong disease resistance and high prolificacy in Chinese domestic pigs, but to strong muscle tissue development in Western domestic pigs. This finding is consistent with the morphological characteristics of Chinese and Western pigs, indicating that these group-specific CNVRs might have been preserved by artificial selection for favored phenotypes during the independent domestication of Chinese and Western pigs. In this study, we built high-resolution CNV maps for several domestic pig breeds and identified group-specific CNVs by comparing Chinese and Western pigs, which could provide new insight into genomic variation during pigs’ independent domestication and facilitate further functional studies of CNV-associated genes.
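Read-depth CNV detection rests on the idea that sequencing depth in a genomic window scales with that window's copy number. The Python sketch below illustrates the idea under stated assumptions: depths are already counted in equal-size windows (and, in real data, corrected for GC content and mappability), the median window is taken to be diploid (copy number 2), and the 1.5/2.5 cut-offs and function names are illustrative choices, not the thresholds or software used by the authors.

```python
import statistics
from typing import List, Tuple


def copy_number_from_depth(depths: List[float]) -> List[float]:
    """Scale per-window read depth to a diploid copy-number estimate.

    Assumes most windows are two-copy, so the median depth stands in for CN = 2.
    """
    baseline = statistics.median(depths)
    return [2.0 * d / baseline for d in depths]


def call_segments(cn: List[float], window: int,
                  loss_below: float = 1.5,
                  gain_above: float = 2.5) -> List[Tuple[int, int, str]]:
    """Merge consecutive windows sharing a non-neutral state into (start, end, state)."""
    segments: List[Tuple[int, int, str]] = []
    for i, value in enumerate(cn):
        if value < loss_below:
            state = "loss"
        elif value > gain_above:
            state = "gain"
        else:
            continue  # copy-neutral window
        start, end = i * window, (i + 1) * window
        if segments and segments[-1][2] == state and segments[-1][1] == start:
            # Adjacent window with the same state: extend the previous segment.
            segments[-1] = (segments[-1][0], end, state)
        else:
            segments.append((start, end, state))
    return segments


# Hypothetical depths in eight 1-kb windows; the median depth is 30.
depths = [30, 31, 29, 60, 62, 30, 14, 30]
cn = copy_number_from_depth(depths)
segments = call_segments(cn, window=1000)
# segments == [(3000, 5000, "gain"), (6000, 7000, "loss")]
```

Normalizing by the median rather than the mean keeps a few large duplications from inflating the diploid baseline.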
Affiliation(s)
- Hongyang Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
- Chao Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
- Kui Yang
- Modern Educational & Technology Centre of Huazhong Agricultural University, Wuhan, PR China
- Jing Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
- Yu Zhang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
- Yanan Wang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
- Xuewen Xu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China
- Jennifer J. Michal
- Department of Animal Sciences, Washington State University, Pullman, WA, United States of America
- Zhihua Jiang
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- Department of Animal Sciences, Washington State University, Pullman, WA, United States of America
- Bang Liu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, PR China
- The Cooperative Innovation Center for Sustainable Pig Production, Huazhong Agricultural University, Wuhan, PR China

29
Genome-wide analysis of copy number variations in Chinese sheep using array comparative genomic hybridization. Small Rumin Res 2015. [DOI: 10.1016/j.smallrumres.2015.04.014] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/24/2022]

30
Zhang Z, Mao L, Chen H, Bu F, Li G, Sun J, Li S, Sun H, Jiao C, Blakely R, Pan J, Cai R, Luo R, Van de Peer Y, Jacobsen E, Fei Z, Huang S. Genome-Wide Mapping of Structural Variations Reveals a Copy Number Variant That Determines Reproductive Morphology in Cucumber. THE PLANT CELL 2015; 27:1595-604. [PMID: 26002866 PMCID: PMC4498199 DOI: 10.1105/tpc.114.135848] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 12/22/2014] [Revised: 03/26/2015] [Accepted: 04/30/2015] [Indexed: 05/18/2023]
Abstract
Structural variations (SVs) represent a major source of genetic diversity. However, the functional impact and formation mechanisms of SVs in plant genomes remain largely unexplored. Here, we report a nucleotide-resolution SV map of cucumber (Cucumis sativus) that comprises 26,788 SVs based on deep resequencing of 115 diverse accessions. The largest proportion of cucumber SVs was formed through nonhomologous end-joining rearrangements, and the occurrence of SVs is closely associated with regions of high nucleotide diversity. These SVs affect the coding regions of 1676 genes, some of which are associated with cucumber domestication. Based on the map, we discovered a copy number variation (CNV) involving four genes that defines the Female (F) locus and gives rise to gynoecious cucumber plants, which bear only female flowers and set fruit at almost every node. The CNV arose from a recent 30.2-kb duplication at a meiotically unstable region, likely via microhomology-mediated break-induced replication. The SV set provides a snapshot of structural variations in plants and will serve as an important resource for exploring genes underlying key traits and for facilitating practical breeding in cucumber.
Affiliation(s)
- Zhonghua Zhang
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China
- Linyong Mao
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
- Huiming Chen
- Hunan Vegetable Research Institute, Hunan Academy of Agricultural Sciences, Changsha 410125, China
- Fengjiao Bu
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China; Agricultural Genomic Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China
- Guangcun Li
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China; Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Jinjing Sun
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China
- Shuai Li
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China
- Honghe Sun
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
- Chen Jiao
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
- Rachel Blakely
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853
- Junsong Pan
- Shanghai Jiaotong University, Shanghai 200240, China
- Run Cai
- Shanghai Jiaotong University, Shanghai 200240, China
- Ruibang Luo
- Department of Computer Science, University of Hong Kong, Hong Kong 999077, China
- Yves Van de Peer
- Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Genomics Research Institute, University of Pretoria, Pretoria 0028, South Africa
- Evert Jacobsen
- Department of Plant Sciences, Laboratory of Plant Breeding, Wageningen University and Research Centre, 6700AA Wageningen, The Netherlands
- Zhangjun Fei
- Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, New York 14853; USDA-ARS Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853
- Sanwen Huang
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics, Beijing 100081, China; Agricultural Genomic Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China

31
Landais E, Leroy C, Kleinfinger P, Brunet S, Koubi V, Pietrement C, Poli-Mérol ML, Fiquet C, Souchon PF, Beri M, Jonveaux P, Garnotel R, Gaillard D, Doco-Fenzy M. A pure familial 6q15q21 split duplication associated with obesity and transmitted with partial reduction. Am J Med Genet A 2015; 167:1275-84. [PMID: 25900228 DOI: 10.1002/ajmg.a.36995] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/07/2014] [Accepted: 12/29/2014] [Indexed: 01/06/2023]
Abstract
Familial transmission of chromosome 6 duplications is rare. We report on the first observation of a maternally inherited pure segmental 6q duplication split into two segments, 6q15q16.3 and 6q16.3q21, and associated with obesity. Obesity has previously been correlated with chromosome 6 q-arm deletions but has not yet been assessed in duplications. The aim of this study was to characterize the structure of these intrachromosomal insertional translocations by classic cytogenetic banding, array-CGH, FISH, M-banding and genotyping using microsatellites and SNP array analysis, in a mother and four offspring. The duplicated 6q segments, 9.75 Mb (dup 1) and 7.05 Mb (dup 2) in size in the mother, were inserted distally into two distinct chromosome 6q regions. They were transmitted to four offspring. A son and a daughter inherited the two unbalanced insertions and displayed, like the mother, an abnormal phenotype with facial dysmorphism, intellectual disability, and morbid obesity. Curiously, two daughters with a normal phenotype inherited only the smaller segment, 6q16.3q21. The abnormal phenotype was associated with the larger proximal 6q15q16.3 duplication. We hypothesize a mechanism for this exceptional phenomenon of recurrent reduction and transmission of the duplication during meiosis in a family. We expect the interpretation of our findings to be useful for genetic counseling and for understanding the mechanisms underlying these large segmental 6q duplications and their evolution.
Affiliation(s)
- Emilie Landais
- CHU-Reims, HMB, Service de Génétique, France; CHU-Reims, HMB, Plateforme Régionale de Biologie Innovante, France
- Camille Leroy
- CHU-Reims, HMB, Service de Génétique, France; Université de Reims Champagne-Ardenne, UFR de médecine, France
- Valérie Koubi
- Service de génétique Médicale, Laboratoire de génétique moléculaire, CHU Hopital Necker enfants malades, Paris, France
- Marie-Laurence Poli-Mérol
- Université de Reims Champagne-Ardenne, UFR de médecine, France; CHU-Reims, American Memorial Hospital, Service de Chirurgie pédiatrique, France
- Caroline Fiquet
- CHU-Reims, American Memorial Hospital, Service de Chirurgie pédiatrique, France; SFR CAP Santé, Reims, EA 3801, France
- Mylène Beri
- CHU-Nancy, Laboratoire de Génétique Médicale, Nancy Université, France
- Philippe Jonveaux
- CHU-Nancy, Laboratoire de Génétique Médicale, Nancy Université, France
- Roselyne Garnotel
- CHU-Reims, Laboratoire de Biochimie Médicale et Biologie Moléculaire, CNRS UMR 6198, UFR, Médecine, France
- Dominique Gaillard
- CHU-Reims, HMB, Service de Génétique, France; Université de Reims Champagne-Ardenne, UFR de médecine, France
- Martine Doco-Fenzy
- CHU-Reims, HMB, Service de Génétique, France; SFR CAP Santé, Reims, EA 3801, France

32
Lee HE, Ayarpadikannan S, Kim HS. Role of transposable elements in genomic rearrangement, evolution, gene regulation and epigenetics in primates. Genes Genet Syst 2015; 90:245-57. [DOI: 10.1266/ggs.15-00016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 11/23/2022]
Affiliation(s)
- Hee-Eun Lee
- Department of Biological Sciences, College of Natural Sciences, Pusan National University
- Genetic Engineering Institute, Pusan National University
- Selvam Ayarpadikannan
- Department of Biological Sciences, College of Natural Sciences, Pusan National University
- Heui-Soo Kim
- Department of Biological Sciences, College of Natural Sciences, Pusan National University
- Genetic Engineering Institute, Pusan National University

33
Chen L, Zhou W, Zhang L, Zhang F. Genome architecture and its roles in human copy number variation. Genomics Inform 2014; 12:136-44. [PMID: 25705150 PMCID: PMC4330246 DOI: 10.5808/gi.2014.12.4.136] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 10/13/2014] [Revised: 11/12/2014] [Accepted: 11/12/2014] [Indexed: 02/06/2023]
Abstract
Besides single-nucleotide variants in the human genome, large-scale genomic variants, such as copy number variations (CNVs), are increasingly being discovered as a genetic source of human diversity and as pathogenic factors in disease. Recent experimental findings have shed light on the links between different genome architectures and CNV mutagenesis. In this review, we summarize various genomic features and discuss their contributions to CNV formation. Genomic repeats, including both low-copy and high-copy repeats, play important roles in CNV instability, which was initially attributed to DNA recombination events. Furthermore, it has been found that human genomic repeats can also induce DNA replication errors and consequently result in CNV mutations. Some recent studies showed that DNA replication timing, which reflects the high-order information of genomic organization, is involved in human CNV mutations. Our review highlights that genome architecture, from DNA sequence to high-order genomic organization, is an important molecular factor in CNV mutagenesis and human genomic instability.
Affiliation(s)
- Lu Chen
- School of Life Sciences, Fudan University, Shanghai 200438, China
- Weichen Zhou
- School of Life Sciences, Fudan University, Shanghai 200438, China
- Ling Zhang
- School of Life Sciences, Fudan University, Shanghai 200438, China
- Feng Zhang
- School of Life Sciences, Fudan University, Shanghai 200438, China; Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200438, China

34
Extensive copy-number variation of young genes across stickleback populations. PLoS Genet 2014; 10:e1004830. [PMID: 25474574 PMCID: PMC4256280 DOI: 10.1371/journal.pgen.1004830] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Received: 06/20/2014] [Accepted: 10/16/2014] [Indexed: 12/30/2022]
Abstract
Duplicate genes emerge as copy-number variations (CNVs) at the population level, and remain copy-number polymorphic until they are fixed or lost. The successful establishment of such structural polymorphisms in the genome plays an important role in evolution by promoting genetic diversity, complexity and innovation. To characterize the early evolutionary stages of duplicate genes and their potential adaptive benefits, we combine comparative genomics with population genomics analyses to evaluate the distribution and impact of CNVs across natural populations of an eco-genomic model, the three-spined stickleback. With whole genome sequences of 66 individuals from populations inhabiting three distinct habitats, we find that CNVs generally occur at low frequencies and are often only found in one of the 11 populations surveyed. A subset of CNVs, however, displays copy-number differentiation between populations, showing elevated within-population frequencies consistent with local adaptation. By comparing teleost genomes to identify lineage-specific genes and duplications in sticklebacks, we highlight rampant gene content differences among individuals in which over 30% of young duplicate genes are CNVs. These CNV genes are evolving rapidly at the molecular level and are enriched with functional categories associated with environmental interactions, depicting the dynamic early copy-number polymorphic stage of genes during population differentiation.
After a locus is duplicated in a genome, individuals from a population instantaneously differ in the number of copies of this locus, producing a copy-number variation (CNV). Over time, the joint effects of selection and other evolutionary forces act either to eliminate the extra genetic copy or to retain it. Depending on this evolutionary interplay, young duplications, including newly duplicated genes, can persist for millions of years as CNVs. CNVs may be especially prevalent between populations that have colonized and adapted to disparate environments in which selective pressures differ. Using whole genome sequences from several populations of three-spined sticklebacks that inhabit different environments, we find that a third of young duplicated genes are CNVs. These young CNV genes are enriched with environmental response functions and are evolving rapidly at the molecular level, making them promising candidates for a role in rapid ecological adaptation to novel environments.
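Copy-number differentiation between populations is often summarized with V_ST, defined as (V_T - V_S) / V_T, where V_T is the copy-number variance across all individuals and V_S is the within-population variance averaged over populations. The abstract does not name the statistic the authors used, so this Python sketch of V_ST is an assumed illustration; the sample-size weighting of V_S, the function names, and the toy population labels are likewise assumptions.

```python
from typing import Dict, List


def _variance(values: List[float]) -> float:
    """Population variance (divides by n, not n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def vst(copy_numbers: Dict[str, List[float]]) -> float:
    """V_ST = (V_T - V_S) / V_T for a single CNV locus.

    copy_numbers maps a population label to per-individual copy numbers.
    V_T: variance over all individuals pooled.
    V_S: within-population variance, averaged with sample-size weights.
    """
    pooled = [cn for values in copy_numbers.values() for cn in values]
    v_total = _variance(pooled)
    if v_total == 0:
        return 0.0  # monomorphic locus: nothing to differentiate
    v_within = sum(len(vals) * _variance(vals) for vals in copy_numbers.values()) / len(pooled)
    return (v_total - v_within) / v_total


# Hypothetical loci in two populations of four individuals each.
differentiated = vst({"pop_a": [2, 2, 2, 2], "pop_b": [4, 4, 4, 4]})  # 1.0
undifferentiated = vst({"pop_a": [2, 4, 2, 4], "pop_b": [4, 2, 4, 2]})  # 0.0
```

Values near 1 mean almost all copy-number variance lies between populations, the pattern the abstract links to possible local adaptation; values near 0 mean each population carries the same spread of copy numbers.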

35
Li W, Freudenberg J. Characterizing regions in the human genome unmappable by next-generation-sequencing at the read length of 1000 bases. Comput Biol Chem 2014; 53 Pt A:108-17. [PMID: 25241312 DOI: 10.1016/j.compbiolchem.2014.08.015] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Accepted: 07/11/2014] [Indexed: 12/31/2022]
Abstract
Repetitive and redundant regions of a genome are particularly problematic for mapping sequencing reads. In the present paper, we compile a list of the unmappable regions in the human genome based on the following definition: regions where hypothetical 1-kb reads cannot be uniquely mapped by zero-mismatch alignment, considering both the forward and reverse strands. The resulting collection of unmappable regions covers 0.77% of the sequence of human autosomes and 8.25% of the sex chromosomes in the reference genome GRCh37/hg19 (1.23% overall). Not surprisingly, our unmappable regions overlap greatly with segmental duplications, transposable elements, and structural variants. About 99.8% of bases in our unmappable regions are part of either segmental duplications or transposable elements, and 98.3% overlap structural variant annotations. Notably, some of these regions overlap elements with important biological functions, including 4% of protein-coding genes. In contrast, these regions have zero intersection with ultraconserved elements and very low overlap with microRNAs, tRNAs, pseudogenes, CpG islands, tandem repeats, microsatellites, sensitive non-coding regions, and the mapping blacklist regions from the ENCODE project.
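The definition above (a fixed-length read that cannot be placed uniquely with zero mismatches on either strand) can be illustrated on a toy sequence. The sketch below is not the authors' pipeline. It counts canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement, so both strands are covered) and reports runs of read start positions whose k-mer occurs more than once. The function names and the tiny k are assumptions chosen for readability; the paper uses 1-kb reads on GRCh37/hg19.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer: str) -> str:
    # A read and its reverse complement hit the same locus, so each k-mer is
    # counted under the lexicographically smaller of its two strand forms.
    return min(kmer, revcomp(kmer))


def unmappable_starts(genome: str, k: int) -> list[tuple[int, int]]:
    """Half-open intervals of read start positions whose length-k read occurs
    more than once in the genome (either strand, zero mismatches)."""
    starts = range(len(genome) - k + 1)
    counts = Counter(canonical(genome[i:i + k]) for i in starts)
    intervals: list[tuple[int, int]] = []
    for i in starts:
        if counts[canonical(genome[i:i + k])] > 1:
            if intervals and intervals[-1][1] == i:
                intervals[-1] = (intervals[-1][0], i + 1)  # extend current run
            else:
                intervals.append((i, i + 1))  # start a new run
    return intervals


# Forward-strand repeat: "GATTACA" occurs twice.
print(unmappable_starts("GATTACAGGGGATTACA", 4))  # [(0, 4), (10, 14)]
# Reverse-strand repeat: "CGGTT" is the reverse complement of "AACCG".
print(unmappable_starts("AACCGTTTTCGGTT", 5))  # [(0, 1), (9, 10)]
```

A genome-scale version would need an index (for example a suffix array) rather than an in-memory counter of every 1-kb substring.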
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA.
- Jan Freudenberg
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, 350 Community Drive, Manhasset, NY 11030, USA
36
Jiang J, Wang J, Wang H, Zhang Y, Kang H, Feng X, Wang J, Yin Z, Bao W, Zhang Q, Liu JF. Global copy number analyses by next generation sequencing provide insight into pig genome variation. BMC Genomics 2014; 15:593. [PMID: 25023178 PMCID: PMC4111851 DOI: 10.1186/1471-2164-15-593] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Received: 02/06/2014] [Accepted: 07/04/2014] [Indexed: 01/10/2023]
Abstract
Background: Copy number variations (CNVs) confer significant effects on genetic innovation and phenotypic variation. Previous CNV studies in swine seldom focused on in-depth characterization of global CNVs. Results: Using whole-genome assembly comparison (WGAC) and whole-genome shotgun sequence detection (WSSD) approaches with next-generation sequencing (NGS), we probed the formation signatures of both segmental duplications (SDs) and individualized CNVs in an integrated fashion, building the highest-resolution CNV and SD maps of pigs to date. We obtained copy number estimates for all protein-coding genes with copy number variation carried by individuals, and further confirmed two genes with high copy numbers in Meishan pigs in an enlarged population. We determined genome-wide CNV hotspots, which were significantly enriched in SD regions, suggesting that the evolution of CNV hotspots may be affected by ancestral SDs. Through systematic enrichment analyses based on simulations and bioinformatics analyses, we found that CNV-related genes are under different selective constraints from CNV-unrelated regions, and that CNVs may be associated with, or affect, pig health and production performance under recent selection. Conclusions: Our studies lay out one way to characterize CNVs in the pig genome, provide insight into pig genome variation, and should prompt studies of CNV mechanisms when pigs are used as biomedical models for human diseases.
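The WSSD approach named above infers duplications from excess read depth. As a simplified, hypothetical illustration of read-depth copy-number estimation (not the authors' implementation), the sketch below scales each window's depth by the median depth of windows assumed to be diploid. Real read-depth pipelines typically also correct for GC content and mappability, which this sketch omits; the function name and depths are invented.

```python
from statistics import median


def copy_number_from_depth(
    window_depths: list[float], control_depths: list[float], ploidy: int = 2
) -> list[float]:
    """Estimate copy number per window as ploidy * depth / median control depth.

    control_depths are read depths of windows assumed to be present at the
    normal ploidy (an assumption); no GC or mappability correction is applied.
    """
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control windows must have positive read depth")
    return [ploidy * depth / baseline for depth in window_depths]


estimates = copy_number_from_depth([30, 45, 15, 90], control_depths=[29, 30, 31])
print(estimates)  # [2.0, 3.0, 1.0, 6.0]
print([round(cn) for cn in estimates])  # integer calls: [2, 3, 1, 6]
```

Using the median rather than the mean keeps a few duplicated control windows from inflating the diploid baseline.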
Affiliation(s)
- Jian-Feng Liu
- National Engineering Laboratory for Animal Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction, Ministry of Agriculture, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China.
37
Ebert G, Steininger A, Weißmann R, Boldt V, Lind-Thomsen A, Grune J, Badelt S, Heßler M, Peiser M, Hitzler M, Jensen LR, Müller I, Hu H, Arndt PF, Kuss AW, Tebel K, Ullmann R. Distribution of segmental duplications in the context of higher order chromatin organisation of human chromosome 7. BMC Genomics 2014; 15:537. [PMID: 24973960 PMCID: PMC4092221 DOI: 10.1186/1471-2164-15-537] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/09/2013] [Accepted: 06/17/2014] [Indexed: 12/21/2022]
Abstract
BACKGROUND Segmental duplications (SDs) are not evenly distributed along chromosomes. The reasons for this biased susceptibility to SD insertion are poorly understood. Accumulation of SDs is associated with increased genomic instability, which can lead to structural variants and genomic disorders such as Williams-Beuren syndrome. Despite these adverse effects, SDs have become fixed in the human genome. Focusing on chromosome 7, which is particularly rich in interstitial SDs, we investigated the distribution of SDs in the context of evolution and the three-dimensional organisation of the chromosome in order to gain insights into the mutual relationship between SDs and chromatin topology. RESULTS Intrachromosomal SDs preferentially accumulate in those segments of chromosome 7 that are homologous to marmoset chromosome 2. Although this formerly compact segment has been redistributed to three different sites during primate evolution, we show, using public data on long-distance chromatin interactions, that these three intervals, and consequently the paralogous SDs mapping to them, have retained their spatial proximity in the nucleus. Focusing on the SD clusters implicated in the aetiology of Williams-Beuren syndrome, we demonstrate by cross-species comparison that these SDs inserted at the borders of a topological domain and that they flank regions with distinct DNA conformation. CONCLUSIONS Our study suggests a link between nuclear architecture and the propagation of SDs across chromosome 7, either through nuclear architecture promoting regional SD insertion or through SDs themselves contributing to the establishment of higher-order chromatin organisation. The latter could compensate for the high risk of structural rearrangements and thus may have contributed to the evolutionary fixation of SDs in the human genome.
Affiliation(s)
- Grit Ebert
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Department of Biology, Chemistry and Pharmacy, Free University Berlin, 14195 Berlin, Germany
- Anne Steininger
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Department of Biology, Chemistry and Pharmacy, Free University Berlin, 14195 Berlin, Germany
- Robert Weißmann
- Department of Human Genetics, University Medicine Greifswald, and Interfaculty Institute of Genetics and Functional Genomics, University of Greifswald, Fleischmannstraße 42-44, 17475 Greifswald, Germany
- Vivien Boldt
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Department of Biology, Chemistry and Pharmacy, Free University Berlin, 14195 Berlin, Germany
- Allan Lind-Thomsen
- Wilhelm Johannsen Centre for Functional Genome Research, Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, DK-2200 Copenhagen, Denmark
- Jana Grune
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Stefan Badelt
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Institute for Theoretical Chemistry, University of Vienna, Waehringer Straße 17, A-1090 Vienna, Austria
- Melanie Heßler
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Matthias Peiser
- Unit Experimental Research, Department of Product Safety, Federal Institute for Bundeswehr Institute of Radiobiology affiliated, the University of Ulm, Neuherbergstraße 11, 80937 Munich, Germany
- Manuel Hitzler
- Unit Experimental Research, Department of Product Safety, Federal Institute for Bundeswehr Institute of Radiobiology affiliated, the University of Ulm, Neuherbergstraße 11, 80937 Munich, Germany
- Lars R Jensen
- Department of Human Genetics, University Medicine Greifswald, and Interfaculty Institute of Genetics and Functional Genomics, University of Greifswald, Fleischmannstraße 42-44, 17475 Greifswald, Germany
- Ines Müller
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Hao Hu
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Peter F Arndt
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Andreas W Kuss
- Department of Human Genetics, University Medicine Greifswald, and Interfaculty Institute of Genetics and Functional Genomics, University of Greifswald, Fleischmannstraße 42-44, 17475 Greifswald, Germany
- Katrin Tebel
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Reinhard Ullmann
- Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
38
Zhang Q, Su B. Evolutionary origin and human-specific expansion of a cancer/testis antigen gene family. Mol Biol Evol 2014; 31:2365-75. [PMID: 24916032 DOI: 10.1093/molbev/msu188] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]
Abstract
Cancer/testis (CT) antigens are encoded by germline genes and are aberrantly expressed in a number of human cancers. Interestingly, CT antigens frequently belong to gene families that are highly expressed in germ cells. Here, we present an evolutionary analysis of the CTAGE (cutaneous T-cell-lymphoma-associated antigen) gene family to delineate its molecular history and functional significance during primate evolution. Comparisons among human, chimpanzee, gorilla, orangutan, macaque, marmoset, and other mammals show a rapid, primate-specific expansion of the CTAGE family, which started with an ancestral retroposition in the haplorhine ancestor. Subsequent DNA-based duplications led to the proliferation of single-exon CTAGE copies in catarrhines, especially in humans. Positive selection was identified on the single-exon copies, in contrast to functional constraint on the multiexon copies. Further sequence analysis suggests that the newly derived CTAGE genes may have obtained regulatory elements from long terminal repeats. Our results indicate the dynamic evolution of primate genomes, and the recent expansion of this CT antigen family in humans may have conferred advantageous phenotypic traits during early human evolution.
Affiliation(s)
- Qu Zhang
- Department of Human Evolutionary Biology, Graduate School of Arts and Sciences, Harvard University
- Bing Su
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
39
Recurrent duplications of the annexin A1 gene (ANXA1) in autism spectrum disorders. Mol Autism 2014; 5:28. [PMID: 24720851 PMCID: PMC4098665 DOI: 10.1186/2040-2392-5-28] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 08/26/2013] [Accepted: 03/17/2014] [Indexed: 11/10/2022]
Abstract
Background: Validating the potential pathogenicity of copy number variants (CNVs) identified in genome-wide studies of autism spectrum disorders (ASD) requires detailed assessment of case/control frequencies, inheritance patterns, clinical correlations, and functional impact. Here, we characterize a small recurrent duplication in the annexin A1 (ANXA1) gene, identified by the Autism Genome Project (AGP) study. Methods: From the AGP CNV genomic screen of 2,147 ASD individuals, we selected for characterization an ANXA1 gene duplication that was absent in 4,964 population-based controls. We further screened for the duplication in a follow-up sample of 1,496 patients and 410 controls, and evaluated clinical correlations and family segregation. Exonic and downstream ANXA1 regions were sequenced in 490 ASD patients to identify additional variants. Results: The ANXA1 duplication, overlapping the last four exons and the 3’UTR region, had an overall prevalence of 11/3,643 (0.30%) in unrelated ASD patients but was not identified in 5,374 controls. Duplication carriers presented no distinctive clinical phenotype. Family analysis showed neuropsychiatric deficits and ASD traits in multiple relatives carrying the duplication, suggestive of complex genetic inheritance. Sequencing of exonic regions and the 3’UTR identified 11 novel changes, but no obvious variants of clinical significance. Conclusions: We provide multilevel evidence for a role of ANXA1 in ASD etiology. Given its important role as a mediator of glucocorticoid function in a wide variety of brain processes, including neuroprotection, apoptosis, and control of the neuroendocrine system, the results add ANXA1 to the growing list of rare candidate genetic etiological factors for ASD.
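The reported totals (11 carriers among 3,643 unrelated patients, none among 5,374 controls) can be checked with a one-sided Fisher's exact test. The abstract does not state which test the authors used, so the sketch below is only an illustrative calculation from those totals; the function name is hypothetical, and pooling the discovery and follow-up samples this way is an assumption.

```python
from math import comb


def fisher_one_sided(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact test p-value for excess carriers in cases.

    Under the null, the carriers are split between cases and controls at
    random (hypergeometric), so p = P(X >= case_carriers).
    """
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    tail = sum(
        comb(n_cases, x) * comb(n_controls, carriers - x)
        for x in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return tail / comb(total, carriers)


prevalence = 11 / 3643
p = fisher_one_sided(11, 3643, 0, 5374)
print(f"case prevalence = {prevalence:.2%}")  # case prevalence = 0.30%
print(f"one-sided p = {p:.2g}")
```

Exact integer arithmetic via `math.comb` avoids the underflow that a naive factorial-based calculation would hit with cohorts of this size.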
40
Makino T, McLysaght A, Kawata M. Genome-wide deserts for copy number variation in vertebrates. Nat Commun 2013; 4:2283. [PMID: 23917329 DOI: 10.1038/ncomms3283] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 01/22/2013] [Accepted: 07/10/2013] [Indexed: 01/14/2023]
Abstract
Most copy number variations are neutral, but some are deleterious and associated with various human diseases. Copy number variations are distributed non-randomly in vertebrate genomes, and it was recently reported that ohnologs, which are duplicated genes derived from whole genome duplication, are refractory to copy number variations. However, it is unclear which genomic factors influence the deleterious effects of copy number variations, and the biological significance of their biased genomic distribution remains poorly understood. Here we show that non-ohnologs neighbouring ohnologs are unlikely to have copy number variations, making ohnolog-rich regions of vertebrate genomes copy number variation deserts. Our results suggest that the genomic location of ohnologs is a determining factor in the retention of copy number variations and that dosage-balanced ohnologs are likely to cause the deleterious effects of copy number variations in these regions. We propose that investigating copy number variation of genes in regions that are typically copy number variation deserts is an efficient means to find disease-related copy number variations.
Affiliation(s)
- Takashi Makino
- Department of Ecology and Evolutionary Biology, Graduate School of Life Sciences, Tohoku University, 6-3, Aramaki Aza Aoba, Aoba-ku 980-8578, Japan.
41
Bickhart DM, Liu GE. The challenges and importance of structural variation detection in livestock. Front Genet 2014; 5:37. [PMID: 24600474 PMCID: PMC3927395 DOI: 10.3389/fgene.2014.00037] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 11/30/2013] [Accepted: 01/31/2014] [Indexed: 01/25/2023]
Abstract
Recent studies in humans and other model organisms have demonstrated that structural variants (SVs) comprise a substantial proportion of variation among individuals of each species. Many of these variants have been linked to debilitating diseases in humans, thereby cementing the importance of refining methods for their detection. Despite progress in the field, reliable detection of SVs still remains a problem even for human subjects. Many of the underlying problems that make SVs difficult to detect in humans are amplified in livestock species, whose lower quality genome assemblies and incomplete gene annotation can often give rise to false positive SV discoveries. Regardless of the challenges, SV detection is just as important for livestock researchers as it is for human researchers, given that several productive traits and diseases have been linked to copy number variations (CNVs) in cattle, sheep, and pig. Already, there is evidence that many beneficial SVs have been artificially selected in livestock such as a duplication of the agouti signaling protein gene that causes white coat color in sheep. In this review, we will list current SV and CNV discoveries in livestock and discuss the problems that hinder routine discovery and tracking of these polymorphisms. We will also discuss the impacts of selective breeding on CNV and SV frequencies and mention how SV genotyping could be used in the future to improve genetic selection.
Affiliation(s)
- Derek M Bickhart
- Animal Improvement Programs Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD, USA
- George E Liu
- Bovine Functional Genomics Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD, USA
42
Behura SK, Severson DW. Association of microsatellite pairs with segmental duplications in insect genomes. BMC Genomics 2013; 14:907. [PMID: 24359442 PMCID: PMC3878106 DOI: 10.1186/1471-2164-14-907] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/17/2013] [Accepted: 12/16/2013] [Indexed: 11/30/2022]
Abstract
Background: Segmental duplications (SDs), also known as low-copy repeats, are DNA sequences longer than 1 kb that are duplicated with a high degree of sequence identity (greater than 90%) and cause instability in genomes. SDs are generally found in the genome as mosaics of duplicated sequences generated by a two-step process: first, multiple duplicated sequences are aggregated at specific genomic regions, and then these primary duplications undergo multiple secondary duplications. However, the mechanism by which duplicated sequences are aggregated in the first place is not well understood. Results: Genome-wide analysis of the distribution of microsatellite sequences in twenty insect species showed that pairs of microsatellites, along with the intervening sequences, were duplicated multiple times in each genome. They were classified as low-copy repeats or segmental duplications when the duplicated loci were longer than 1 kb and had greater than 90% sequence similarity. A sliding-window genomic analysis of the number of paired microsatellites and the number of segmental duplications showed that regions rich in repetitive paired microsatellites tend to become richer in segmental duplications, suggesting a “rich-gets-richer” mode of aggregation of duplicated loci in specific regions of the genome. Results further show that the relationship between the number of paired microsatellites and segmental duplications among the species is independent of the known phylogeny, suggesting that the association of microsatellites with segmental duplications may be a species-specific evolutionary process. It was also observed that the repetitive microsatellite pairs are associated with gene duplications, but those sequences are rarely retained in the orthologous genes between species.
Although some of the duplicated sequences with microsatellites as termini were found within transposable elements (TEs) of Drosophila, most of the duplications are found in TE-free and gene-free regions of the genome. Conclusion: The study clearly suggests that microsatellites are instrumental in extensive sequence duplications that may contribute to species-specific evolution of genome plasticity in insects.
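The SD definition and the sliding-window comparison described above can be sketched in a few lines. The thresholds (longer than 1 kb, more than 90% identity) come from the abstract; the `Alignment` record, the window and step sizes, and the use of Pearson correlation are hypothetical choices for illustration, not the authors' pipeline.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Alignment:
    """Hypothetical record for one duplicated locus from a self-alignment."""
    start: int       # 0-based start on the chromosome
    end: int         # exclusive end
    identity: float  # fraction of identical aligned bases (0-1)


def is_segmental_duplication(aln: Alignment, min_length: int = 1000,
                             min_identity: float = 0.90) -> bool:
    # Thresholds follow the abstract: longer than 1 kb, more than 90% identity.
    return (aln.end - aln.start) > min_length and aln.identity > min_identity


def window_counts(positions: list[int], chrom_length: int,
                  window: int, step: int) -> list[int]:
    """Count features (e.g. SD or microsatellite-pair starts) per sliding window."""
    counts = []
    for w_start in range(0, max(chrom_length - window, 0) + 1, step):
        w_end = w_start + window
        counts.append(sum(w_start <= p < w_end for p in positions))
    return counts


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length, non-constant count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


alignments = [Alignment(0, 1500, 0.95), Alignment(5000, 5800, 0.99),
              Alignment(9000, 11000, 0.85)]
sds = [a for a in alignments if is_segmental_duplication(a)]
print(len(sds))  # 1: only the first passes both thresholds
print(window_counts([100, 200, 2500, 7000], chrom_length=10000,
                    window=5000, step=2500))  # [3, 2, 1]
```

Correlating per-window counts of paired microsatellites with per-window SD counts (via `pearson`) is one simple way to quantify the "rich-gets-richer" pattern.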
Affiliation(s)
- Susanta K Behura
- Eck Institute for Global Health, Department of Biological Sciences, University of Notre Dame, Notre Dame, IN 46556, USA.
43
Inferring polymorphism-induced regulatory gene networks active in human lymphocyte cell lines by weighted linear mixed model analysis of multiple RNA-Seq datasets. PLoS One 2013; 8:e78868. [PMID: 24205334 PMCID: PMC3813575 DOI: 10.1371/journal.pone.0078868] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/31/2013] [Accepted: 09/18/2013] [Indexed: 11/21/2022]
Abstract
Single-nucleotide polymorphisms (SNPs) contribute to between-individual expression variation of many genes. A regulatory (trait-associated) SNP is usually located near or within a (host) gene, possibly influencing the gene’s transcription and/or post-transcriptional modification. But its targets may also include genes that are physically farther away from it. A heuristic explanation of such multiple-target effects is that the host gene transmits the SNP genotypic effects to the distant gene(s) through a transcriptional or signaling cascade. These connections between the host genes (regulators) and the distant genes (targets) make the genetic analysis of gene expression traits a promising approach for identifying unknown regulatory relationships. In this study, through a mixed model analysis of multi-source digital expression profiles for 140 human lymphocyte cell lines (LCLs) and the genotypes distributed by the International HapMap Project, we identified about 45,000 potential SNP-induced regulatory relationships among genes (the significance level for the underlying associations between expression traits and SNP genotypes was set at FDR < 0.01). We grouped the identified relationships into four classes (paradigms) according to the two different mechanisms by which the regulatory SNPs affect their cis- and trans-regulated genes: modifying mRNA levels or altering transcript splicing patterns. We further organized the relationships in each class into a set of network modules with the cis-regulated genes as hubs. We found that the target genes in a network module were often characterized by significant functional similarity, and the distributions of the target genes in three of the four networks roughly resemble a power law, a typical pattern of gene networks obtained from mutation experiments. Through two case studies, we also demonstrated that significant biological insights can be inferred from the identified network modules.
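The abstract sets the significance threshold at FDR < 0.01 but does not say how the FDR was controlled. The Benjamini-Hochberg step-up procedure is one standard way to do so, and the sketch below applies it to a list of association p-values. Treating BH as the procedure used here is an assumption, the p-values are invented, and the weighted linear mixed model itself is not reproduced.

```python
def benjamini_hochberg(p_values: list[float], fdr: float = 0.01) -> list[bool]:
    """Return, for each p-value, whether it is significant under BH at `fdr`.

    Step-up rule: sort p-values ascending, find the largest rank k with
    p_(k) <= k * fdr / m, then call the k smallest p-values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            max_rank = rank
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, fdr=0.05))  # first two True, rest False
print(benjamini_hochberg(pvals, fdr=0.01))  # only the first True
```

Because the rule scans every rank, a p-value can be called significant even if a smaller-ranked one individually failed its own threshold; that is what makes BH a step-up procedure.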
44
Livnat A. Interaction-based evolution: how natural selection and nonrandom mutation work together. Biol Direct 2013; 8:24. [PMID: 24139515 PMCID: PMC4231362 DOI: 10.1186/1745-6150-8-24] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 04/25/2013] [Accepted: 09/26/2013] [Indexed: 12/30/2022]
Abstract
BACKGROUND The modern evolutionary synthesis leaves unresolved some of the most fundamental, long-standing questions in evolutionary biology: What is the role of sex in evolution? How does complex adaptation evolve? How can selection operate effectively on genetic interactions? More recently, the molecular biology and genomics revolutions have raised a host of critical new questions, through empirical findings that the modern synthesis fails to explain: for example, the discovery of de novo genes; the immense constructive role of transposable elements in evolution; genetic variance and biochemical activity that go far beyond what traditional natural selection can maintain; perplexing cases of molecular parallelism; and more. PRESENTATION OF THE HYPOTHESIS Here I address these questions from a unified perspective, by means of a new mechanistic view of evolution that offers a novel connection between selection on the phenotype and genetic evolutionary change (while relying, like the traditional theory, on natural selection as the only source of feedback on the fit between an organism and its environment). I hypothesize that the mutation that is of relevance for the evolution of complex adaptation (while not Lamarckian, or "directed" to increase fitness) is not random, but is instead the outcome of a complex and continually evolving biological process that combines information from multiple loci into one. This allows selection on a fleeting combination of interacting alleles at different loci to have a hereditary effect according to the combination's fitness. TESTING AND IMPLICATIONS OF THE HYPOTHESIS This proposed mechanism addresses the problem of how beneficial genetic interactions can evolve under selection, and also offers an intuitive explanation for the role of sex in evolution, which focuses on sex as the generator of genetic combinations.
Importantly, it also implies that genetic variation that has appeared neutral through the lens of traditional theory can actually experience selection on interactions and thus has a much greater adaptive potential than previously considered. Empirical evidence for the proposed mechanism from both molecular evolution and evolution at the organismal level is discussed, and multiple predictions are offered by which it may be tested. REVIEWERS This article was reviewed by Nigel Goldenfeld (nominated by Eugene V. Koonin), Jürgen Brosius and W. Ford Doolittle.
Affiliation(s)
- Adi Livnat
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
45
State-of-the-Art Technologies to Interrogate Genetic/Genomic Components of Drug Response. CURRENT GENETIC MEDICINE REPORTS 2013. [DOI: 10.1007/s40142-013-0022-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
46
A novel framework for the identification and analysis of duplicons between human and chimpanzee. BIOMED RESEARCH INTERNATIONAL 2013; 2013:264532. [PMID: 23984331 PMCID: PMC3747353 DOI: 10.1155/2013/264532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2013] [Revised: 06/25/2013] [Accepted: 07/10/2013] [Indexed: 11/30/2022]
Abstract
Human and other primate genomes contain many segmental duplications (SDs) arising from the fixation of copy number variations (CNVs). The structure of these duplications within the human genome has been shown to be a complex mosaic composed of juxtaposed subunits (called duplicons). These duplicons are difficult to uncover from the mosaic repeat structure. In addition, the distribution and evolution of duplicons among primates remain poorly investigated. In this paper, we develop a statistical framework for discovering duplicons by integrating a hidden Markov model (HMM) with a permutation test. Our comparative analysis indicates that the mosaic structure of duplicons is common in CNV/SD regions of both human and chimpanzee genomes, and that a subset of core duplicons is shared by the majority of CNVs/SDs. Phylogenetic analyses using duplicons suggested that most CNVs/SDs share a common duplication ancestry. Many human/chimpanzee duplicons flank both ends of CNVs, which may be hotspots of nonallelic homologous recombination.
47
Zhang W, Edwards A, Fan W, Fang Z, Deininger P, Zhang K. Inferring the expression variability of human transposable element-derived exons by linear model analysis of deep RNA sequencing data. BMC Genomics 2013; 14:584. [PMID: 23984937 PMCID: PMC3765721 DOI: 10.1186/1471-2164-14-584] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/12/2013] [Accepted: 08/13/2013] [Indexed: 12/14/2022]
Abstract
Background: The exonization of transposable elements (TEs) has proven to be a significant mechanism for the creation of novel exons. Existing knowledge of the retention patterns of TE exons in mRNAs was mainly established by the analysis of Expressed Sequence Tag (EST) data and microarray data. Results: This study seeks to validate and extend previous studies on the expression of TE exons by an integrative statistical analysis of high-throughput RNA sequencing data. We collected 26 RNA-seq datasets spanning multiple tissues and cancer types. The exon-level digital expression values (indicating retention rates in mRNAs) were quantified by a double-normalized measure, called the rescaled RPKM (Reads Per Kilobase of exon model per Million mapped reads). We analyzed the distribution profiles and the variability (across samples and between tissue/disease groups) of TE exon expression, and compared them with those of other constitutive or cassette exons. We inferred the effects of four genomic factors, namely the location, length, cognate TE family, and TE nucleotide proportion (RTE, see Methods section) of a TE exon, on the exons’ expression level and expression variability. We also investigated the biological implications of an assembly of highly expressed TE exons. Conclusion: Our analysis confirmed prior studies in the following four respects. First, with relatively high expression variability, most TE exons in mRNAs, especially those without exact counterparts in the UCSC RefSeq (Reference Sequence) gene tables, demonstrate low but still detectable expression levels in most tissue samples. Second, the TE exons in coding DNA sequences (CDSs) are less highly expressed than those in 3′ (5′) untranslated regions (UTRs). Third, the exons derived from chronologically ancient repeat elements, such as MIRs, tend to be highly expressed in comparison with those derived from younger TEs.
Fourth, the previously observed negative relationship between the lengths of exons and their inclusion levels in transcripts also holds for exonized TEs. Furthermore, our study resulted in several novel findings: (1) for TE exons with non-zero expression, as shown in most of the studied biological samples, a high TE nucleotide proportion leads to lower retention rates in mRNAs; (2) the considered genomic features (i.e., a continuous variable such as exon length or a category indicator such as 3′UTR) influence the expression level and the expression variability (CV) of TE exons in an inverse manner; (3) not only exons derived from Alu elements but also exons from TEs of other families were preferentially established in zinc finger (ZNF) genes.
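The abstract quantifies exon expression with a rescaled RPKM and summarizes variability with the coefficient of variation (CV). The sketch below shows only the standard RPKM formula, reads × 10^9 / (exon length in bp × total mapped reads), and a CV across samples. The paper's additional rescaling step is not described in this excerpt and is not reproduced; the function names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def rpkm(exon_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads."""
    return exon_reads * 1e9 / (exon_length_bp * total_mapped_reads)


def coefficient_of_variation(values: list[float]) -> float:
    """CV = standard deviation / mean (population SD here, an assumption)."""
    mu = mean(values)
    return pstdev(values) / mu if mu else float("nan")


# A hypothetical 250-bp TE-derived exon with 500 reads in a 20M-read library.
print(rpkm(500, 250, 20_000_000))  # 100.0
# The same exon's RPKM across three hypothetical samples.
print(round(coefficient_of_variation([50.0, 100.0, 150.0]), 3))  # 0.408
```

Dividing by exon length makes short and long exons comparable, and dividing by library size makes samples sequenced to different depths comparable; the CV then measures spread relative to the mean expression level.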
Affiliation(s)
- Wensheng Zhang
- Department of Computer Science, Xavier University of Louisiana, 1 Drexel Drive, New Orleans, LA 70125, USA.
48
Zhang Y, Haraksingh R, Grubert F, Abyzov A, Gerstein M, Weissman S, Urban AE. Child development and structural variation in the human genome. Child Dev 2013; 84:34-48. [PMID: 23311762 DOI: 10.1111/cdev.12051] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/31/2022]
Abstract
Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects of structural variation on normal child development, but such effects could be of considerable significance. This review provides an overview of the phenomenon of structural variation in the human genome sequence, describing the novel genomics technologies that are revolutionizing the way structural variation is studied and giving examples of genomic structural variations that affect child development.
49
Ghani M, Sato C, Rogaeva E. Segmental duplications in genome-wide significant loci and housekeeping genes; warning for GAPDH and ACTB. Neurobiol Aging 2012; 34:1710.e1-4. [PMID: 23238109 DOI: 10.1016/j.neurobiolaging.2012.11.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 09/12/2012] [Revised: 11/14/2012] [Accepted: 11/16/2012] [Indexed: 01/18/2023]
Abstract
Normalizing quantitative polymerase chain reaction (qPCR) data to a housekeeping gene is a critical step in qPCR analyses. Our bioinformatics analysis of 1978 housekeeping genes revealed that 348 of them, including GAPDH and ACTB, are not reliable normalizers for qPCR validation of genomic copy number variants because they overlap highly homologous segmental duplications. For RNA-based qPCR, it is also critical to ensure that the cDNA is not contaminated with genomic DNA if GAPDH or ACTB is used as an endogenous control. Furthermore, we observed that 138 significant single nucleotide polymorphisms (SNPs) reported in 134 published genome-wide association studies (GWAS) (out of 1093 GWAS) are mapped to regions affected by segmental duplications. This observation is important, because these SNPs could potentially tag copy number variations that might explain the GWAS signal. However, it is essential to ensure that the association between disease and such a SNP is not a false positive finding (due to incorrect genotype calls) or the result of an association with another homologous genomic region.
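To see why a reference gene lying in a segmental duplication is a problem for copy-number qPCR, here is a minimal sketch of the widely used comparative Ct (ΔΔCt) estimate. It assumes roughly 100% amplification efficiency (product doubles each cycle) and a calibrator sample with a known copy number (two by default); the function and parameter names are hypothetical and not taken from the abstract.

```python
def relative_copy_number(
    ct_target_test: float,
    ct_ref_test: float,
    ct_target_calib: float,
    ct_ref_calib: float,
    calibrator_copies: float = 2.0,
) -> float:
    """Comparative Ct (delta-delta Ct) estimate of a target locus copy number.

    Assumes ~100% amplification efficiency and that the reference gene
    contributes the same amount of template in test and calibrator samples.
    """
    delta_ct_test = ct_target_test - ct_ref_test      # normalize test sample to reference
    delta_ct_calib = ct_target_calib - ct_ref_calib   # normalize calibrator to reference
    delta_delta_ct = delta_ct_test - delta_ct_calib
    # Each cycle of difference corresponds to a two-fold template difference
    return calibrator_copies * 2 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Target amplifies one cycle later in the test sample than in the calibrator
    print(relative_copy_number(25.0, 20.0, 24.0, 20.0))  # 1.0
```

The estimate is only as good as the reference. If the reference primers also amplify homologous copies in a segmental duplication, and the amount of that template differs between the test sample and the calibrator, every target estimate shifts with it: doubling the reference template lowers its Ct by about one cycle and halves the apparent target copy number.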
Affiliation(s)
- Mahdi Ghani
- Tanz Centre for Research in Neurodegenerative Diseases, University of Toronto, Toronto, Ontario, Canada
50
Zichner T, Garfield DA, Rausch T, Stütz AM, Cannavó E, Braun M, Furlong EEM, Korbel JO. Impact of genomic structural variation in Drosophila melanogaster based on population-scale sequencing. Genome Res 2012; 23:568-79. [PMID: 23222910 PMCID: PMC3589545 DOI: 10.1101/gr.142646.112] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Indexed: 01/11/2023]
Abstract
Genomic structural variation (SV) is a major determinant of phenotypic variation. Although it has been extensively studied in humans, the nucleotide-resolution structure of SVs in the widely used model organism Drosophila remains unknown. We report a highly accurate, densely validated map of unbalanced SVs, comprising 8962 deletions and 916 tandem duplications, derived from short-read DNA sequencing of 39 lines from a natural population (the “Drosophila melanogaster Genetic Reference Panel,” DGRP). Most SVs (>90%) were inferred at nucleotide resolution, and a large fraction was genotyped across all samples. Comprehensive analyses of SV formation mechanisms using the short-read data revealed an abundance of SVs formed by mobile element and nonhomologous end-joining-mediated rearrangements, as well as clustering of variants into SV hotspots. We further observed a strong depletion of SVs overlapping genes, which, together with population genetics analyses, suggests that these SVs are often deleterious. We inferred several gene fusion events, further highlighting the potential role of SVs in the generation of novel protein products. Expression quantitative trait locus (eQTL) mapping revealed the functional impact of our high-resolution SV map, with quantifiable effects at >100 genic loci. Our map represents a resource for population-level studies of SVs in an important model organism.
Affiliation(s)
- Thomas Zichner
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany