1
Belyayev A, de la Peña BQ, Corrales SV, Ling Low S, Frejová B, Sejfová Z, Josefiová J, Záveská E, Bertrand YJK, Chrtek J, Mráz P. Analysis of pericentromere composition and structure elucidated the history of the Hieracium alpinum L. genome, revealing waves of transposable elements insertions. Mob DNA 2024; 15:26. [PMID: 39548580 PMCID: PMC11566620 DOI: 10.1186/s13100-024-00336-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Accepted: 11/03/2024] [Indexed: 11/18/2024] Open
Abstract
BACKGROUND The centromere is one of the key regions of the eukaryotic chromosome. While maintaining its function, centromeric DNA may differ among closely related species. Here, we explored the composition and structure of the pericentromeres (chromosomal regions that include a functional centromere) of Hieracium alpinum (Asteraceae), a member of one of the most diverse genera in the plant kingdom. Previously, we identified a pericentromere-specific tandem repeat that made it possible to distinguish reads within the Oxford Nanopore library attributed to the pericentromeres, separating them into a discrete subset and allowing comparison of the repeatome composition of this subset with that of the remaining genome. RESULTS We found that the main satellite DNA (satDNA) monomer forms long arrays of linear and block types in the pericentromeric heterochromatin of H. alpinum, and very often single reads contain forward and reverse arrays that mirror each other. Besides the major satDNA family, two new minor satDNA families were discovered. In addition to satDNAs, high amounts of LTR retrotransposons (TEs), dominated by the Tekay lineage, were detected in the pericentromeres. We were able to reconstruct four main TEs of the Ty3-gypsy and Ty1-copia superfamilies and compare their relative positions with satDNAs. This comparison showed that the conserved domains (CDs) of the TE proteins are located between the newly discovered satDNAs, which appear to be parts of ancient Tekay LTRs that we were able to reconstruct. The dominant satDNA monomer shows a certain similarity to the GAG CD of the Angela retrotransposon. CONCLUSIONS The species-specific pericentromeric arrays of the H. alpinum genome are heterogeneous, exhibiting both linear and block type structures. High amounts of forward and reverse arrays of the main satDNA monomer point to multiple microinversions that could be the main mechanism for rapid structural evolution, stochastically creating the uniqueness of an individual pericentromeric structure. The traces of TE insertion waves remain in pericentromeres for a long time, thus "keeping memories" of past genomic events. We counted at least four waves of TE insertions. In pericentromeres, TE particles can be transformed into satDNA, which constitutes a background pool of minor families that, under certain conditions, can replace the dominant one(s).
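The observation that single reads carry both forward and reverse arrays of the main satDNA monomer can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the toy monomer, and the toy read are hypothetical, and exact string matching is used only for clarity (real Oxford Nanopore reads would need error-tolerant alignment).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def monomer_orientations(read: str, monomer: str) -> list:
    """List (position, strand) for every exact monomer copy in a read.

    '+' marks a forward copy, '-' a reverse-complemented copy.
    Exact matching is a simplifying assumption for this sketch.
    """
    read = read.upper()
    monomer = monomer.upper()
    hits = []
    for strand, pattern in (("+", monomer), ("-", reverse_complement(monomer))):
        start = read.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = read.find(pattern, start + 1)
    return sorted(hits)


# Toy data (hypothetical): three forward copies, a spacer, two reverse copies.
monomer = "ACGGTC"
read = monomer * 3 + "TTTT" + reverse_complement(monomer) * 2
hits = monomer_orientations(read, monomer)
strands = "".join(strand for _, strand in hits)
print(strands)
```

A switch from `+` to `-` within one read is the kind of signal the abstract interprets as a possible microinversion; the sketch only reports orientations and makes no claim about mechanism.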
Affiliation(s)
- Alexander Belyayev
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
- Begoña Quirós de la Peña
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
  - Herbarium and Department of Botany, Charles University, Benátská 2, CZ-12801, Prague, Czech Republic
- Shook Ling Low
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
- Barbora Frejová
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
- Zuzana Sejfová
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
- Jiřina Josefiová
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
- Eliška Záveská
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
- Yann J K Bertrand
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
- Jindřich Chrtek
  - Czech Academy of Sciences, Institute of Botany, Zámek 1, CZ-252 43, Průhonice, Czech Republic
  - Herbarium and Department of Botany, Charles University, Benátská 2, CZ-12801, Prague, Czech Republic
- Patrik Mráz
  - Herbarium and Department of Botany, Charles University, Benátská 2, CZ-12801, Prague, Czech Republic
2
Redelings BD, Holmes I, Lunter G, Pupko T, Anisimova M. Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications. Mol Biol Evol 2024; 41:msae177. [PMID: 39172750 PMCID: PMC11385596 DOI: 10.1093/molbev/msae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 08/24/2024] Open
Abstract
Insertions and deletions constitute the second most important source of natural genomic variation. They make up to 25% of genomic variants in humans and are involved in complex evolutionary processes including genomic rearrangements, adaptation, and speciation. Recent advances in long-read sequencing technologies allow detailed inference of insertion and deletion variation in species and populations. Yet, despite their importance, evolutionary studies have traditionally ignored or mishandled insertions and deletions due to a lack of comprehensive methodologies and statistical models of insertion and deletion dynamics. Here, we discuss methods for describing insertion and deletion variation and modeling insertions and deletions over evolutionary time. We provide practical advice for tackling insertions and deletions in genomic sequences and illustrate our discussion with examples of insertion- and deletion-induced effects in human and other natural populations and their contribution to evolutionary processes. We outline promising directions for future developments in statistical methodologies that would allow researchers to analyze insertion and deletion variation and its effects in large genomic data sets and to incorporate insertions and deletions in evolutionary inference.
Affiliation(s)
- Ian Holmes
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA
  - Calico Life Sciences LLC, South San Francisco, CA 94080, USA
- Gerton Lunter
  - Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9713 GZ, The Netherlands
- Tal Pupko
  - The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Maria Anisimova
  - Institute of Computational Life Sciences, Zurich University of Applied Sciences, Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
3
Hanlon VCT, Lansdorp PM, Guryev V. A survey of current methods to detect and genotype inversions. Hum Mutat 2022; 43:1576-1589. [PMID: 36047337 DOI: 10.1002/humu.24458] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/02/2022] [Revised: 08/26/2022] [Accepted: 08/29/2022] [Indexed: 11/11/2022]
Abstract
Polymorphic inversions are ubiquitous in humans, and they have been linked to both adaptation and disease. Following their discovery in Drosophila more than a century ago, inversions have proved to be more elusive than other structural variants. A wide variety of methods for the detection and genotyping of inversions have recently been developed: multiple techniques based on selective amplification by PCR, short- and long-read sequencing approaches, principal component analysis of small variant haplotypes, template strand sequencing, optical mapping, and various genome assembly methods. Many methods apply complex wet lab protocols or increasingly refined bioinformatic analyses. This review is an attempt to provide a practical summary and comparison of the methods that are in current use, with a focus on metrics such as the maximum size of segmental duplications at inversion breakpoints that each method can tolerate, the size range of inversions that they recover, their throughput, and whether the locations of putative inversions must be known beforehand.
Affiliation(s)
- Peter M Lansdorp
  - Terry Fox Laboratory, BC Cancer Agency, Vancouver, BC, V5Z 1L3, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Victor Guryev
  - European Research Institute for the Biology of Ageing, University of Groningen, University Medical Center Groningen, 9713 AV, Groningen, The Netherlands
4
Gu L, Hou Y, Wang G, Liu Q, Ding W, Weng Q. Characterization of the chloroplast genome of Lonicera ruprechtiana Regel and comparison with other selected species of Caprifoliaceae. PLoS One 2022; 17:e0262813. [PMID: 35077482 PMCID: PMC8789150 DOI: 10.1371/journal.pone.0262813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2021] [Accepted: 01/05/2022] [Indexed: 11/19/2022] Open
Abstract
Lonicera ruprechtiana Regel is widely used as a greening tree in China and also displays excellent pharmacological activities. The phylogenetic relationship between L. ruprechtiana and other members of Caprifoliaceae remains unclear. In this study, the complete chloroplast (cp) genome of L. ruprechtiana was identified using high-throughput Illumina paired-end sequencing data. The circular cp genome was 154,611 bp long and had a large single-copy region of 88,182 bp and a small single-copy region of 18,713 bp, with the two parts separated by two inverted repeat (IR) regions (23,858 bp each). A total of 131 genes were annotated, including 8 ribosomal RNAs, 39 transfer RNAs, and 84 protein-coding genes (PCGs). In addition, 49 repeat sequences and 55 simple sequence repeat loci of 18 types were also detected. Codon usage analysis showed that Leu codons preferentially end in A/U. Maximum-likelihood phylogenetic analysis using 22 Caprifoliaceae species revealed that L. ruprechtiana was closely related to Lonicera insularis. Comparison of IR regions revealed that the cp genome of L. ruprechtiana was largely conserved with that of congeneric species. Moreover, synonymous (Ks) and non-synonymous (Ka) substitution rate analysis showed that most genes were under purifying selection pressure; ycf3 and some genes associated with subunits of NADH dehydrogenase, subunits of the cytochrome b/f complex, and subunits of the photosystem had been subjected to strong purifying selection pressure (Ka/Ks < 0.1). This study provides useful genetic information for future study of L. ruprechtiana evolution.
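The figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the function names and the Ka/Ks labels are illustrative assumptions, with the 0.1 cutoff taken from the abstract and the threshold of 1 being the conventional boundary between purifying and positive selection.

```python
# Region lengths (bp) and gene counts as quoted in the abstract.
LSC = 88_182            # large single-copy region
SSC = 18_713            # small single-copy region
IR = 23_858             # one inverted repeat; the genome contains two
REPORTED_TOTAL = 154_611
GENE_COUNTS = {"rRNA": 8, "tRNA": 39, "protein_coding": 84}
REPORTED_GENES = 131


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def selection_regime(ka_ks: float, strong_cutoff: float = 0.1) -> str:
    """Label a Ka/Ks ratio (hypothetical helper; labels are illustrative)."""
    if ka_ks < strong_cutoff:
        return "strong purifying"
    if ka_ks < 1.0:
        return "purifying"
    if ka_ks == 1.0:
        return "neutral"
    return "positive"


total = quadripartite_length(LSC, SSC, IR)
gene_total = sum(GENE_COUNTS.values())
print(total == REPORTED_TOTAL, gene_total == REPORTED_GENES)
```

Both checks agree with the reported totals: 88,182 + 18,713 + 2 × 23,858 = 154,611 bp, and 8 + 39 + 84 = 131 genes.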
Affiliation(s)
- Lei Gu
  - School of Life Sciences, Guizhou Normal University, Guiyang, China
- Yunyan Hou
  - School of Life Sciences, Guizhou Normal University, Guiyang, China
- Guangyi Wang
  - School of Life Sciences, Guizhou Normal University, Guiyang, China
- Qiuping Liu
  - School of Life Sciences, Guizhou Normal University, Guiyang, China
- Wei Ding
  - College of Plant Protection, Southwest University, Chongqing, China
- Qingbei Weng
  - School of Life Sciences, Guizhou Normal University, Guiyang, China
  - Qiannan Normal University for Nationalities, Duyun, China
5
Potapova NA, Kondrashov AS, Mirkin SM. Characteristics and possible mechanisms of formation of microinversions distinguishing human and chimpanzee genomes. Sci Rep 2022; 12:591. [PMID: 35022450 PMCID: PMC8755829 DOI: 10.1038/s41598-021-04621-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2021] [Accepted: 12/28/2021] [Indexed: 12/02/2022] Open
Abstract
Genomic inversions come in various sizes. While long inversions are relatively easy to identify by aligning high-quality genome sequences, unambiguous identification of microinversions is more problematic. Here, using a set of extra stringent criteria to distinguish microinversions from other mutational events, we describe microinversions that occurred after the divergence of humans and chimpanzees. In total, we found 59 definite microinversions that range from 17 to 33 nucleotides in length. In the majority of them, human genome sequences matched exactly the reverse-complemented chimpanzee genome sequences, implying that the inverted DNA segment was copied precisely. All these microinversions were flanked by perfect or nearly perfect inverted repeats, pointing to a key role of these repeats in microinversion formation. Template switching at inverted repeats during DNA replication was previously discussed as a possible mechanism for microinversion formation. However, many of the definite microinversions that we found cannot be easily explained via template switching owing to the combination of the short length and imperfect nature of their flanking inverted repeats. We propose a novel, alternative mechanism that involves repair of a double-stranded break within the inverting segment via microhomology-mediated break-induced replication, which can consistently explain all definite microinversion events.
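The two sequence-level criteria mentioned in this abstract (an exact reverse-complement match between species, and flanking inverted repeats) can be sketched as follows. This is a simplified illustration, not the authors' stringent procedure: the function names and toy sequences are hypothetical, and only the 17 to 33 nt size range is taken from the abstract.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_exact_microinversion(reference: str, query: str,
                            min_len: int = 17, max_len: int = 33) -> bool:
    """True if `query` is exactly the reverse complement of `reference`.

    Segments identical to their own reverse complement are rejected,
    because an inversion of such a segment would be undetectable.
    """
    reference, query = reference.upper(), query.upper()
    if len(reference) != len(query):
        return False
    if not min_len <= len(reference) <= max_len:
        return False
    if reference == query:
        return False
    return query == reverse_complement(reference)


def inverted_repeat_mismatches(left_flank: str, right_flank: str) -> int:
    """Count mismatches between the right flank and the reverse complement
    of the left flank; 0 means a perfect inverted repeat."""
    expected = reverse_complement(left_flank)
    right = right_flank.upper()
    if len(expected) != len(right):
        raise ValueError("flanks must have equal length in this sketch")
    return sum(a != b for a, b in zip(expected, right))


# Toy sequences (hypothetical, not taken from the paper).
chimp_core = "ACGTTGCAAGGCTTACG"              # 17 nt
human_core = reverse_complement(chimp_core)
print(is_exact_microinversion(chimp_core, human_core))
print(inverted_repeat_mismatches("GATTCA", "TGAATC"))
```

Allowing a small number of mismatches in `inverted_repeat_mismatches` corresponds to the "nearly perfect" inverted repeats described above; how many mismatches to tolerate is left open here.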
Affiliation(s)
- Nadezhda A Potapova
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia, 127051
- Alexey S Kondrashov
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Sergei M Mirkin
  - Department of Biology, Tufts University, Medford, MA, 02155, USA
6
Qu L, Wang L, He F, Han Y, Yang L, Wang MD, Zhu H. The Landscape of Micro-Inversions Provide Clues for Population Genetic Analysis of Humans. Interdiscip Sci 2020; 12:499-514. [PMID: 32929667 PMCID: PMC7658078 DOI: 10.1007/s12539-020-00392-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/25/2020] [Revised: 09/02/2020] [Accepted: 09/03/2020] [Indexed: 11/04/2022]
Abstract
BACKGROUND Variations in the human genome have been studied extensively. However, little is known about the role of micro-inversions (MIs), generally defined as small (< 100 bp) inversions, in human evolution, diversity, and health. Depicting the pattern of MIs among diverse populations is critical for interpreting human evolutionary history and obtaining insight into genetic diseases. RESULTS In this paper, we explored the distribution of MIs in genomes from 26 human populations and 7 nonhuman primate genomes and analyzed the phylogenetic structure of the 26 human populations based on the MIs. We further investigated the functions of the MIs located within genes associated with human health. With hg19 as the reference genome, we detected 6968 MIs among the 1937 human samples and 24,476 MIs among the 7 nonhuman primate genomes. The analyses of MIs in human genomes showed that the MIs were rarely located in exonic regions. Nonhuman primates and human populations shared only 82 inverted alleles, and Africans had the most inverted alleles in common with nonhuman primates, which was consistent with the "Out of Africa" hypothesis. The clustering of MIs among the human populations also coincided with human migration history and ancestral lineages. CONCLUSIONS We propose that MIs are potential evolutionary markers for investigating population dynamics. Our results revealed the diversity of MIs in human populations and showed that they are essential to construct human population relationships and have a potential effect on human health.
Affiliation(s)
- Li Qu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, 30332, USA
- Luotong Wang
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Feifei He
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Yilun Han
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Longshu Yang
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
- May D Wang
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, 30332, USA
- Huaiqiu Zhu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China
  - Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech and Emory University, Atlanta, GA, 30332, USA
  - Center for Quantitative Biology, Peking University, Beijing, 100871, China
7
Frith MC, Khan S. A survey of localized sequence rearrangements in human DNA. Nucleic Acids Res 2019; 46:1661-1673. [PMID: 29272440 PMCID: PMC5829575 DOI: 10.1093/nar/gkx1266] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/30/2017] [Accepted: 12/07/2017] [Indexed: 01/29/2023] Open
Abstract
Genomes mutate and evolve in ways simple (substitution or deletion of bases) and complex (e.g. chromosome shattering). We do not fully understand what types of complex mutation occur, and we cannot routinely characterize arbitrarily-complex mutations in a high-throughput, genome-wide manner. Long-read DNA sequencing methods (e.g. PacBio, nanopore) are promising for this task, because one read may encompass a whole complex mutation. We describe an analysis pipeline to characterize arbitrarily-complex 'local' mutations, i.e. intrachromosomal mutations encompassed by one DNA read. We apply it to nanopore and PacBio reads from one human cell line (NA12878), and survey sequence rearrangements, both real and artifactual. Almost all the real rearrangements belong to recurring patterns or motifs: the most common is tandem multiplication (e.g. heptuplication), but there are also complex patterns such as localized shattering, which resembles DNA damage by radiation. Gene conversions are identified, including one between hemoglobin gamma genes. This study demonstrates a way to find intricate rearrangements with any number of duplications, deletions, and repositionings. It demonstrates a probability-based method to resolve ambiguous rearrangements involving highly similar sequences, as occurs in gene conversion. We present a catalog of local rearrangements in one human cell line, and show which rearrangement patterns occur.
Affiliation(s)
- Martin C Frith
  - Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
  - Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-8562, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
- Sofia Khan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), AIST, Tokyo 169-8555, Japan
8
Qu L, Zhu H, Wang M. Micro-Inversions In Human Cancer Genomes. Annu Int Conf IEEE Eng Med Biol Soc 2018; 2018:1323-1326. [PMID: 30440635 DOI: 10.1109/embc.2018.8512514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022]
Abstract
Although researchers have studied variations in cancer genomes over the past few years, our current knowledge about the function of micro-inversions (MIs) in cancer is still limited. MIs are generally defined as small inversions in DNA segments shorter than 100 bp. To expand our knowledge of their roles in cancer, we analyzed the MIs of 209 samples from four types of cancer: hepatocellular carcinoma, lung cancer, pancreatic cancer, and bladder cancer. Across all 209 samples, we identified 2,925 MIs, of which 1,519 (51.93%) are in gene regions. Of the 1,519 MIs in gene regions, 106 (6.98%) are in exon regions. We also analyzed 209 healthy samples as controls. We further analyzed the distribution of MIs in the four types of cancer among 24 chromosomes. Besides the chromosome preference, different cancers also have different preferences for various genes. The gene preferences of MIs across the four cancer types may provide guidance for their diagnosis and treatment. Clinicians should pay closer attention to the chromosomes and genes in which MIs preferentially occur. We also calculated the average count of MIs per individual for each cancer. From this result, we found that bladder cancer has the highest average count of MIs per individual, which suggests that MIs may be more likely to occur in bladder cancer. According to our analysis, MIs play an important role in cancer and should be considered for further analysis.
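The percentages quoted in this abstract follow directly from the reported counts. The short check below uses only those counts; the helper name `percent` is hypothetical.

```python
# Counts as quoted in the abstract.
TOTAL_MIS = 2_925     # MIs identified across the 209 cancer samples
GENIC_MIS = 1_519     # MIs located in gene regions
EXONIC_MIS = 106      # genic MIs located in exon regions


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


genic_pct = percent(GENIC_MIS, TOTAL_MIS)     # share of all MIs that are genic
exonic_pct = percent(EXONIC_MIS, GENIC_MIS)   # share of genic MIs that are exonic
print(genic_pct, exonic_pct)
```

Note that the exonic share is computed relative to the genic MIs, not to all MIs, which matches the 6.98% given in the text.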
9
Lucas JMEX, Roest Crollius H. High precision detection of conserved segments from synteny blocks. PLoS One 2017; 12:e0180198. [PMID: 28671949 PMCID: PMC5495381 DOI: 10.1371/journal.pone.0180198] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/29/2016] [Accepted: 06/12/2017] [Indexed: 11/19/2022] Open
Abstract
A conserved segment, i.e. a segment of chromosome unbroken during evolution, is an important operational concept in comparative genomics. Until now, algorithms designed to identify conserved segments have often returned synteny blocks that overlap, that include micro-rearrangements, or that are erroneously short. Here we present definitions of conserved segments and synteny blocks independent of any heuristic method, and we describe four new post-processing strategies to refine synteny blocks into accurate conserved segments. The first strategy identifies micro-rearrangements, the second identifies mono-genic conserved segments, the third returns non-overlapping segments, and the fourth repairs incorrect ruptures of synteny. All these refinements are implemented in a new version of PhylDiag that has been benchmarked against i-ADHoRe 3.0 and Cyntenator, based on a realistic simulated evolution and true simulated conserved segments.
Affiliation(s)
- Joseph MEX Lucas
  - IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, Paris, France
- Hugues Roest Crollius
  - IBENS, Département de Biologie, Ecole Normale Supérieure, CNRS, Inserm, PSL Research University, Paris, France
10
Gordon D, Huddleston J, Chaisson MJP, Hill CM, Kronenberg ZN, Munson KM, Malig M, Raja A, Fiddes I, Hillier LW, Dunn C, Baker C, Armstrong J, Diekhans M, Paten B, Shendure J, Wilson RK, Haussler D, Chin CS, Eichler EE. Long-read sequence assembly of the gorilla genome. Science 2016; 352:aae0344. [PMID: 27034376 PMCID: PMC4920363 DOI: 10.1126/science.aae0344] [Citation(s) in RCA: 235] [Impact Index Per Article: 26.1] [Received: 12/07/2015] [Accepted: 02/26/2016] [Indexed: 12/24/2022]
Abstract
Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.
Affiliation(s)
- David Gordon
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- John Huddleston
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Mark J P Chaisson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Christopher M Hill
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Zev N Kronenberg
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Katherine M Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Maika Malig
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Archana Raja
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Ian Fiddes
  - Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- LaDeana W Hillier
  - McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
- Carl Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
- Joel Armstrong
  - Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Mark Diekhans
  - Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Benedict Paten
  - Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Jay Shendure
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
- Richard K Wilson
  - McDonnell Genome Institute, Department of Medicine, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63108, USA
- David Haussler
  - Genomics Institute, University of California Santa Cruz and Howard Hughes Medical Institute, Santa Cruz, CA 95064, USA
- Chen-Shan Chin
  - Pacific Biosciences of California, Menlo Park, CA 94025, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
11
He F, Li Y, Tang YH, Ma J, Zhu H. Identifying micro-inversions using high-throughput sequencing reads. BMC Genomics 2016; 17 Suppl 1:4. [PMID: 26818118 PMCID: PMC4895285 DOI: 10.1186/s12864-015-2305-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The identification of inversions of DNA segments shorter than read length (e.g., 100 bp), defined as micro-inversions (MIs), remains challenging for next-generation sequencing reads. It is acknowledged that MIs are an important type of genomic variation and may play roles in causing genetic disease. However, current alignment methods are generally insensitive to MIs. Here we develop a novel tool, MID (Micro-Inversion Detector), to identify MIs in human genomes using next-generation sequencing reads. RESULTS The algorithm of MID is based on a dynamic programming path-finding approach. What makes MID different from other variant detection tools is that it can handle small MIs and multiple breakpoints within an unmapped read. Moreover, MID improves reliability in low coverage data by integrating multiple samples. Our evaluation demonstrated that MID outperforms Gustaf, which can currently detect inversions from 30 bp to 500 bp. CONCLUSIONS To our knowledge, MID is the first method that can efficiently and reliably identify MIs from unmapped short next-generation sequencing reads. MID is reliable on low coverage data, which makes it suitable for large-scale projects such as the 1000 Genomes Project (1KGP). MID identified previously unknown MIs from the 1KGP that overlap with genes and regulatory elements in the human genome. We also identified MIs in cancer cell lines from the Cancer Cell Line Encyclopedia (CCLE). Therefore, our tool is expected to be useful for improving the study of MIs as a type of genetic variant in the human genome. The source code can be downloaded from: http://cqb.pku.edu.cn/ZhuLab/MID .
Affiliation(s)
- Feifei He
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, and Center for Quantitative Biology, Peking University, Beijing, 100871, China
- Yang Li
  - Department of Bioengineering, University of Illinois, Urbana, IL, 61801, USA
- Yu-Hang Tang
  - Division of Applied Mathematics, Brown University, Providence, RI, 02912, USA
- Jian Ma
  - Department of Bioengineering, University of Illinois, Urbana, IL, 61801, USA
  - Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, 61801, USA
- Huaiqiu Zhu
  - State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, and Center for Quantitative Biology, Peking University, Beijing, 100871, China
12
Dobigny G, Britton-Davidian J, Robinson TJ. Chromosomal polymorphism in mammals: an evolutionary perspective. Biol Rev Camb Philos Soc 2015; 92:1-21. [PMID: 26234165 DOI: 10.1111/brv.12213] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 09/30/2014] [Revised: 06/23/2015] [Accepted: 07/09/2015] [Indexed: 12/28/2022]
Abstract
Although chromosome rearrangements (CRs) are central to studies of genome evolution, our understanding of the evolutionary consequences of the early stages of karyotypic differentiation (i.e. polymorphism), especially the non-meiotic impacts, is surprisingly limited. We review the available data on chromosomal polymorphisms in mammals so as to identify taxa that hold promise for developing a more comprehensive understanding of chromosomal change. In doing so, we address several key questions: (i) to what extent are mammalian karyotypes polymorphic, and what types of rearrangements are principally involved? (ii) Are some mammalian lineages more prone to chromosomal polymorphism than others? More specifically, do (karyotypically) polymorphic mammalian species belong to lineages that are also characterized by past, extensive karyotype repatterning? (iii) How long can chromosomal polymorphisms persist in mammals? We discuss the evolutionary implications of these questions and propose several research avenues that may shed light on the role of chromosome change in the diversification of mammalian populations and species.
Affiliation(s)
- Gauthier Dobigny
- Institut de Recherche pour le Développement, Centre de Biologie pour la Gestion des Populations (UMR IRD-INRA-Cirad-Montpellier SupAgro), Campus International de Baillarguet, CS30016, 34988, Montferrier-sur-Lez, France
- Janice Britton-Davidian
- Institut des Sciences de l'Evolution, Université de Montpellier, CNRS, IRD, EPHE, Cc065, Place Eugène Bataillon, 34095, Montpellier Cedex 5, France
- Terence J Robinson
- Evolutionary Genomics Group, Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland, Stellenbosch, 7062, South Africa
13
Frith MC, Kawaguchi R. Split-alignment of genomes finds orthologies more accurately. Genome Biol 2015; 16:106. [PMID: 25994148 PMCID: PMC4464727 DOI: 10.1186/s13059-015-0670-9] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Received: 04/06/2015] [Accepted: 05/08/2015] [Indexed: 04/29/2023] Open
Abstract
We present a new pair-wise genome alignment method, based on a simple concept of finding an optimal set of local alignments. It gains accuracy by not masking repeats, and by using a statistical model to quantify the (un)ambiguity of each alignment part. Compared to previous animal genome alignments, it aligns thousands of locations differently and with much higher similarity, strongly suggesting that the previous alignments are non-orthologous. The previous methods suffer from an overly-strong assumption of long un-rearranged blocks. The new alignments should help find interesting and unusual features, such as fast-evolving elements and micro-rearrangements, which are confounded by alignment errors.
Affiliation(s)
- Martin C Frith
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan.
- Risa Kawaguchi
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan; Department of Computational Biology, Faculty of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan.
14
Jiang X, Peery A, Hall AB, Sharma A, Chen XG, Waterhouse RM, Komissarov A, Riehle MM, Shouche Y, Sharakhova MV, Lawson D, Pakpour N, Arensburger P, Davidson VLM, Eiglmeier K, Emrich S, George P, Kennedy RC, Mane SP, Maslen G, Oringanje C, Qi Y, Settlage R, Tojo M, Tubio JMC, Unger MF, Wang B, Vernick KD, Ribeiro JMC, James AA, Michel K, Riehle MA, Luckhart S, Sharakhov IV, Tu Z. Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi. Genome Biol 2014; 15:459. [PMID: 25244985 PMCID: PMC4195908 DOI: 10.1186/s13059-014-0459-2] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 04/27/2014] [Accepted: 09/03/2014] [Indexed: 12/24/2022] Open
Abstract
Background Anopheles stephensi is the key vector of malaria throughout the Indian subcontinent and Middle East and an emerging model for molecular and genetic studies of mosquito-parasite interactions. The type form of the species is responsible for the majority of urban malaria transmission across its range. Results Here, we report the genome sequence and annotation of the Indian strain of the type form of An. stephensi. The 221 Mb genome assembly represents more than 92% of the entire genome and was produced using a combination of 454, Illumina, and PacBio sequencing. Physical mapping assigned 62% of the genome onto chromosomes, enabling chromosome-based analysis. Comparisons between An. stephensi and An. gambiae reveal that the rate of gene order reshuffling on the X chromosome was three times higher than that on the autosomes. An. stephensi has more heterochromatin in pericentric regions but less repetitive DNA in chromosome arms than An. gambiae. We also identify a number of Y-chromosome contigs and BACs. Interspersed repeats constitute 7.1% of the assembled genome while LTR retrotransposons alone comprise more than 49% of the Y contigs. RNA-seq analyses provide new insights into mosquito innate immunity, development, and sexual dimorphism. Conclusions The genome analysis described in this manuscript provides a resource and platform for fundamental and translational research into a major urban malaria vector. Chromosome-based investigations provide unique perspectives on Anopheles chromosome evolution. RNA-seq analysis and studies of immunity genes offer new insights into mosquito biology and mosquito-parasite interactions. Electronic supplementary material The online version of this article (doi:10.1186/s13059-014-0459-2) contains supplementary material, which is available to authorized users.
15
Cáceres A, Sindi SS, Raphael BJ, Cáceres M, González JR. Identification of polymorphic inversions from genotypes. BMC Bioinformatics 2012; 13:28. [PMID: 22321652 PMCID: PMC3296650 DOI: 10.1186/1471-2105-13-28] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Received: 09/26/2011] [Accepted: 02/09/2012] [Indexed: 01/19/2023] Open
Abstract
Background Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods that measure differences between linkage disequilibrium attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies. Results We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as containing a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding window scan to identify regions likely to have an inversion, and accumulation of evidence from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, thus increasing its usability for current genome-wide association studies (GWAS). Conclusions We demonstrate the accuracy of our method to detect inversions and classify individuals on principled-simulated genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European Origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 out of 15 well-established autosomal inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both genotype and haplotype methods as a unified R package inveRsion.
Affiliation(s)
- Alejandro Cáceres
- Center for Research in Environmental Epidemiology, and Institut Municipal d'Investigació Mèdica, Barcelona 08003, Spain.
16
Hara Y, Imanishi T. Abundance of ultramicro inversions within local alignments between human and chimpanzee genomes. BMC Evol Biol 2011; 11:308. [PMID: 22011259 PMCID: PMC3227671 DOI: 10.1186/1471-2148-11-308] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/28/2011] [Accepted: 10/19/2011] [Indexed: 11/18/2022] Open
Abstract
Background Chromosomal inversion is one of the most important mechanisms of evolution. Recent studies of comparative genomics have revealed that chromosomal inversions are abundant in the human genome. While such previously characterized inversions are large enough to be identified as a single alignment or a string of local alignments, the impact on evolution of ultramicro inversions, which are so short that the local alignments completely cover them, is still uncertain. Results In this study, we developed a method for identifying ultramicro inversions by scanning local alignments. This technique achieved a high sensitivity and a very low rate of false positives. We identified 2,377 ultramicro inversions ranging from five to 125 bp within the orthologous alignments between the human and chimpanzee genomes. The false positive rate was estimated to be around 4%. Based on phylogenetic profiles using the primate outgroups, 479 ultramicro inversions were inferred to have specifically inverted in the human lineage. Ultramicro inversions exclusively involving adenine and thymine were the most frequent: 461 inversions (19.4% of the total). Furthermore, the density of ultramicro inversions in chromosome Y and the neighborhoods of transposable elements was higher than average. Sixty-five ultramicro inversions were identified within the exons of human protein-coding genes. Conclusions We defined ultramicro inversions as the inverted regions equal to or smaller than 125 bp buried within local alignments. Our observations suggest that ultramicro inversions are abundant between the human and chimpanzee genomes, and that the location of the inversions correlates with genome structural instability. Some of the ultramicro inversions may contribute to gene evolution. Our inversion-identification method is also applicable in the fine-tuning of genome alignments by distinguishing ultramicro inversions from nucleotide substitutions and indels.
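To make the notion of an inversion "buried within a local alignment" concrete, here is a minimal Python sketch. It is not the authors' method, which also controls false positives statistically; it only assumes an ungapped alignment of two equal-length sequences and flags windows where the query differs from the reference yet equals the reverse complement of the reference window. The function names and the default 5 to 125 bp bounds (taken from the size range quoted in the abstract) are illustrative choices.

```python
# Naive illustration only: exact-match scan over an ungapped pairwise alignment.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_inverted_windows(ref: str, qry: str, min_len: int = 5, max_len: int = 125):
    """List half-open (start, end) windows where qry differs from ref but
    equals the reverse complement of the same ref window.

    Nested windows centred on the same inversion are all reported; a real
    tool would merge them and assess significance.
    """
    if len(ref) != len(qry):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    n = len(ref)
    for start in range(n):
        for length in range(min_len, min(max_len, n - start) + 1):
            end = start + length
            window = ref[start:end]
            if qry[start:end] != window and qry[start:end] == reverse_complement(window):
                hits.append((start, end))
    return hits


if __name__ == "__main__":
    ref = "GGGTTT" + "ACCGTAAG" + "CCCAAA"
    qry = "GGGTTT" + reverse_complement("ACCGTAAG") + "CCCAAA"
    print(candidate_inverted_windows(ref, qry))
```

Exact matching is a deliberate simplification: inverted segments that have since accumulated substitutions would be missed, which is one reason published methods score candidates rather than demanding identity.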
Affiliation(s)
- Yuichiro Hara
- Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, Aomi 2-4-7, Koto-ku, Tokyo, Japan
17
Hou M, Yao P, Antonou A, Johns MA. Pico-inplace-inversions between human and chimpanzee. Bioinformatics 2011; 27:3266-75. [PMID: 21994225 DOI: 10.1093/bioinformatics/btr566] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION There have been several studies on the micro-inversions between human and chimpanzee, but there are large discrepancies among their results. Furthermore, all of them rely on alignment procedures or existing alignment results to identify inversions. However, the core alignment procedures do not take very small inversions into consideration. Therefore, their analyses cannot find inversions that are too small to be detected by a classic aligner. We call such inversions pico-inversions. RESULTS We re-analyzed human-chimpanzee alignment from the UCSC Genome Browser for micro-inplace-inversions and screened for pico-inplace-inversions using a likelihood ratio test. We report that the quantity of inplace-inversions between human and chimpanzee is substantially greater than what had previously been discovered. We also present the software tool PicoInversionMiner to detect pico-inplace-inversions between closely related species. AVAILABILITY Software tools, scripts and result data are available at http://faculty.cs.niu.edu/~hou/PicoInversion.html. CONTACT mhou@cs.niu.edu.
Affiliation(s)
- Minmei Hou
- Department of Computer Science, Northern Illinois University, DeKalb, IL 60115, USA.
18
Braun EL, Kimball RT, Han KL, Iuhasz-Velez NR, Bonilla AJ, Chojnowski JL, Smith JV, Bowie RCK, Braun MJ, Hackett SJ, Harshman J, Huddleston CJ, Marks BD, Miglia KJ, Moore WS, Reddy S, Sheldon FH, Witt CC, Yuri T. Homoplastic microinversions and the avian tree of life. BMC Evol Biol 2011; 11:141. [PMID: 21612607 PMCID: PMC3123225 DOI: 10.1186/1471-2148-11-141] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 10/26/2010] [Accepted: 05/25/2011] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Microinversions are cytologically undetectable inversions of DNA sequences that accumulate slowly in genomes. Like many other rare genomic changes (RGCs), microinversions are thought to be virtually homoplasy-free evolutionary characters, suggesting that they may be very useful for difficult phylogenetic problems such as the avian tree of life. However, few detailed surveys of these genomic rearrangements have been conducted, making it difficult to assess this hypothesis or understand the impact of microinversions upon genome evolution. RESULTS We surveyed non-coding sequence data from a recent avian phylogenetic study and found substantially more microinversions than expected based upon prior information about vertebrate inversion rates, although this is likely due to underestimation of these rates in previous studies. Most microinversions were lineage-specific or united well-accepted groups. However, some homoplastic microinversions were evident among the informative characters. Hemiplasy, which reflects differences between gene trees and the species tree, did not explain the observed homoplasy. Two specific loci were microinversion hotspots, with high numbers of inversions that included both the homoplastic as well as some overlapping microinversions. Neither stem-loop structures nor detectable sequence motifs were associated with microinversions in the hotspots. CONCLUSIONS Microinversions can provide valuable phylogenetic information, although power analysis indicates that large amounts of sequence data will be necessary to identify enough inversions (and similar RGCs) to resolve short branches in the tree of life. Moreover, microinversions are not perfect characters and should be interpreted with caution, just as with any other character type. Independent of their use for phylogenetic analyses, microinversions are important because they have the potential to complicate alignment of non-coding sequences. 
Despite their low rate of accumulation, they have clearly contributed to genome evolution, suggesting that active identification of microinversions will prove useful in future phylogenomic studies.
Affiliation(s)
- Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Rebecca T Kimball
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Kin-Lan Han
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Amber J Bonilla
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Jena L Chojnowski
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Jordan V Smith
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Rauri CK Bowie
- Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA
- Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Michael J Braun
- Department of Vertebrate Zoology, Smithsonian Institution, 4210 Silver Hill Road, Suitland, MD 20746, USA
- Behavior, Ecology, Evolution, and Systematics Program, University of Maryland, College Park, MD 20742, USA
- Shannon J Hackett
- Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA
- John Harshman
- Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA
- 4869 Pepperwood Way, San Jose, CA 95124, USA
- Christopher J Huddleston
- Department of Vertebrate Zoology, Smithsonian Institution, 4210 Silver Hill Road, Suitland, MD 20746, USA
- Ben D Marks
- Museum of Natural Science and Department of Biological Sciences, 119 Foster Hall, Louisiana State University, Baton Rouge, LA 70803, USA
- Kathleen J Miglia
- Department of Biological Sciences, Wayne State University, 5047 Gullen Mall, Detroit, MI 48202, USA
- William S Moore
- Department of Biological Sciences, Wayne State University, 5047 Gullen Mall, Detroit, MI 48202, USA
- Sushma Reddy
- Zoology Department, Field Museum of Natural History, 1400 S. Lakeshore Drive, Chicago, IL 60605, USA
- Biology Department, Loyola University Chicago, Chicago, IL 60626, USA
- Frederick H Sheldon
- Museum of Natural Science and Department of Biological Sciences, 119 Foster Hall, Louisiana State University, Baton Rouge, LA 70803, USA
- Christopher C Witt
- Museum of Natural Science and Department of Biological Sciences, 119 Foster Hall, Louisiana State University, Baton Rouge, LA 70803, USA
- Department of Biology and Museum of Southwestern Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Tamaki Yuri
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Department of Vertebrate Zoology, Smithsonian Institution, 4210 Silver Hill Road, Suitland, MD 20746, USA
- Sam Noble Oklahoma Museum of Natural History, University of Oklahoma, Norman, OK 73072, USA
19
Sindi SS, Raphael BJ. Identification and frequency estimation of inversion polymorphisms from haplotype data. J Comput Biol 2010; 17:517-31. [PMID: 20377461 DOI: 10.1089/cmb.2009.0185] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022] Open
Abstract
Structural rearrangements, including copy-number alterations and inversions, are increasingly recognized as an important contributor to human genetic variation. Copy number variants are readily measured via array-based techniques like comparative genomic hybridization, but copy-neutral variants such as inversion polymorphisms remain difficult to identify without whole genome sequencing. We introduce a method to identify inversion polymorphisms and estimate their frequency in a population using readily available single nucleotide polymorphism (SNP) data. Our method uses a probabilistic model to describe a population as a mixture of forward and inverted chromosomes and identifies putative inversions by characteristic differences in haplotype frequencies around inversion breakpoints. On simulated data, our method accurately predicts inversions with frequencies as low as 25% in the population and reliably estimates inversion frequencies over a wide range. On the human HapMap Phase 2 data, we predict between 88 and 142 inversion polymorphisms with frequency ranging from 20 to 81 percent. Many of these correspond to known inversions or have other evidence supporting them, and the predicted inversion frequencies largely agree with the limited information presently available.
Affiliation(s)
- Suzanne S Sindi
- Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA
20
Cruciform-forming inverted repeats appear to have mediated many of the microinversions that distinguish the human and chimpanzee genomes. Chromosome Res 2009; 17:469-83. [PMID: 19475482 DOI: 10.1007/s10577-009-9039-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 01/15/2009] [Revised: 04/08/2009] [Accepted: 04/08/2009] [Indexed: 10/20/2022]
Abstract
Submicroscopic inversions have contributed significantly to the genomic divergence between humans and chimpanzees over evolutionary time. Those microinversions which are flanked by segmental duplications (SDs) are presumed to have originated via non-allelic homologous recombination between SDs arranged in inverted orientation. However, the nature of the mechanisms underlying those inversions which are not flanked by SDs remains unclear. We have investigated 35 such inversions, ranging in size from 51-nt to 22056-nt, with the goal of characterizing the DNA sequences in the breakpoint-flanking regions. Using the macaque genome as an outgroup, we determined the lineage specificity of these inversions and noted that the majority (N = 31; 89%) were associated with deletions (of length between 1-nt and 6754-nt) immediately adjacent to one or both inversion breakpoints. Overrepresentations of both direct and inverted repeats, ≥6-nt in length and capable of non-B DNA structure formation, were noted in the vicinity of breakpoint junctions, suggesting that these repeats could have contributed to double strand breakage. Inverted repeats capable of cruciform structure formation were also found to be a common feature of the inversion breakpoint-flanking regions, consistent with these inversions having originated through the resolution of Holliday junction-like cruciforms. Sequences capable of non-B DNA structure formation have previously been implicated in promoting gross deletions and translocations causing human genetic disease. We conclude that non-B DNA forming sequences may also have promoted the occurrence of mutations in an evolutionary context, giving rise to at least some of the inversion/deletions which now serve to distinguish the human and chimpanzee genomes.
21
Alekseyev MA, Pevzner PA. Breakpoint graphs and ancestral genome reconstructions. Genome Res 2009; 19:943-57. [PMID: 19218533 PMCID: PMC2675983 DOI: 10.1101/gr.082784.108] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.7] [Received: 06/30/2008] [Accepted: 01/22/2009] [Indexed: 11/24/2022]
Abstract
Recently completed whole-genome sequencing projects marked the transition from gene-based phylogenetic studies to phylogenomics analysis of entire genomes. We developed an algorithm MGRA for reconstructing ancestral genomes and used it to study the rearrangement history of seven mammalian genomes: human, chimpanzee, macaque, mouse, rat, dog, and opossum. MGRA relies on the notion of the multiple breakpoint graphs to overcome some limitations of the existing approaches to ancestral genome reconstructions. MGRA also generates the rearrangement-based characters guiding the phylogenetic tree reconstruction when the phylogeny is unknown.
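As a small illustration of the breakpoint concept that breakpoint graphs build on, the following Python sketch counts breakpoints between two signed linear gene orders. It is not MGRA and constructs no graph; it assumes both genomes are single linear chromosomes over the same genes, ignores chromosome ends, and uses hypothetical function names.

```python
# Illustrative sketch: breakpoints between two signed linear gene orders.
# A gene is an integer; a negative sign marks the reverse strand.

def adjacency_set(order):
    """Collect every adjacency of a gene order in both reading directions,
    so that (a, b) and (-b, -a) are treated as the same adjacency."""
    adjacencies = set()
    for left, right in zip(order, order[1:]):
        adjacencies.add((left, right))
        adjacencies.add((-right, -left))
    return adjacencies


def count_breakpoints(genome_a, genome_b):
    """Count adjacencies of genome_a that are not conserved in genome_b."""
    conserved = adjacency_set(genome_b)
    return sum(1 for pair in zip(genome_a, genome_a[1:]) if pair not in conserved)


if __name__ == "__main__":
    ancestor = [1, 2, 3, 4, 5]
    inverted = [1, -3, -2, 4, 5]  # genes 2..3 inverted
    print(count_breakpoints(ancestor, inverted))
```

A single inversion disrupts the two adjacencies at its ends and leaves the internal ones intact, which is why breakpoint counts give a coarse lower-bound style signal of rearrangement rather than an exact event count.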
Affiliation(s)
- Max A. Alekseyev
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093-0404, USA
22
Robinson TJ, Ruiz-Herrera A. Defining the ancestral eutherian karyotype: a cladistic interpretation of chromosome painting and genome sequence assembly data. Chromosome Res 2008; 16:1133-41. [PMID: 19067196 DOI: 10.1007/s10577-008-1264-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 07/18/2008] [Revised: 08/28/2008] [Accepted: 08/28/2008] [Indexed: 11/28/2022]
Abstract
A cladistic analysis of genome assemblies (syntenic associations) for eutherian mammals against two distant outgroup species--opossum and chicken--permitted a refinement of the 46-chromosome karyotype formerly inferred in the ancestral eutherian. We show that two intact chromosome pairs (corresponding to human chromosomes 13 and 18) and three conserved chromosome segments (10q, 19p and 8q in the human karyotype) are probably symplesiomorphic for Eutheria because they are also present as unaltered orthologues in one or both outgroups. Seven additional syntenies (4q/8p/4pq, 3p/21, 14/15, 10p/12pq/22qt, 19q/16q, 16p/7a and 12qt/22q), each involving human chromosomal segments that in various combinations correspond to complete chromosomes in the ancestral eutherian karyotype, are also present in one or both outgroup taxa and thus are probable symplesiomorphies for Eutheria. Interestingly, several of the symplesiomorphic characters identified in chicken and/or opossum are present in more distant outgroups such as pufferfish and zebrafish (for example 3p/21, 14/15, 19q/16q and 16p/7a), suggesting their retention since vertebrate common ancestry approximately 450 million years ago. However, eight intact pairs (corresponding to human chromosomes 1, 5, 6, 9, 11, 17, 20 and the X) and three chromosome segments (7b, 2p-q13 and 2q13-qter) are derived characters potentially consistent with eutherian monophyly. Our analyses clarify the distinction between shared-ancestral and shared-derived homology in the eutherian ancestral karyotype.
Affiliation(s)
- Terence J Robinson
- Evolutionary Genomics Group, Department of Botany and Zoology, University of Stellenbosch, Private Bag X1, Matieland, 7602, South Africa.
23
Abstract
We formalize the problem of recovering the evolutionary history of a set of genomes that are related to an unseen common ancestor genome by operations of speciation, deletion, insertion, duplication, and rearrangement of segments of bases. The problem is examined in the limit as the number of bases in each genome goes to infinity. In this limit, the chromosomes are represented by continuous circles or line segments. For such an infinite-sites model, we present a polynomial-time algorithm to find the most parsimonious evolutionary history of any set of related present-day genomes.
24
Prasad AB, Allard MW, NISC Comparative Sequencing Program, Green ED. Confirming the phylogeny of mammals by use of large comparative sequence data sets. Mol Biol Evol 2008; 25:1795-808. [PMID: 18453548 PMCID: PMC2515873 DOI: 10.1093/molbev/msn104] [Citation(s) in RCA: 168] [Impact Index Per Article: 9.9] [Accepted: 04/07/2008] [Indexed: 11/13/2022] Open
Abstract
The ongoing generation of prodigious amounts of genomic sequence data from myriad vertebrates is providing unparalleled opportunities for establishing definitive phylogenetic relationships among species. The size and complexities of such comparative sequence data sets not only allow smaller and more difficult branches to be resolved but also present unique challenges, including large computational requirements and the negative consequences of systematic biases. To explore these issues and to clarify the phylogenetic relationships among mammals, we have analyzed a large data set of over 60 megabase pairs (Mb) of high-quality genomic sequence, which we generated from 41 mammals and 3 other vertebrates. All sequences are orthologous to a 1.9-Mb region of the human genome that encompasses the cystic fibrosis transmembrane conductance regulator gene (CFTR). To understand the characteristics and challenges associated with phylogenetic analyses of such a large data set, we partitioned the sequence data in several ways and utilized maximum likelihood, maximum parsimony, and Neighbor-Joining algorithms, implemented in parallel on Linux clusters. These studies yielded well-supported phylogenetic trees, largely confirming other recent molecular phylogenetic analyses. Our results provide support for rooting the placental mammal tree between Atlantogenata (Xenarthra and Afrotheria) and Boreoeutheria (Euarchontoglires and Laurasiatheria), illustrate the difficulty in resolving some branches even with large amounts of data (e.g., in the case of Laurasiatheria), and demonstrate the valuable role that very large comparative sequence data sets can play in refining our understanding of the evolutionary relationships of vertebrates.
Affiliation(s)
- Arjun B Prasad
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
25
Abstract
Motivation: Modern techniques can yield the ordering and strandedness of genes on each chromosome of a genome; such data already exists for hundreds of organisms. The evolutionary mechanisms through which the set of the genes of an organism is altered and reordered are of great interest to systematists, evolutionary biologists, comparative genomicists and biomedical researchers. Perhaps the most basic concept in this area is that of evolutionary distance between two genomes: under a given model of genomic evolution, how many events most likely took place to account for the difference between the two genomes? Results: We present a method to estimate the true evolutionary distance between two genomes under the ‘double-cut-and-join’ (DCJ) model of genome rearrangement, a model under which a single multichromosomal operation accounts for all genomic rearrangement events: inversion, transposition, translocation, block interchange and chromosomal fusion and fission. Our method relies on a simple structural characterization of a genome pair and is both analytically and computationally tractable. We provide analytical results to describe the asymptotic behavior of genomes under the DCJ model, as well as experimental results on a wide variety of genome structures to exemplify the very high accuracy (and low variance) of our estimator. Our results provide a tool for accurate phylogenetic reconstruction from multichromosomal gene rearrangement data as well as a theoretical basis for refinements of the DCJ model to account for biological constraints. Availability: All of our software is available in source form under GPL at http://lcbb.epfl.ch Contact:bernard.moret@epfl.ch
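For orientation, the quantity that such an estimator corrects is the minimum (parsimonious) DCJ distance. The Python sketch below computes that minimum distance only, not the statistical estimate of the true distance described in the abstract. It assumes each genome is a single circular chromosome over the same genes, in which case the distance is N − C, with N the number of genes and C the number of cycles in the adjacency graph; all names are illustrative.

```python
# Sketch: minimum DCJ distance between two single-chromosome circular genomes
# with identical gene content. Genes are signed integers.

def _adjacency_map(genome):
    """Map each gene extremity, (gene, 'h') or (gene, 't'), to the extremity
    it is joined to in the circular signed gene order."""
    joined = {}
    n = len(genome)
    for i in range(n):
        x, y = genome[i], genome[(i + 1) % n]
        right_end = (abs(x), "h" if x > 0 else "t")  # where x is exited
        left_end = (abs(y), "t" if y > 0 else "h")   # where y is entered
        joined[right_end] = left_end
        joined[left_end] = right_end
    return joined


def dcj_distance_circular(genome_a, genome_b):
    """Return N - C, where C counts cycles that alternate between the
    adjacencies of genome_a and those of genome_b."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("genomes must contain the same genes")
    adj_a, adj_b = _adjacency_map(genome_a), _adjacency_map(genome_b)
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        node = start
        while node not in seen:
            seen.add(node)
            partner = adj_a[node]   # follow an adjacency of genome A
            seen.add(partner)
            node = adj_b[partner]   # then an adjacency of genome B
    return len(genome_a) - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([1, 2, 3, 4, 5], [1, -3, -2, 4, 5]))
```

The minimum distance underestimates the number of events once breakpoints start to be reused or events overlap, which is the gap an estimator of the true evolutionary distance is meant to close.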
Affiliation(s)
- Yu Lin
- Laboratory for Computational Biology and Bioinformatics, Swiss Federal Institute of Technology (EPFL), EPFL-IIS-LCBB, INJ 230, Station 14, CH-1015 Lausanne, Switzerland
26
Darling AE, Miklós I, Ragan MA. Dynamics of genome rearrangement in bacterial populations. PLoS Genet 2008; 4:e1000128. [PMID: 18650965 PMCID: PMC2483231 DOI: 10.1371/journal.pgen.1000128] [Citation(s) in RCA: 164] [Impact Index Per Article: 9.6] [Received: 10/27/2007] [Accepted: 06/16/2008] [Indexed: 11/24/2022] Open
Abstract
Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of "symmetric inversions", that is, inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent.
Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes.
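The abstract's definition of a symmetric inversion (endpoints equally distant from the origin of replication on a circular chromosome) can be made concrete with a minimal sketch. The function names, the `tolerance` parameter, and the coordinates are illustrative assumptions, not the authors' statistical method; the sketch only checks the geometric condition.

```python
def signed_offset(pos, origin, genome_length):
    """Signed offset of pos from origin on a circle, in (-L/2, L/2].

    Positive and negative values correspond to the two replichores.
    """
    d = (pos - origin) % genome_length
    return d - genome_length if d > genome_length / 2 else d


def is_symmetric_inversion(left, right, origin, genome_length, tolerance=0):
    """Return True if the two inversion endpoints lie on opposite sides of
    the origin and are equally distant from it, within `tolerance` bases.

    `tolerance` is a hypothetical parameter added for illustration.
    """
    a = signed_offset(left, origin, genome_length)
    b = signed_offset(right, origin, genome_length)
    opposite_sides = a * b < 0
    return opposite_sides and abs(abs(a) - abs(b)) <= tolerance


# Toy circular chromosome of 1000 bp with the origin at position 100.
print(is_symmetric_inversion(50, 150, origin=100, genome_length=1000))
print(is_symmetric_inversion(50, 400, origin=100, genome_length=1000))
```

Requiring the endpoints to fall on opposite sides of the origin keeps two nearby breakpoints within a single replichore from being counted as symmetric when a nonzero tolerance is used.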
Affiliation(s)
- Aaron E Darling
- ARC Center of Excellence in Bioinformatics, The University of Queensland, St. Lucia, Queensland, Australia.
27
|
Raphael BJ, Volik S, Yu P, Wu C, Huang G, Linardopoulou EV, Trask BJ, Waldman F, Costello J, Pienta KJ, Mills GB, Bajsarowicz K, Kobayashi Y, Sridharan S, Paris PL, Tao Q, Aerni SJ, Brown RP, Bashir A, Gray JW, Cheng JF, de Jong P, Nefedov M, Ried T, Padilla-Nash HM, Collins CC. A sequence-based survey of the complex structural organization of tumor genomes. Genome Biol 2008; 9:R59. [PMID: 18364049 PMCID: PMC2397511 DOI: 10.1186/gb-2008-9-3-r59] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Received: 10/09/2007] [Revised: 02/20/2008] [Accepted: 03/25/2008] [Indexed: 01/06/2023]
Abstract
BACKGROUND The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using end sequencing profiling, which relies on paired-end sequencing of cloned tumor genomes. RESULTS In the present study, brain, breast, ovary, and prostate tumors, along with three breast cancer cell lines, were surveyed using end sequencing profiling, yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization confirmed translocations and complex tumor genome structures that include co-amplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms revealed candidate somatic mutations and an elevated rate of novel single nucleotide polymorphisms in an ovarian tumor. CONCLUSION These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than was previously appreciated and that genomic fusions, including fusion transcripts and proteins, may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.
Affiliation(s)
- Benjamin J Raphael
- Department of Computer Science & Center for Computational Molecular Biology, Brown University, Waterman Street, Providence, RI 02912-1910, USA
- Stanislav Volik
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
- Peng Yu
- Chinese National Human Genome Center, North Yongchang Road, BDA, Beijing, P.R.C. 100016
- Chunxiao Wu
- Shandong Provincial Hospital, JingWuWeiQi Road, Jinan, P.R.C. 250021
- Guiqing Huang
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
- Elena V Linardopoulou
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Fairview Avenue N, Seattle, WA 98109, USA
- Barbara J Trask
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Fairview Avenue N, Seattle, WA 98109, USA
- Frederic Waldman
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
- Joseph Costello
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
- Kenneth J Pienta
- The University of Michigan, Departments of Internal Medicine and Urology, E Medical Center Drive, Ann Arbor, MI 48109-0330, USA
- Gordon B Mills
- MD Anderson Cancer Center, University of Texas, Holcombe Blvd, Houston, TX 77030, USA
- Krystyna Bajsarowicz
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
- Yasuko Kobayashi
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
- Shivaranjani Sridharan
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
- Pamela L Paris
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
- Quanzhou Tao
- Amplicon Express, NE Eastgate Blvd, Pullman, WA 99163, USA
- Sarah J Aerni
- BioMedical Informatics Program, Stanford University, Stanford, CA 94305, USA
- Raymond P Brown
- Bioinformatics Program, University of California, San Diego, Gilman Drive, La Jolla, CA 92093, USA
- Ali Bashir
- Bioinformatics Program, University of California, San Diego, Gilman Drive, La Jolla, CA 92093, USA
- Joe W Gray
- Lawrence Berkeley National Laboratory, Life Sciences Division, Cyclotron Road, Berkeley, CA 94720-8268, USA
- Jan-Fang Cheng
- Lawrence Berkeley National Laboratory, Genomics Division and Joint Genome Institute, Cyclotron Road, Berkeley, CA 94720, USA
- Pieter de Jong
- BACPAC Resources Children's Hospital Oakland, 52nd Street, Oakland, CA 94609, USA
- Mikhail Nefedov
- BACPAC Resources Children's Hospital Oakland, 52nd Street, Oakland, CA 94609, USA
- Thomas Ried
- Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, South Drive, Bldg. 50, MSC-8010, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Hesed M Padilla-Nash
- Section of Cancer Genomics, Genetics Branch, Center for Cancer Research, South Drive, Bldg. 50, MSC-8010, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Colin C Collins
- Cancer Research Institute, UCSF Comprehensive Cancer Center, Sutter Street, San Francisco, CA 94115, USA
28
|
Abstract
Chromosomal inversions have an important role in evolution, and an increasing number of inversion polymorphisms are being identified in the human population. The evolutionary history of these inversions and the mechanisms by which they arise are therefore of significant interest. Previously, a polymorphic inversion on human chromosome Xq28 that includes the FLNA and EMD loci was discovered and hypothesized to have been the result of nonallelic homologous recombination (NAHR) between near-identical inverted duplications flanking this region. Here, we carried out an in-depth study of the orthologous region in 27 additional eutherians and report that this inversion is not specific to humans, but has occurred independently and repeatedly at least 10 times in multiple eutherian lineages. Moreover, inverted duplications flank the FLNA-EMD region in all 16 species for which high-quality sequence assemblies are available. Based on detailed sequence analyses, we propose a model in which the observed inverted duplications originated from a common duplication event that predates the eutherian radiation. Subsequent gene conversion homogenized the duplications, thereby providing a continuous substrate for NAHR that led to the recurrent inversion of this segment of the genome. These results provide an extreme example in support of the evolutionary breakpoint reusage hypothesis and point out that some near-identical human segmental duplications may, in fact, have originated >100 million years ago.