1. Song W, Li C, Lu Y, Shen D, Jia Y, Huo Y, Piao W, Jin H. Chlomito: a novel tool for precise elimination of organelle genome contamination from nuclear genome assembly. Front Plant Sci 2024; 15:1430443. [PMID: 39258299 PMCID: PMC11385003 DOI: 10.3389/fpls.2024.1430443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Accepted: 08/01/2024] [Indexed: 09/12/2024]
Abstract
Introduction: Accurate reference genomes are fundamental to understanding biological evolution, biodiversity, hereditary phenomena and diseases. However, assembled nuclear chromosomes are often contaminated by organelle genomes, which misleads bioinformatic analysis and the interpretation of genomic and transcriptomic data. Methods: To address this issue, we developed a tool named Chlomito for the precise identification and elimination of organelle genome contamination from nuclear genome assemblies. Unlike conventional approaches, Chlomito uses two new metrics, alignment length coverage ratio (ALCR) and sequencing depth ratio (SDR), to distinguish true organelle genome sequences from those transferred into nuclear genomes via horizontal gene transfer (HGT). Results: The accuracy of Chlomito was tested using sequencing data from Plum, Mango and Arabidopsis. The results confirmed that Chlomito can accurately detect contigs originating from the organelle genomes, and the identified contigs covered most regions of the organelle reference genomes, demonstrating the efficiency and precision of Chlomito. For user convenience, we further packaged this method into a Docker image, simplifying the data processing workflow. Discussion: Overall, Chlomito provides an efficient, accurate and convenient method for identifying and removing contigs derived from organelle genomes in genomic assembly data, contributing to the improvement of genome assembly quality.
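The abstract names ALCR and SDR but does not define them, so the following is only a minimal sketch under stated assumptions, not Chlomito's actual implementation. It assumes ALCR is the fraction of a contig covered by alignments to an organelle reference, and SDR is the contig's mean read depth divided by a baseline nuclear depth. The `Contig` class, the function names, and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Contig:
    name: str
    length: int
    # Half-open (start, end) intervals on the contig that align to an
    # organelle reference genome. Intervals may overlap.
    aligned_intervals: List[Tuple[int, int]]
    mean_depth: float


def merged_length(intervals: List[Tuple[int, int]]) -> int:
    """Total number of bases covered by the union of the intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval, then open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def alcr(contig: Contig) -> float:
    """Assumed definition: aligned bases / contig length."""
    return merged_length(contig.aligned_intervals) / contig.length


def sdr(contig: Contig, nuclear_depth: float) -> float:
    """Assumed definition: contig depth / baseline nuclear depth."""
    return contig.mean_depth / nuclear_depth


def looks_like_organelle(contig: Contig, nuclear_depth: float,
                         min_alcr: float = 0.5, min_sdr: float = 5.0) -> bool:
    """Flag a contig only when BOTH metrics are high (hypothetical cutoffs)."""
    return alcr(contig) >= min_alcr and sdr(contig, nuclear_depth) >= min_sdr


organelle_like = Contig("ctg_a", 1000, [(0, 600), (500, 900)], 300.0)
nuclear_insert = Contig("ctg_b", 100000, [(0, 2000)], 30.0)
```

Requiring both metrics to be high reflects the distinction the abstract draws: a nuclear contig that merely carries a transferred organelle segment would tend to have a low aligned fraction and a depth close to the nuclear baseline, whereas a true organelle contig would be almost fully aligned and sequenced at much higher depth.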
Affiliation(s)
- Wei Song
  - Laboratory of Genetics and Disorders, Key Laboratory of Molecular Medicine and Biotherapy, Aerospace Center Hospital, School of Life Science, Beijing Institute of Technology, Beijing, China
- Chong Li
  - Laboratory of Genetics and Disorders, Key Laboratory of Molecular Medicine and Biotherapy, Aerospace Center Hospital, School of Life Science, Beijing Institute of Technology, Beijing, China
- Yanming Lu
  - Laboratory of Genetics and Disorders, Key Laboratory of Molecular Medicine and Biotherapy, Aerospace Center Hospital, School of Life Science, Beijing Institute of Technology, Beijing, China
- Dawei Shen
  - Research Institute for Science and Technology, Beijing Institute of Technology, Beijing, China
- Yunxiao Jia
  - Laboratory of Genetics and Disorders, Key Laboratory of Molecular Medicine and Biotherapy, Aerospace Center Hospital, School of Life Science, Beijing Institute of Technology, Beijing, China
- Yixin Huo
  - Laboratory of Genetics and Disorders, Key Laboratory of Molecular Medicine and Biotherapy, Aerospace Center Hospital, School of Life Science, Beijing Institute of Technology, Beijing, China
- Weilan Piao
  - Laboratory of Genetics and Disorders, Key Laboratory of Molecular Medicine and Biotherapy, Aerospace Center Hospital, School of Life Science, Beijing Institute of Technology, Beijing, China
  - Advanced Technology Research Institute, Beijing Institute of Technology, Jinan, China
- Hua Jin
  - Laboratory of Genetics and Disorders, Key Laboratory of Molecular Medicine and Biotherapy, Aerospace Center Hospital, School of Life Science, Beijing Institute of Technology, Beijing, China
  - Advanced Technology Research Institute, Beijing Institute of Technology, Jinan, China
  - Department of Pathology, Aerospace Center Hospital, Beijing, China
2. McLay TGB, Murphy DJ, Holmes GD, Mathews S, Brown GK, Cantrill DJ, Udovicic F, Allnutt TR, Jackson CJ. A genome resource for Acacia, Australia's largest plant genus. PLoS One 2022; 17:e0274267. [PMID: 36240205 PMCID: PMC9565413 DOI: 10.1371/journal.pone.0274267] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/04/2022] [Accepted: 08/24/2022] [Indexed: 11/05/2022]
Abstract
Acacia (Leguminosae, Caesalpinioideae, mimosoid clade) is the largest and most widespread genus of plants in the Australian flora, occupying and dominating a diverse range of environments, with an equally diverse range of forms. For a genus of its size and importance, Acacia currently has surprisingly few genomic resources. Acacia pycnantha, the golden wattle, is a woody shrub or tree occurring in south-eastern Australia and is the country's floral emblem. To assemble a genome for A. pycnantha, we generated long-read sequences using Oxford Nanopore Technology, 10x Genomics Chromium linked reads, and short-read Illumina sequences, and produced an assembly spanning 814 Mb, with a scaffold N50 of 2.8 Mb, and 98.3% of complete Embryophyta BUSCOs. Genome annotation predicted 47,624 protein-coding genes, with 62.3% of the genome predicted to comprise transposable elements. Evolutionary analyses indicated a shared genome duplication event in the Caesalpinioideae, and conflict in the relationships between Cercis (subfamily Cercidoideae) and subfamilies Caesalpinioideae and Papilionoideae (pea-flowered legumes). Comparative genomics identified a suite of expanded and contracted gene families in A. pycnantha, and these were annotated with both GO terms and KEGG functional categories. One expanded gene family of particular interest is involved in flowering time and may be associated with the characteristic synchronous flowering of Acacia. This genome assembly and annotation will be a valuable resource for all studies involving Acacia, including the evolution, conservation, breeding, invasiveness, and physiology of the genus, and for comparative studies of legumes.
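The scaffold N50 quoted in the abstract is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together span at least half of the assembly. As a small illustration (the function name `n50` and the toy lengths are assumptions, not data from the paper), it can be computed like this:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first scaffold that brings the running total to
        # at least half of the assembly size.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


toy_scaffolds = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # total length 54
toy_n50 = n50(toy_scaffolds)
```

For the toy input the three longest scaffolds (10 + 9 + 8 = 27) reach exactly half of the total length of 54, so `toy_n50` is 8.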
Affiliation(s)
- Todd G. B. McLay
  - Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
  - School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
  - Centre for Australian Biodiversity Research, CSIRO, Black Mountain, Australian Capital Territory, Australia
- Daniel J. Murphy
  - Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
- Gareth D. Holmes
  - Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
- Sarah Mathews
  - Centre for Australian Biodiversity Research, CSIRO, Black Mountain, Australian Capital Territory, Australia
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Gillian K. Brown
  - Queensland Herbarium, Department of Environment and Science, Toowong, Queensland, Australia
- Frank Udovicic
  - Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
- Chris J. Jackson
  - Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
3. Sinding MHS, Ciucani MM, Ramos-Madrigal J, Carmagnini A, Rasmussen JA, Feng S, Chen G, Vieira FG, Mattiangeli V, Ganjoo RK, Larson G, Sicheritz-Pontén T, Petersen B, Frantz L, Gilbert MTP, Bradley DG. Kouprey (Bos sauveli) genomes unveil polytomic origin of wild Asian Bos. iScience 2021; 24:103226. [PMID: 34712923 PMCID: PMC8531564 DOI: 10.1016/j.isci.2021.103226] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/13/2021] [Revised: 08/11/2021] [Accepted: 10/01/2021] [Indexed: 12/30/2022]
Abstract
The evolution of the genera Bos and Bison, and the nature of gene flow between wild and domestic species, is poorly understood, with genomic data of wild species being limited. We generated two genomes from the likely extinct kouprey (Bos sauveli) and analyzed them alongside other Bos and Bison genomes. We found that B. sauveli possessed genomic signatures characteristic of an independent species closely related to Bos javanicus and Bos gaurus. We found evidence for extensive incomplete lineage sorting across the three species, consistent with a polytomic diversification of the major ancestry in the group, potentially followed by secondary gene flow. Finally, we detected significant gene flow from an unsampled Asian Bos-like source into East Asian zebu cattle, demonstrating both that the full genomic diversity and evolutionary history of the Bos complex have yet to be elucidated and that museum specimens and ancient DNA are valuable resources to do so.
Highlights:
- We generated two genomes from the likely extinct kouprey (Bos sauveli)
- Extensive mt and nuclear-genome-wide incomplete lineage sorting across wild Asian Bos
- Initial polytomic diversification of the wild Asian Bos: kouprey, banteng, and gaur
Affiliation(s)
- Alberto Carmagnini
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
- Jacob Agerbo Rasmussen
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Laboratory of Genomics and Molecular Medicine, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Shaohong Feng
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
- Guangji Chen
  - BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
  - University of Chinese Academy of Sciences, Beijing, China
- Greger Larson
  - The Palaeogenomics and Bio-Archaeology Research Network, Research Laboratory for Archaeology and History of Art, University of Oxford, Oxford, UK
- Thomas Sicheritz-Pontén
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia
- Bent Petersen
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia
- Laurent Frantz
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
  - Palaeogenomics Group, Department of Veterinary Sciences, Ludwig Maximilian University, Munich, Germany
- M. Thomas P. Gilbert
  - Globe Institute, University of Copenhagen, Copenhagen, Denmark
  - Center for Evolutionary Hologenomics, University of Copenhagen, Copenhagen, Denmark
  - Norwegian University of Science and Technology, University Museum, Trondheim, Norway
- Daniel G. Bradley
  - Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
4. Diroma MA, Modi A, Lari M, Sineo L, Caramelli D, Vai S. New Insights Into Mitochondrial DNA Reconstruction and Variant Detection in Ancient Samples. Front Genet 2021; 12:619950. [PMID: 33679884 PMCID: PMC7930628 DOI: 10.3389/fgene.2021.619950] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/21/2020] [Accepted: 01/12/2021] [Indexed: 11/13/2022]
Abstract
Ancient DNA (aDNA) studies are frequently focused on the analysis of the mitochondrial DNA (mtDNA), which is much more abundant than the nuclear genome, hence can be better retrieved from ancient remains. However, postmortem DNA damage and contamination make the data analysis difficult because of DNA fragmentation and nucleotide alterations. In this regard, the assessment of the heteroplasmic fraction in ancient mtDNA has always been considered an unachievable goal due to the complexity in distinguishing true endogenous variants from artifacts. We implemented and applied a computational pipeline for mtDNA analysis to a dataset of 30 ancient human samples from an Iron Age necropolis in Polizzello (Sicily, Italy). The pipeline includes several modules from well-established tools for aDNA analysis and a recently released variant caller, which was specifically conceived for mtDNA, applied for the first time to aDNA data. Through a fine-tuned filtering on variant allele sequencing features, we were able to accurately reconstruct nearly complete (>88%) mtDNA genome for almost all the analyzed samples (27 out of 30), depending on the degree of preservation and the sequencing throughput, and to get a reliable set of variants allowing haplogroup prediction. Additionally, we provide guidelines to deal with possible artifact sources, including nuclear mitochondrial sequence (NumtS) contamination, an often-neglected issue in ancient mtDNA surveys. Potential heteroplasmy levels were also estimated, although most variants were likely homoplasmic, and validated by data simulations, proving that new sequencing technologies and software are sensitive enough to detect partially mutated sites in ancient genomes and discriminate true variants from artifacts. A thorough functional annotation of detected and filtered mtDNA variants was also performed for a comprehensive evaluation of these ancient samples.
Affiliation(s)
- Maria Angela Diroma
  - Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
- Alessandra Modi
  - Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
- Martina Lari
  - Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
- Luca Sineo
  - Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche, Università degli Studi di Palermo, Palermo, Italy
- David Caramelli
  - Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
- Stefania Vai
  - Dipartimento di Biologia, Università degli Studi di Firenze, Florence, Italy
5. Schmid S, Neuenschwander S, Pitteloud C, Heckel G, Pajkovic M, Arlettaz R, Alvarez N. Spatial and temporal genetic dynamics of the grasshopper Oedaleus decorus revealed by museum genomics. Ecol Evol 2018; 8:1480-1495. [PMID: 29435226 PMCID: PMC5792620 DOI: 10.1002/ece3.3699] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/09/2017] [Revised: 11/06/2017] [Accepted: 11/10/2017] [Indexed: 12/14/2022]
Abstract
Analyzing genetic variation through time and space is important to identify key evolutionary and ecological processes in populations. However, using contemporary genetic data to infer the dynamics of genetic diversity may be at risk of a bias, as inferences are performed from a set of extant populations, setting aside unavailable, rare, or now extinct lineages. Here, we took advantage of new developments in next-generation sequencing to analyze the spatial and temporal genetic dynamics of the grasshopper Oedaleus decorus, a steppic Southwestern-Palearctic species. We applied a recently developed hybridization capture (hyRAD) protocol that allows retrieving orthologous sequences even from degraded DNA characteristic of museum specimens. We identified single nucleotide polymorphisms in 68 historical and 51 modern samples in order to (i) unravel the spatial genetic structure across part of the species distribution and (ii) assess the loss of genetic diversity over the past century in Swiss populations. Our results revealed (i) the presence of three potential glacial refugia spread across the European continent and converging spatially in the Alpine area. In addition, and despite a limited population sample size, our results indicate (ii) a loss of allelic richness in contemporary Swiss populations compared to historical populations, whereas levels of expected heterozygosities were not significantly different. This observation is compatible with an increase in the bottleneck magnitude experienced by central European populations of O. decorus following human-mediated land-use change impacting steppic habitats. Our results confirm that application of hyRAD to museum samples produces valuable information to study genetic processes across time and space.
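The abstract contrasts allelic richness with expected heterozygosity without saying how either was computed. The sketch below uses the textbook definitions as an assumption: expected heterozygosity as one minus the sum of squared allele frequencies, and allelic richness rarefied to a common sample size so that populations with different numbers of sampled gene copies can be compared. The function names and the toy allele counts are hypothetical and are not taken from the study.

```python
from math import comb
from typing import Dict


def expected_heterozygosity(allele_counts: Dict[str, int]) -> float:
    """He = 1 - sum(p_i ** 2) over the allele frequencies p_i at one locus."""
    total = sum(allele_counts.values())
    return 1.0 - sum((count / total) ** 2 for count in allele_counts.values())


def rarefied_allelic_richness(allele_counts: Dict[str, int], g: int) -> float:
    """Expected number of distinct alleles in a random subsample of g gene copies.

    Uses the standard rarefaction formula:
    sum over alleles of 1 - C(N - N_i, g) / C(N, g).
    """
    total = sum(allele_counts.values())
    if not 0 < g <= total:
        raise ValueError("g must be between 1 and the number of sampled copies")
    denominator = comb(total, g)
    # math.comb returns 0 when g exceeds N - N_i, so that allele is then
    # certain to appear in the subsample.
    return sum(1.0 - comb(total - count, g) / denominator
               for count in allele_counts.values())


toy_counts = {"A": 3, "B": 1}  # four sampled gene copies at one locus
toy_he = expected_heterozygosity(toy_counts)
toy_richness = rarefied_allelic_richness(toy_counts, 2)
```

The two statistics respond differently to rare alleles: losing the single copy of allele B halves the allele count but changes the sum of squared frequencies far less than losing a common allele would, which is one reason a loss of richness can be detectable while heterozygosity remains statistically unchanged.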
Affiliation(s)
- Sarah Schmid
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Camille Pitteloud
  - Department of Environmental Systems Science, Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
- Gerald Heckel
  - Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Mila Pajkovic
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Raphaël Arlettaz
  - Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Nadir Alvarez
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - Natural History Museum of Geneva, Geneva, Switzerland
6. Miller ME, Liberatore KL, Kianian SF. Optimization and Comparative Analysis of Plant Organellar DNA Enrichment Methods Suitable for Next-generation Sequencing. J Vis Exp 2017. [PMID: 28784941 DOI: 10.3791/55528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
Plant organellar genomes contain large, repetitive elements that may undergo pairing or recombination to form complex structures and/or sub-genomic fragments. Organellar genomes also exist in admixtures within a given cell or tissue type (heteroplasmy), and an abundance of subtypes may change throughout development or when under stress (sub-stoichiometric shifting). Next-generation sequencing (NGS) technologies are required to obtain deeper understanding of organellar genome structure and function. Traditional sequencing studies use several methods to obtain organellar DNA: (1) If a large amount of starting tissue is used, it is homogenized and subjected to differential centrifugation and/or gradient purification. (2) If a smaller amount of tissue is used (i.e., if seeds, material, or space is limited), the same process is performed as in (1), followed by whole-genome amplification to obtain sufficient DNA. (3) Bioinformatics analysis can be used to sequence the total genomic DNA and to parse out organellar reads. All these methods have inherent challenges and tradeoffs. In (1), it may be difficult to obtain such a large amount of starting tissue; in (2), whole-genome amplification could introduce a sequencing bias; and in (3), homology between nuclear and organellar genomes could interfere with assembly and analysis. In plants with large nuclear genomes, it is advantageous to enrich for organellar DNA to reduce sequencing costs and sequence complexity for bioinformatics analyses. Here, we compare a traditional differential centrifugation method with a fourth method, an adapted CpG-methyl pulldown approach, to separate the total genomic DNA into nuclear and organellar fractions. Both methods yield sufficient DNA for NGS, DNA that is highly enriched for organellar sequences, albeit at different ratios in mitochondria and chloroplasts. 
We present the optimization of these methods for wheat leaf tissue and discuss major advantages and disadvantages of each approach in the context of sample input, protocol ease, and downstream application.
Affiliation(s)
- Marisa E Miller
  - Cereal Disease Laboratory, United States Department of Agriculture-Agricultural Research Service; Department of Horticultural Science, University of Minnesota
- Katie L Liberatore
  - Cereal Disease Laboratory, United States Department of Agriculture-Agricultural Research Service; Department of Plant Pathology, University of Minnesota
- Shahryar F Kianian
  - Cereal Disease Laboratory, United States Department of Agriculture-Agricultural Research Service; Department of Plant Pathology, University of Minnesota
7. da Fonseca RR, Albrechtsen A, Themudo GE, Ramos-Madrigal J, Sibbesen JA, Maretty L, Zepeda-Mendoza ML, Campos PF, Heller R, Pereira RJ. Next-generation biology: Sequencing and data analysis approaches for non-model organisms. Mar Genomics 2016; 30:3-13. [DOI: 10.1016/j.margen.2016.04.012] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 11/30/2015] [Revised: 03/23/2016] [Accepted: 04/26/2016] [Indexed: 10/21/2022]