1
Twelve newly assembled jasmine chloroplast genomes: unveiling genomic diversity, phylogenetic relationships and evolutionary patterns among Oleaceae and Jasminum species. BMC Plant Biology 2024; 24:331. [PMID: 38664619 PMCID: PMC11044428 DOI: 10.1186/s12870-024-04995-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Accepted: 04/08/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND Jasmine (Jasminum), renowned for its ornamental value and captivating fragrance, has given rise to numerous species and accessions. However, little is known about the evolutionary relationships among Jasminum species. RESULTS In the present study, we sequenced seven distinct Jasminum species, resulting in the assembly of twelve high-quality complete chloroplast (cp) genomes. The 12 cp genomes ranged in size from 159 to 165 kb and encoded 134-135 genes, including 86-88 protein-coding genes, 38-40 tRNA genes, and 8 rRNA genes. J. nudiflorum had a larger genome than the other species, mainly because of its elevated number of forward repeats (FRs). Despite the typically conserved nature of chloroplast genomes, the presence or absence of accD varied within J. sambac. Nucleotide diversity (Pi) values calculated for 19 cp genomes indicated that potential mutation hotspots were more likely to be located in the LSC region than in other regions, particularly in the genes ycf2, rbcL, atpE, ndhK, and ndhC (Pi > 0.2). Ka/Ks values revealed strong selection pressure on the genes rps2, atpA, rpoA, rpoC1, and rpl33 when J. sambac was compared with its three most closely related species (J. auriculatum, J. multiflorum, and J. dichotomum). Additionally, SNP identification, together with Structure, PCA, and phylogenetic tree analyses, divided the Jasminum cp genomes into six groups. Notably, J. polyanthum showed gene flow signals from both the G5 group (J. nudiflorum) and the G3 group (J. tortuosum and J. fluminense). Phylogenetic analysis showed that, within Oleaceae, most species of the same genus clustered together with robust support, strongly supporting the monophyly of cp genomes within the genus Jasminum.
CONCLUSION Overall, this study provides comprehensive insights into the genomic composition, variation, and phylogenetic relationships among various Jasminum species. These findings enhance our understanding of the genetic diversity and evolutionary history of Jasminum.
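The nucleotide diversity (Pi) scan reported in the RESULTS can be illustrated with a short sketch. The Python below computes Nei's π as the mean number of pairwise differences per compared site, then applies it in sliding windows, which is the usual way hotspot regions are located along an alignment. It assumes pre-aligned, equal-length sequences. The gap/N handling and the default window size and step (600/200) are illustrative assumptions, not the settings used in the study.

```python
from itertools import combinations

MISSING = {"-", "N"}  # characters treated as missing data (an assumption)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nei's pi: mean pairwise differences per compared site.

    Sites where either sequence of a pair has a gap or N are skipped for
    that pair only.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sites = 0
    for a, b in combinations(seqs, 2):
        for x, y in zip(a.upper(), b.upper()):
            if x in MISSING or y in MISSING:
                continue
            sites += 1
            diffs += x != y  # bool counts as 0 or 1
    return diffs / sites if sites else 0.0


def sliding_window_pi(seqs: list[str], window: int = 600,
                      step: int = 200) -> list[tuple[int, float]]:
    """Pi in windows of `window` columns, advancing by `step`; returns (start, pi)."""
    length = len(seqs[0])
    return [(start, nucleotide_diversity([s[start:start + window] for s in seqs]))
            for start in range(0, length - window + 1, step)]
```

Windows whose π stands well above the genome-wide background would be the candidate hotspots. The study applies a Pi > 0.2 cutoff at the gene level.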
2
Stochastic organelle genome segregation through Arabidopsis development and reproduction. The New Phytologist 2024; 241:896-910. [PMID: 37925790 PMCID: PMC10841260 DOI: 10.1111/nph.19288] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/29/2023] [Accepted: 09/07/2023] [Indexed: 11/07/2023]
Abstract
Organelle DNA (oDNA) in mitochondria and plastids is vital for plant (and eukaryotic) life. Selection against damaged oDNA is mediated in part by segregation - sorting different oDNA types into different cells in the germline. Plants segregate oDNA very rapidly, with oDNA recombination protein MSH1 a key driver of this segregation, but we have limited knowledge of the dynamics of this segregation within plants and between generations. Here, we reveal how oDNA evolves through Arabidopsis thaliana development and reproduction. We combine stochastic modelling, Bayesian inference, and model selection with new and existing tissue-specific oDNA measurements from heteroplasmic Arabidopsis plant lines through development and between generations. Segregation proceeds gradually but continually during plant development, with a more rapid increase between inflorescence formation and the next generation. When MSH1 is compromised, the majority of observed segregation can be achieved through partitioning at cell divisions. When MSH1 is functional, mtDNA segregation is far more rapid; we show that increased oDNA gene conversion is a plausible mechanism quantitatively explaining this acceleration. These findings reveal the quantitative, time-dependent details of oDNA segregation in Arabidopsis. We also discuss the support for different models of the plant germline provided by these observations.
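The abstract states that, when MSH1 is compromised, partitioning at cell divisions alone can account for most of the observed segregation. A toy model illustrates this idea. The sketch below is not the authors' stochastic model. It follows a single cell lineage in which every oDNA molecule is duplicated before division and the daughter cell receives a random subset of the doubled pool (a hypergeometric draw). Copy number and division count are hypothetical parameters, and gene conversion (the MSH1-linked acceleration) is deliberately left out.

```python
import random
from statistics import pvariance


def partition_lineage(h0: float, copies: int, divisions: int,
                      rng: random.Random) -> float:
    """Mutant oDNA fraction in one lineage after `divisions` random partitions."""
    mutant = round(h0 * copies)
    for _ in range(divisions):
        total = 2 * copies            # every molecule replicated once before division
        doubled_mutant = 2 * mutant
        # Molecule indices below doubled_mutant are mutant; the daughter draws
        # `copies` molecules without replacement from the doubled pool.
        drawn = rng.sample(range(total), copies)
        mutant = sum(1 for i in drawn if i < doubled_mutant)
    return mutant / copies


def heteroplasmy_variance(h0: float, copies: int, divisions: int,
                          lineages: int = 500, seed: int = 0) -> float:
    """Variance of heteroplasmy across independent lineages (a segregation measure)."""
    rng = random.Random(seed)
    return pvariance([partition_lineage(h0, copies, divisions, rng)
                      for _ in range(lineages)])
```

Across lineages, the variance grows as divisions accumulate. You can see this by comparing `heteroplasmy_variance(0.5, 20, 1)` with `heteroplasmy_variance(0.5, 20, 30)`. The study itself combines stochastic modelling with Bayesian inference and model selection, which this toy does not attempt.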
3
Whole mitochondrial and chloroplast genome sequencing of Tunisian date palm cultivars: diversity and evolutionary relationships. BMC Genomics 2023; 24:772. [PMID: 38093186 PMCID: PMC10720229 DOI: 10.1186/s12864-023-09872-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 12/04/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND Date palm (Phoenix dactylifera L.) is the most widespread crop in arid and semi-arid regions and has great traditional and socioeconomic importance; its fruit is well known for its high nutritional and health value. However, the genetic variation of date palm cultivars is often neglected. The advent of high-throughput sequencing has made it possible to resequence whole organelle (mitochondrial and chloroplast) genomes and explore the genetic diversity and phylogenetic relationships of cultivated plants in unprecedented detail. RESULTS Whole organelle genomes of 171 Tunisian accessions (135 females and 36 males) were sequenced. Targeted bioinformatics pipelines were used to identify date palm haplotypes and genome variants, with the aim of annotating variants and investigating patterns of evolutionary relationships. Our results revealed the existence of unique haplotypes, identified by 45 chloroplast and 156 mitochondrial SNPs. The effects of these SNPs on gene function were predicted in silico. CONCLUSIONS In light of ongoing environmental change, the results of this study have important implications for the conservation and sustainable use of the genetic resources of date palm cultivars in Tunisia, where monoculture threatens biodiversity and leads to genetic erosion. These data will be useful for breeding and genetic improvement programs of the date palm through selective cross-breeding.
4
The Chloroplast Genome of Endive (Cichorium endivia L.): Cultivar Structural Variants and Transcriptome Responses to Stress Due to Rain Extreme Events. Genes (Basel) 2023; 14:1829. [PMID: 37761969 PMCID: PMC10531310 DOI: 10.3390/genes14091829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 09/15/2023] [Accepted: 09/19/2023] [Indexed: 09/29/2023]
Abstract
Chloroplast (cp) genome diversity has been used in phylogenetic studies, breeding, and variety protection, and cp gene expression has been shown to play a role in stress response. Smooth- and curly-leafed endives (Cichorium endivia var. latifolium and var. crispum) are of nutritional and economic importance and are the target of ever-changing breeding programmes. A reference cp genome sequence was assembled and annotated for the cultivar 'Confiance'. It was 152,809 base pairs long and organized into the quadripartite structure typical of angiosperms, with two inverted repeats separated by the large and small single-copy regions. The annotation included 136 genes: 90 protein-coding genes, 38 transfer RNAs, and 8 ribosomal RNAs. Phylogenetically, the sequence formed a distinct group within Asteraceae, with C. endivia and C. intybus well separated. SSR variants within the reference genome were mostly of the trinucleotide type, and cytosine-to-uracil (C/U) RNA editing recurred. Because the cp genome was nearly fully transcribed, sequence polymorphism was investigated by RNA-Seq of seven cultivars; the number of SNPs was higher in smooth-leafed than in curly-leafed cultivars. All cultivars maintained C/U changes at identical positions, suggesting that RNA editing patterns were conserved. Most cultivars shared SNPs with a moderate impact on the encoded proteins in the ndhD, ndhA, and psbF genes, suggesting that this variability may play a role in adaptive response. Cp transcriptome expression was investigated in leaves of plants affected by two extreme rain events that caused production loss: pre-harvest rainfall, and excess rainfall plus waterlogging. These were compared with leaves from a cycle not affected by extreme rainfall. Overall, the analyses revealed stress- and cultivar-specific responses. They further showed that genes of the cytochrome b6/f complex and the PSI-PSII systems were commonly affected and are likely among the major targets of extreme rain-related stress.
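The conserved C/U editing positions reported above are typically found by comparing the genomic sequence with transcript-derived reads at the same coordinates. The sketch below assumes an already aligned, position-by-position pair of strings on the transcribed strand; RNA-Seq reads report U as T. It then intersects per-cultivar site lists. "cultivar_X" in the usage lines is hypothetical. Real pipelines would also filter by read depth and editing frequency, which is omitted here.

```python
def c_to_u_sites(dna: str, transcript: str) -> list[int]:
    """0-based positions where the genome reads C but the transcript reads T (U)."""
    if len(dna) != len(transcript):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i, (d, r) in enumerate(zip(dna.upper(), transcript.upper()))
            if d == "C" and r == "T"]


def shared_edit_sites(sites_by_cultivar: dict[str, list[int]]) -> set[int]:
    """Positions edited in every cultivar supplied."""
    site_sets = [set(s) for s in sites_by_cultivar.values()]
    return set.intersection(*site_sets) if site_sets else set()


genome = "ACGTCAC"
edits = {
    "Confiance": c_to_u_sites(genome, "ATGTTAC"),   # edited at 1 and 4
    "cultivar_X": c_to_u_sites(genome, "ATGTCAT"),  # hypothetical; edited at 1 and 6
}
print(shared_edit_sites(edits))  # prints {1}
```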
5
Nuclear DNA segments homologous to mitochondrial DNA are obstacles for detecting heteroplasmy in sugar beet (Beta vulgaris L.). PLoS One 2023; 18:e0285430. [PMID: 37552681 PMCID: PMC10409277 DOI: 10.1371/journal.pone.0285430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 04/21/2023] [Indexed: 08/10/2023]
Abstract
Heteroplasmy, the coexistence of multiple mitochondrial DNA (mtDNA) sequences in a cell, is well documented in plants. Next-generation sequencing (NGS) technology has made it feasible to sequence entire genomes, so NGS has the potential to detect heteroplasmy. However, the methods for detecting heteroplasmy, and their pitfalls, have not been fully investigated. One obstacle to heteroplasmy detection is the sequence homology among mitochondrial, plastid, and nuclear DNA. In particular, the influence of nuclear DNA segments homologous to mtDNA (numts) needs to be minimized. To detect heteroplasmy, we first excluded regions homologous to nuclear DNA sequences of sugar beet (Beta vulgaris) line EL10 from the sugar beet mtDNA sequence. NGS reads were obtained from single plants of sugar beet lines NK-195BRmm-O and NK-291BRmm-O and mapped to the remaining (unexcluded) mtDNA regions. Genome browsing analysis detected more than 1000 sites with intra-individual polymorphism. We focused on a 309-bp region in which 12 intra-individual polymorphic sites were closely linked. PCR amplification from NK-195BRmm-O and NK-291BRmm-O confirmed the existence of DNA molecules carrying the variant alleles at all 12 sites. Nevertheless, these variants were not consistently called by six variant-calling programs, suggesting that these programs are inappropriate for detecting intra-individual polymorphism. When we changed the nuclear DNA reference, we found that a numt absent from EL10 included the 309-bp region. Genetic segregation in an F2 population from NK-195BRmm-O x NK-291BRmm-O supported the numt origin of the variant alleles. Using four references, we found that numt detection was reference-dependent and that numts are extremely polymorphic among sugar beet lines. One of the identified numts absent from EL10 is also associated with another intra-individual polymorphic site in NK-195mm-O.
Our data suggest that polymorphism among numts is unexpectedly high within sugar beets, leading to confusion about the true degree of heteroplasmy.
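The abstract reports that six variant-calling programs did not consistently call these intra-individual variants. One simple alternative is a count-based scan over per-site allele counts, for example from a pileup. The sketch below flags sites where the second most frequent base exceeds a minimum fraction of the reads. The `SiteCounts` record and both thresholds are hypothetical choices. As the abstract shows, a flagged site may still be numt-derived, so allele frequency alone cannot separate true heteroplasmy from numt contamination.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    pos: int                # position on the (masked) mtDNA reference
    counts: dict[str, int]  # base -> number of reads supporting it


def intra_individual_sites(sites: list[SiteCounts], min_depth: int = 50,
                           min_minor_freq: float = 0.05) -> list[tuple[int, float]]:
    """Return (position, minor-allele fraction) for sites passing both thresholds."""
    flagged = []
    for site in sites:
        depth = sum(site.counts.values())
        if depth < min_depth:
            continue                     # too few reads to judge this site
        ranked = sorted(site.counts.values(), reverse=True)
        minor = ranked[1] if len(ranked) > 1 else 0
        freq = minor / depth
        if freq >= min_minor_freq:
            flagged.append((site.pos, round(freq, 4)))
    return flagged
```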
6
Complete chloroplast genome of two nutmeg species, Myristica argentea Warb. 1891 and Myristica fatua Houtt. 1774 (Myristicaceae). Mitochondrial DNA B Resour 2023; 8:751-755. [PMID: 37485420 PMCID: PMC10361002 DOI: 10.1080/23802359.2023.2233154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 06/29/2023] [Indexed: 07/25/2023]
Abstract
Myristica argentea Warb. 1891 and M. fatua Houtt. 1774 are two Southeast Asian food tree species. They are harvested from the wild or cultivated for local use as a condiment (nutmeg and mace), as medicine, and as a source of wood. In this study, we reconstructed the complete chloroplast (cp) genomes of these two species from whole-genome sequencing data generated on the Illumina NovaSeq platform. The genome sizes of M. argentea and M. fatua were 155,871 base pairs (bp) and 155,898 bp, respectively. Both genomes contained 126 genes and had an overall GC content of 39.20%. Our study provides useful resources for future evolutionary research and diversity analysis of Myristica species.
7
Chloroplast genomes of four Carex species: Long repetitive sequences trigger dramatic changes in chloroplast genome structure. Frontiers in Plant Science 2023; 14:1100876. [PMID: 36778700 PMCID: PMC9911286 DOI: 10.3389/fpls.2023.1100876] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/17/2022] [Accepted: 01/13/2023] [Indexed: 06/18/2023]
Abstract
The chloroplast genomes of angiosperms usually have a stable, circular quadripartite structure that is highly consistent in genome size and gene order. As one of the most diverse genera of angiosperms, Carex is of great value for studying evolutionary relationships and speciation. However, studies of its chloroplast genome structure are limited because the genome is highly expanded and restructured, with a large number of repeats. In this study, we provided a more detailed account of the chloroplast genomes of Carex using a hybrid assembly of second- and third-generation sequencing data, and we examined structural variation within the genus. The chloroplast genomes of the four Carex species are significantly longer than those of most angiosperms. They are characterized by high sequence rearrangement rates, low GC content and gene density, and increased repetitive sequences. In the Carex species studied, the locations of chloroplast genome structural variation are closely related to the positions of long repeat sequences. The genus thus provides a typical example of chloroplast structural variation and expansion caused by long repeats. Phylogenetic relationships reconstructed from chloroplast protein-coding genes support the latest taxonomic system of Carex, and they suggest that structural variation in the Carex chloroplast genome may have some phylogenetic significance. Moreover, this study demonstrated a hybrid assembly approach based on long and short reads for complex chloroplast genomes, and it provides an important reference for analyzing structural rearrangements of chloroplast genomes in other taxa.
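Long repeats of the kind linked above to Carex structural variation are usually found with dedicated tools that report maximal forward and palindromic repeats. As a much simpler illustration, the sketch below indexes every k-mer and reports those occurring at two or more positions in the same (forward) orientation. Hits from a single long repeat would still need to be merged into one repeat pair, and wrap-around on the circular genome is ignored. The default k of 30 is an arbitrary assumption.

```python
from collections import defaultdict


def forward_repeat_seeds(seq: str, k: int = 30) -> dict[str, list[int]]:
    """Map each k-mer occurring at two or more positions to its 0-based starts."""
    index: dict[str, list[int]] = defaultdict(list)
    s = seq.upper()
    for i in range(len(s) - k + 1):
        index[s[i:i + k]].append(i)
    # Keep only k-mers seen more than once: these are forward-repeat seeds.
    return {kmer: starts for kmer, starts in index.items() if len(starts) > 1}
```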
8
Phylogenomic analyses based on the plastid genome and concatenated nrDNA sequence data reveal cytonuclear discordance in genus Atractylodes (Asteraceae: Carduoideae). Frontiers in Plant Science 2022; 13:1045423. [PMID: 36531370 PMCID: PMC9752137 DOI: 10.3389/fpls.2022.1045423] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2022] [Accepted: 11/10/2022] [Indexed: 05/31/2023]
Abstract
Atractylodes species are widely distributed across East Asia and are cultivated as medicinal herbs in China, Japan, and Korea. Their unclear morphological characteristics and low levels of genetic divergence obscure the taxonomic relationships among these species. In this study, 24 plant samples representing five Atractylodes species were collected in China; of these, 23 belonged to the A. lancea complex. High-throughput sequencing was used to obtain the concatenated nrDNA sequences (18S-ITS1-5.8S-ITS2-28S) and plastid genomes. The concatenated nrDNA sequences of all the Atractylodes species were 5,849 bp long, with a GC content of 55%. The whole plastid genome sequences ranged from 152,138 bp (A. chinensis) to 153,268 bp (A. lancea), and their insertion/deletion sites were mainly distributed in the intergenic regions. Furthermore, 33, 34, 36, 31, and 32 tandem repeat sequences, as well as 30, 30, 29, 30, and 30 SSR loci, were detected in A. chinensis, A. koreana, A. lancea, A. japonica, and A. macrocephala, respectively. In addition, a considerable number of heteroplasmic variations were detected in the plastid genomes, implying a complicated phylogenetic history for Atractylodes. Phylogenetic analysis of the concatenated nrDNA sequences showed that A. lancea and A. japonica formed two separate clades, with A. chinensis and A. koreana constituting their sister clade. By contrast, in the plastid-based trees, A. lancea, A. koreana, A. chinensis, and A. japonica formed a mixed clade. Phylogenetic network analysis suggested that A. lancea may have hybridized with the common ancestor of A. chinensis and A. japonica, while ABBA-BABA tests of SNPs in the plastid genomes showed that A. chinensis was more closely related to A. japonica than to A. lancea.
This study reveals the extensive discordance and complexity of the relationships across the members of the A. lancea complex (A. lancea, A. chinensis, A. koreana, and A. japonica) according to cytonuclear genomic data; this may be caused by interspecific hybridization or gene introgression.
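The ABBA-BABA test cited above is conventionally summarized by Patterson's D statistic. The sketch below counts ABBA and BABA site patterns across four aligned sequences (P1, P2, P3, and an outgroup). It treats the outgroup base as ancestral and considers only biallelic sites where P3 carries the derived allele. It does not say which Atractylodes taxa would occupy P1, P2, and P3. It also omits the block-jackknife significance test normally reported alongside D.

```python
import math


def patterson_d(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Patterson's D = (ABBA - BABA) / (ABBA + BABA) over biallelic sites."""
    abba = baba = 0
    for a1, a2, a3, ao in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        alleles = {a1, a2, a3, ao}
        if len(alleles) != 2 or alleles & {"-", "N"}:
            continue                     # skip invariant, multiallelic, or missing sites
        if a3 == ao:
            continue                     # P3 must carry the derived (non-outgroup) allele
        if a1 == ao and a2 == a3:
            abba += 1                    # P2 shares the derived allele with P3
        elif a1 == a3 and a2 == ao:
            baba += 1                    # P1 shares the derived allele with P3
    total = abba + baba
    return (abba - baba) / total if total else math.nan
```

A positive D indicates excess derived-allele sharing between P2 and P3. A negative D indicates excess sharing between P1 and P3. With no informative sites, the function returns NaN.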
9
Recovery of chloroplast genomes from medieval millet grains excavated from the Areni-1 cave in southern Armenia. Sci Rep 2022; 12:15164. [PMID: 36071150 PMCID: PMC9452526 DOI: 10.1038/s41598-022-17931-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Accepted: 08/02/2022] [Indexed: 11/13/2022]
Abstract
Panicum miliaceum L. was domesticated in northern China at least 7000 years ago and was subsequently adopted in many areas throughout Eurasia. One such locale is Areni-1, an archaeological cave site in southern Armenia, where vast quantities of archaeobotanical material were well preserved through desiccation. The rich botanical material found at Areni-1 includes P. miliaceum grains that were identified morphologically and 14C-dated to the medieval period (873 ± 36 CE and 1118 ± 35 CE). To investigate the demographic and evolutionary history of the Areni-1 millet, we used ancient DNA extraction, hybridization capture enrichment, and high-throughput sequencing to assemble three chloroplast genomes from the medieval grains. We then compared these sequences to 50 modern P. miliaceum chloroplast genomes. Overall, the chloroplast genomes contained little diversity: domesticated accessions were separated by a maximum of 5 SNPs, so little could be inferred about demography. However, in phylogenies the chloroplast genomes separated into two clades, similar to what has been reported for nuclear DNA from P. miliaceum. The chloroplast genomes of two wild (undomesticated) accessions of P. miliaceum contained a relatively large number of variants (11 SNPs) not found in the domesticated accessions. These results demonstrate that P. miliaceum grains from archaeological sites can preserve DNA for at least 1000 years and can serve as a genetic resource for studying the domestication of this cereal crop.
10
Software Choice and Sequencing Coverage Can Impact Plastid Genome Assembly-A Case Study in the Narrow Endemic Calligonum bakuense. Frontiers in Plant Science 2022; 13:779830. [PMID: 35874012 PMCID: PMC9296850 DOI: 10.3389/fpls.2022.779830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2021] [Accepted: 06/13/2022] [Indexed: 06/15/2023]
Abstract
Most plastid genome sequences are assembled from short-read whole-genome sequencing data, yet the impact that sequencing coverage and the choice of assembly software can have on the accuracy of the resulting assemblies is poorly understood. In this study, we test the impact of both factors on plastid genome assembly in the threatened and rare endemic shrub Calligonum bakuense. We aim to characterize the differences across plastid genome assemblies generated by different assembly software tools and levels of sequencing coverage. We also aim to determine whether these differences are large enough to affect the phylogenetic position inferred for C. bakuense relative to its congeners. Our analyses compare four assembly software tools (FastPlast, GetOrganelle, IOGA, and NOVOPlasty) and seven levels of sequencing coverage across the plastid genome (original sequencing depth, 2,000x, 1,000x, 500x, 250x, 100x, and 50x). The resulting assemblies are evaluated for reproducibility, contig number, gene complement, inverted repeat length, and computation time, and the impact of sequence differences on phylogenetic reconstruction is assessed. Our results show that software choice can have a considerable impact on the accuracy and reproducibility of plastid genome assembly, and that GetOrganelle produces the most consistent assemblies for C. bakuense. Moreover, we demonstrate that a sequencing coverage between 100x and 500x can reduce both the sequence variability across assembly contigs and the computation time. When the most reliable plastid genome assemblies of C. bakuense are compared, sequence differences are detected at only three nucleotide positions, fewer than the differences potentially introduced through software choice.
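Comparing assemblies at fixed coverage levels (2,000x down to 50x) implies subsampling reads from a deeper dataset. The sketch below shows the arithmetic, using the standard expectation that mean depth equals reads x read length / genome length. It then keeps each read pair independently with the required probability. The abstract does not say which subsampling tool or seed was used. The function names and the default seed are illustrative assumptions.

```python
import random


def mean_depth(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected mean depth: reads x read length / genome length."""
    return n_reads * read_len / genome_len


def subsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep to go from current to target depth (capped at 1)."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    return min(1.0, target_depth / current_depth)


def subsample_pairs(pairs, fraction: float, seed: int = 42) -> list:
    """Keep each read pair independently with probability `fraction` (seeded)."""
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < fraction]
```

For example, going from 2,000x to 500x keeps a quarter of the pairs (`subsample_fraction(2000, 500)` returns 0.25). Fixing the seed makes each coverage level reproducible, which matters when reproducibility itself is one of the evaluation criteria.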
11
Born in the mitochondrion and raised in the nucleus: evolution of a novel tandem repeat family in Medicago polymorpha (Fabaceae). The Plant Journal: For Cell and Molecular Biology 2022; 110:389-406. [PMID: 35061308 DOI: 10.1111/tpj.15676] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/07/2021] [Accepted: 01/13/2022] [Indexed: 06/14/2023]
Abstract
Plant nuclear genomes harbor sequence elements derived from the organelles (mitochondrion and plastid) through intracellular gene transfer (IGT). Nuclear genomes also show a dramatic range of repeat content, suggesting that any sequence can be readily amplified. These two aspects of plant nuclear genomes are well recognized but have rarely been linked. Through investigation of 31 Medicago taxa we detected exceptionally high post-IGT amplification of mitochondrial (mt) DNA sequences containing rps10 in the nuclear genome of Medicago polymorpha and closely related species. The amplified sequences were characterized as tandem arrays of five distinct repeat motifs (2157, 1064, 987, 971, and 587 bp) that have diverged from the mt genome (mitogenome) in the M. polymorpha nuclear genome. The mt rps10-like arrays were identified in seven loci (six intergenic and one telomeric) of the nuclear chromosome assemblies and were the most abundant tandem repeat family, representing 1.6-3.0% of total genomic DNA, a value approximately three-fold greater than the entire mitogenome in M. polymorpha. Compared to a typical mt gene, the mt rps10-like sequence coverage level was 691.5-7198-fold higher in M. polymorpha and closely related species. In addition to the post-IGT amplification, our analysis identified the canonical telomeric repeat and the species-specific satellite arrays that are likely attributable to an ancestral chromosomal fusion in M. polymorpha. A possible relationship between chromosomal instability and the mt rps10-like tandem repeat family in the M. polymorpha clade is discussed.
12
Identification of QTLs Controlling Resistance to Anthracnose Disease in Water Yam (Dioscorea alata). Genes (Basel) 2022; 13:347. [PMID: 35205389 PMCID: PMC8872494 DOI: 10.3390/genes13020347] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/17/2022] [Revised: 02/08/2022] [Accepted: 02/11/2022] [Indexed: 02/04/2023]
Abstract
Anthracnose disease, caused by the fungus Colletotrichum gloeosporioides, is the primary cause of yield loss in water yam (Dioscorea alata), a widely cultivated yam species. Resistance to yam anthracnose disease (YAD) is a prime target of breeding initiatives that aim to develop cultivars with durable resistance, for sustainable management of the disease in water yam cultivation. This study aimed to tag quantitative trait loci (QTL) for anthracnose disease resistance in a bi-parental mapping population of D. alata. The parental genotypes and their recombinant progenies were genotyped using the genotyping-by-sequencing (GBS) platform and phenotyped in two crop cycles over two years. A high-density genetic linkage map was built with 3184 polymorphic single nucleotide polymorphism (SNP) markers well distributed across the genome, covering a total length of 1460.94 cM. On average, 163 SNP markers were mapped per chromosome, with an average genetic distance of 0.58 between SNPs. Four QTL regions related to yam anthracnose disease resistance were identified on three chromosomes. The proportion of phenotypic variance explained by these QTLs ranged from 29.54 to 39.40%. The identified QTL regions contained genes associated with known plant defense responses, such as GDSL-like lipase/acylhydrolase, protein kinase domain, and F-box protein genes. The results of the present study provide valuable insight into the genetic architecture of anthracnose resistance in water yam. The candidate markers identified here form a relevant resource for marker-assisted selection, as an alternative to conventional labor-intensive screening for anthracnose resistance in water yam.
13
Adaptive potential of Coffea canephora from Uganda in response to climate change. Mol Ecol 2022; 31:1800-1819. [DOI: 10.1111/mec.16360] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/26/2021] [Revised: 11/12/2021] [Accepted: 01/06/2022] [Indexed: 11/28/2022]
14
Global high-throughput genotyping of organellar genomes reveals insights into the origin and spread of invasive starry stonewort (Nitellopsis obtusa). Biol Invasions 2021. [DOI: 10.1007/s10530-021-02591-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/30/2023]
Abstract
Aquatic invasive species are damaging to native ecosystems. Preventing their spread and achieving comprehensive control requires an understanding of the genetic structure of an invasive population. Organellar genomes (plastid and mitochondrial) are useful for population-level analyses of invasive plant distributions. In this study, we generate complete organellar reference genomes using PacBio sequencing. We then use these reference sequences for SNP calling from high-throughput, multiplexed, Illumina-based organellar sequencing of fresh and historical samples. The samples come from across the native and invasive range of Nitellopsis obtusa (Desv. in Loisel.) J.Groves, an invasive macroalga. The data generated by the analytical pipeline we develop indicate introduction to North America from Western Europe. A single-nucleotide transversion in the plastid genome separates a group of five samples from Michigan and Wisconsin. This group resulted either from introductions of two closely related genotypes or from a mutation that arose in the invasive range. This transversion will serve as a useful tool for understanding how Nitellopsis obtusa moves across the landscape. The methods and analyses described here are broadly applicable to invasive and native plant and algal species. They allow efficient genotyping of samples of variable quality, including 100-year-old herbarium specimens, to determine population structure and geographic distributions.
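The diagnostic plastid SNP above is a transversion, that is, an exchange between a purine and a pyrimidine. A transition, by contrast, stays within the purines (A, G) or within the pyrimidines (C, T). A minimal classifier for single-base substitutions is sketched below. It is a generic illustration, not part of the authors' pipeline.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transitions stay within one chemical class; transversions cross classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```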
15
Gene flow, linked selection, and divergent sorting of ancient polymorphism shape genomic divergence landscape in a group of edaphic specialists. Mol Ecol 2021; 31:104-118. [PMID: 34664755 DOI: 10.1111/mec.16226] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/12/2021] [Revised: 10/07/2021] [Accepted: 10/11/2021] [Indexed: 12/24/2022]
Abstract
Interpreting how the genomic variation landscape forms, especially genomic regions with elevated differentiation (i.e., islands), is fundamental to understanding the genomic consequences of adaptation and speciation. Edaphic islands provide excellent systems for understanding the interplay of gene flow and selection in driving population divergence and speciation. However, discerning the relative contributions of these factors to patterns of genomic variation remains difficult. We analysed 132 genomes from five recently diverged species of the genus Primulina: four species distributed in karst limestone habitats and a fifth growing in Danxia habitats. We demonstrated that both gene flow and linked selection have contributed to the genome-wide variation landscape, in which genomic islands were largely derived from divergent sorting of ancient polymorphism. Specifically, we identified several lineage-specific genomic islands that might have facilitated adaptation of P. suichuanensis to Danxia habitats. Our study is among the first to disentangle the evolutionary processes that shape genomic variation in plant specialists. It also demonstrates the important role of ancient polymorphism in forming genomic islands that potentially mediate adaptation and speciation of endemic plants in special soil habitats.
16
Breed-specific reference sequence optimized mapping accuracy of NGS analyses for pigs. BMC Genomics 2021; 22:736. [PMID: 34641784 PMCID: PMC8507312 DOI: 10.1186/s12864-021-08030-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2020] [Accepted: 09/22/2021] [Indexed: 11/17/2022]
Abstract
Background Reference sequences play a vital role in next-generation sequencing (NGS), affecting mapping quality during genome analyses. However, because of geographical divergence and independent demographic events in different populations, reference genomes usually do not represent the full range of a species' genetic diversity. For the mitochondrial genome (mitogenome), which occurs in high copy numbers in cells and is strictly maternally inherited, an optimal reference sequence has the potential to make mitogenome alignment both more accurate and more efficient. In this study, we used three types of reference sequence for mitogenome mapping: the commonly used reference sequence (CU-ref), the breed-specific reference sequence (BS-ref), and the sample-specific reference sequence (SS-ref). We compared the accuracy of mitogenome alignment and SNP calling among them, with the aim of proposing the optimal reference sequence for mitochondrial DNA (mtDNA) analyses of specific populations. Results Four pigs representing three breeds were sequenced by high-throughput sequencing, and their reads were mapped to each of the reference sequences above. Aligning reads to a BS-ref gave the highest mapping ratio and the deepest coverage without increasing running time. Next, single nucleotide polymorphism (SNP) calling was carried out with 18 detection strategies on the BAM files mapped to the BS-ref, using the three tools SAMtools, VarScan, and GATK with different parameters. All eighteen strategies achieved the same high specificity and sensitivity. Because SNP calling results did not depend on the choice of tool or parameters, this suggests that alignment to the BS-ref was highly accurate.
Conclusions This study showed that reference sequences with different genetic relationships to the sample reads influenced mitogenome alignment, and that breed-specific reference sequences were optimal for mitogenome analyses, providing a refined processing perspective for NGS data. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08030-1.
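The comparison of CU-ref, BS-ref and SS-ref above rests on per-reference summaries such as the mapping ratio (mapped reads divided by total reads) and mean coverage depth. The Python sketch below shows that bookkeeping under stated assumptions: per-read mapped flags and per-position depths are assumed to have been extracted already (for example from each BAM file), and the helper names and toy numbers are hypothetical, not the authors' pipeline or data.

```python
from dataclasses import dataclass


@dataclass
class MappingSummary:
    reference: str
    mapping_ratio: float  # fraction of reads that mapped
    mean_depth: float     # mean per-position coverage depth


def summarize(reference: str, mapped_flags: list[bool], depths: list[int]) -> MappingSummary:
    """Summarize one alignment run against a single reference (hypothetical helper)."""
    if not mapped_flags or not depths:
        raise ValueError("need at least one read and one reference position")
    ratio = sum(mapped_flags) / len(mapped_flags)
    return MappingSummary(reference, ratio, sum(depths) / len(depths))


def best_reference(summaries: list[MappingSummary]) -> MappingSummary:
    """Prefer the highest mapping ratio, breaking ties on mean depth."""
    return max(summaries, key=lambda s: (s.mapping_ratio, s.mean_depth))


# Toy inputs for three reference types (illustrative numbers only).
summaries = [
    summarize("CU-ref", [True, True, False, False], [10, 12, 8]),
    summarize("BS-ref", [True, True, True, False], [15, 14, 16]),
    summarize("SS-ref", [True, True, True, False], [14, 14, 14]),
]
print(best_reference(summaries).reference)  # BS-ref
```

Ranking on the mapping ratio first and depth second mirrors the two criteria reported in the abstract; a real comparison would also track running time.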
17
Complete Chloroplast Genomes of Fagus sylvatica L. Reveal Sequence Conservation in the Inverted Repeat and the Presence of Allelic Variation in NUPTs. Genes (Basel) 2021; 12:1357. [PMID: 34573338 PMCID: PMC8468245 DOI: 10.3390/genes12091357] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/24/2021] [Revised: 08/21/2021] [Accepted: 08/27/2021] [Indexed: 12/17/2022]
Abstract
Growing amounts of genomic data and more efficient assembly tools are advancing organelle genomics at an unprecedented scale. Genomic resources are increasingly used for phylogenetic analyses of many plant species, but less frequently to investigate within-species variability and phylogeography. In this study, we investigated the genetic diversity of Fagus sylvatica, an important broadleaved tree species of European forests, based on complete chloroplast genomes of 18 individuals sampled widely across the species' distribution. Our results confirm the hypothesis of low cpDNA diversity in European beech. The chloroplast genome size was remarkably stable (158,428 ± 37 bp). The polymorphic markers (12 microsatellites (SSRs), four SNPs and one indel) were found only in the single-copy regions, while the inverted repeat regions were monomorphic in both length and sequence, suggesting highly efficient suppression of mutation. The within-individual analysis of polymorphisms revealed more than 9,000 markers, distributed proportionally between genic and non-genic regions. However, an investigation of the frequency of alternate alleles revealed that this diversity likely originated from nuclear-encoded plastome remnants (NUPTs). Phylogeographic and Mantel correlation analyses based on the complete chloroplast genomes showed clustering of individuals according to geographic distance in the first distance class, suggesting that the novel markers, in particular the cpSSRs, could provide a more detailed picture of beech population structure in Central Europe.
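The alternate-allele-frequency argument above can be illustrated with a small classifier. This is a minimal Python sketch, assuming per-site reference and alternate read counts are already available; the cutoffs (0.1 and 0.9) and the category labels are hypothetical choices for illustration, not values from the study. The intuition is that reads from nuclear plastid insertions are diluted by the many plastome copies per cell, so NUPT-derived alleles tend to appear at low frequency.

```python
from collections import Counter


def alt_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Fraction of reads at a site that carry the alternate allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("site has no read coverage")
    return alt_count / total


def classify_site(ref_count: int, alt_count: int,
                  low_cutoff: float = 0.1, fixed_cutoff: float = 0.9) -> str:
    """Label a site by alternate-allele frequency (hypothetical cutoffs).

    'low' sites are candidates for NUPT-derived reads, 'fixed' sites look like
    a plastome-wide difference, and anything in between is 'intermediate'.
    """
    af = alt_allele_frequency(ref_count, alt_count)
    if af >= fixed_cutoff:
        return "fixed"
    if af < low_cutoff:
        return "low"
    return "intermediate"


def summarize_sites(sites: list[tuple[int, int]]) -> Counter:
    """Count how many (ref_count, alt_count) sites fall into each class."""
    return Counter(classify_site(ref, alt) for ref, alt in sites)


# Toy counts: most within-individual variants sit at low frequency.
sites = [(950, 50), (980, 20), (5, 95), (600, 400)]
print(summarize_sites(sites))
```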
18
Characterization of Penaeus vannamei mitogenome focusing on genetic diversity. PLoS One 2021; 16:e0255291. [PMID: 34329352 PMCID: PMC8323954 DOI: 10.1371/journal.pone.0255291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Accepted: 07/13/2021] [Indexed: 11/23/2022]
Abstract
The diversity of the Penaeus vannamei mitochondrial genome remains poorly characterized: no validated mitochondrial markers are available for population studies, and heteroplasmy has not yet been investigated in this species. In this study, metagenomic reads extracted from the muscle of a single individual were used to assemble the mitochondrial genome (mtDNA). These data, combined with previously described mitochondrial genomes, allowed us to evaluate inter-individual variability and heteroplasmy. Comparison among 45 mtDNA control regions led to the detection of conserved and variable segments and the characterization of two hypervariable regions. The diversity analysis revealed mostly low-frequency polymorphisms, and heteroplasmy was found in practically all mitochondrial genes, with a high occurrence of indels. These results indicate that mitochondrial markers for P. vannamei must be designed with caution. The mapping of conserved and variable regions and the characterization of heteroplasmy presented here will help increase the efficiency of mitochondrial markers for population or individual studies.
19
Comparative Mitogenomic Analysis Reveals Gene and Intron Dynamics in Rubiaceae and Intra-Specific Diversification in Damnacanthus indicus. Int J Mol Sci 2021; 22:ijms22137237. [PMID: 34281291 PMCID: PMC8268409 DOI: 10.3390/ijms22137237] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/27/2021] [Revised: 06/24/2021] [Accepted: 07/01/2021] [Indexed: 12/20/2022]
Abstract
The dynamic evolution of mitochondrial gene and intron content has been reported across the angiosperms. However, no reference mitochondrial genome (mitogenome) is available for Rubiaceae, and the phylogenetic utility of mitogenome data at the species level is rarely assessed. Here, we assembled the mitogenomes of six Damnacanthus indicus (Rubiaceae, Rubioideae) individuals representing two varieties (var. indicus and var. microphyllus). The gene and intron content of D. indicus was compared with mitogenomes from representative angiosperm species and with mitochondrial contigs from other Rubiaceae species. Mitogenome structural rearrangement and sequence divergence in D. indicus were analyzed in six individuals. The size of the D. indicus mitogenome varied from 417,661 to 419,435 bp. Whereas other Gentianales taxa contain 38 intact mitochondrial protein-coding genes, D. indicus contained 32, reflecting several gene losses. The intron analysis revealed a shift from cis- to trans-splicing of a nad1 intron (nad1i728) in D. indicus, a character shared with the other four Rubioideae taxa. Two distinct mitogenome structures (types A and B) were identified, and two-step direct-repeat-mediated recombination was proposed to explain the structural changes between them. The five individuals from the two varieties of D. indicus diverged clearly in the whole-mitogenome comparison, with one exception. Collectively, our study elucidated mitogenome evolution in Rubiaceae, including D. indicus, and demonstrated the reliable phylogenetic utility of whole-mitogenome data for species-level evolution.
20
Abstract
BACKGROUND Chloroplasts are intracellular organelles that enable plants to conduct photosynthesis. They arose through the symbiotic integration of a prokaryotic cell into a eukaryotic host cell and still contain their own genomes with distinct genomic information. Plastid genomes accommodate essential genes and are regularly utilized in biotechnology and phylogenetics. Several assemblers capable of reconstructing the plastid genome have been developed. These assemblers often use data from whole-genome sequencing experiments, which usually contain reads from the complete chloroplast genome. RESULTS The performance of different assembly tools has never been systematically compared. Here, we present a benchmark of seven chloroplast assembly tools capable of succeeding on more than 60% of known real data sets. Our results show significant differences between the tested assemblers in generating whole chloroplast genome sequences and in computational requirements. The examination of 105 data sets from species with unknown plastid genomes led to the assembly of 20 novel chloroplast genomes. CONCLUSIONS We created Docker images for each tested tool that are freely available to the scientific community and ensure the reproducibility of the analyses. These containers allow the analysis and screening of data sets for chloroplast genomes using standard computational infrastructure. Thus, large-scale screening for chloroplasts within genomic sequencing data is feasible.
21
A systematic comparison of chloroplast genome assembly tools. Genome Biol 2020; 21:254. [PMID: 32988404 DOI: 10.1101/665869] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2020] [Accepted: 08/22/2020] [Indexed: 05/21/2023]
Abstract
BACKGROUND Chloroplasts are intracellular organelles that enable plants to conduct photosynthesis. They arose through the symbiotic integration of a prokaryotic cell into a eukaryotic host cell and still contain their own genomes with distinct genomic information. Plastid genomes accommodate essential genes and are regularly utilized in biotechnology and phylogenetics. Several assemblers capable of reconstructing the plastid genome have been developed. These assemblers often use data from whole-genome sequencing experiments, which usually contain reads from the complete chloroplast genome. RESULTS The performance of different assembly tools has never been systematically compared. Here, we present a benchmark of seven chloroplast assembly tools capable of succeeding on more than 60% of known real data sets. Our results show significant differences between the tested assemblers in generating whole chloroplast genome sequences and in computational requirements. The examination of 105 data sets from species with unknown plastid genomes led to the assembly of 20 novel chloroplast genomes. CONCLUSIONS We created Docker images for each tested tool that are freely available to the scientific community and ensure the reproducibility of the analyses. These containers allow the analysis and screening of data sets for chloroplast genomes using standard computational infrastructure. Thus, large-scale screening for chloroplasts within genomic sequencing data is feasible.
22
Mitochondrial DNA enrichment reduced NUMT contamination in porcine NGS analyses. Brief Bioinform 2020; 21:1368-1377. [PMID: 31204429 DOI: 10.1093/bib/bbz060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/18/2019] [Revised: 04/19/2019] [Indexed: 12/24/2022]
Abstract
Genetic associations between mitochondrial DNA (mtDNA) and economic traits have been widely reported in pigs, indicating the importance of mtDNA. However, studies on mtDNA heteroplasmy in pigs are rare. Next-generation sequencing (NGS) methodologies have emerged as a promising genomic approach for detecting mitochondrial heteroplasmy. Because of short reads, flexible bioinformatic analyses and contamination by nuclear mitochondrial sequences (NUMTs), NGS was expected to increase false-positive detection of heteroplasmy. In this study, Sanger sequencing was performed as a gold standard to detect heteroplasmy in pigs, with a detection sensitivity of 5%, and then one whole-genome sequencing method (WGS) and two mtDNA enrichment sequencing methods (Capture and LongPCR) were carried out. The aim of this study was to determine whether mitochondrial heteroplasmy identification from NGS data is affected by NUMTs. We found that WGS generated more false intra-individual polymorphisms and lower mapping specificity than the two enrichment sequencing methods, suggesting that NUMTs indeed led to false-positive mitochondrial heteroplasmies in NGS data. In addition, to accurately detect mitochondrial diversity, three commonly used tools (SAMtools, VarScan and GATK) were compared with different parameter values. VarScan achieved the best specificity and sensitivity when base alignment quality re-computation was applied and the minimum variant frequency was set to 0.25. This also suggested that the bioinformatic workflow affects the identification of mtDNA SNPs. In conclusion, intra-individual polymorphism in pig mitochondria detected from NGS data was confounded by NUMTs, and mtDNA-specific enrichment is essential before high-throughput sequencing for the detection of mitochondrial genome sequences.
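The best-performing setting above combined a minimum variant frequency of 0.25 with base-quality handling. The Python sketch below illustrates only the frequency-threshold idea on a single pileup column; it is not VarScan's implementation, and the function name, the base-quality cutoff of 20 and the input format (a string of read bases plus a list of Phred qualities) are assumptions.

```python
from collections import Counter
from typing import Optional


def call_site(bases: str, quals: list[int], ref: str,
              min_base_qual: int = 20,
              min_var_freq: float = 0.25) -> Optional[tuple[str, float]]:
    """Call the most frequent non-reference base at one position, if frequent enough.

    bases: read bases covering the position; quals: matching Phred base qualities.
    Returns (alt_base, frequency) or None when no variant passes the thresholds.
    """
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    # Discard low-quality bases before computing frequencies.
    kept = [b.upper() for b, q in zip(bases, quals) if q >= min_base_qual]
    if not kept:
        return None
    alt_counts = Counter(b for b in kept if b != ref.upper())
    if not alt_counts:
        return None
    alt, count = alt_counts.most_common(1)[0]
    freq = count / len(kept)
    return (alt, freq) if freq >= min_var_freq else None


print(call_site("AAAAGGG", [30] * 7, ref="A"))   # G at 3/7, above 0.25
print(call_site("AAAAAAAG", [30] * 8, ref="A"))  # G at 1/8, below 0.25, so None
```

Raising the frequency threshold trades sensitivity for specificity: low-frequency signals such as NUMT-derived reads are suppressed, but so is genuine low-level heteroplasmy.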
23
Abstract
Background Plastid genomes typically display a circular, quadripartite structure with two inverted repeat regions, which challenges automatic assembly procedures. The correct assembly of plastid genomes is a prerequisite for the validity of subsequent analyses on genome structure and evolution. The average coverage depth of a genome assembly is often used as an indicator of assembly quality. Visualizing coverage depth across a draft genome is a critical step, which allows users to inspect the quality of the assembly and, where applicable, identify regions of reduced assembly confidence. Despite the interplay between genome structure and assembly quality, no contemporary, user-friendly software tool can visualize the coverage depth of a plastid genome assembly while taking its quadripartite genome structure into account. A software tool is needed that fills this void. Results We introduce 'PACVr', an R package that visualizes the coverage depth of a plastid genome assembly in relation to the circular, quadripartite structure of the genome as well as the individual plastome genes. By using a variable window approach, the tool allows visualizations on different calculation scales. It also confirms sequence equality of, as well as visualizes gene synteny between, the inverted repeat regions of the input genome. As a tool for plastid genomics, PACVr provides the functionality to identify regions of coverage depth above or below user-defined threshold values and helps to identify non-identical IR regions. To allow easy integration into bioinformatic workflows, PACVr can be invoked from a Unix shell, facilitating its use in automated quality control. We illustrate the application of PACVr on four empirical datasets and compare visualizations generated by PACVr with those of alternative software tools.
Conclusions PACVr provides a user-friendly tool to visualize (a) the coverage depth of a plastid genome assembly on a circular, quadripartite plastome map and in relation to individual plastome genes, and (b) gene synteny across the inverted repeat regions. It contributes to optimizing plastid genome assemblies and increasing the reliability of publicly available plastome sequences. The software, example datasets, technical documentation, and a tutorial are available with the package at https://cran.r-project.org/package=PACVr.
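PACVr itself is an R package; the Python sketch below only illustrates the general idea of flagging coverage depth against user-defined thresholds. For simplicity it uses fixed-size, non-overlapping windows rather than PACVr's variable window approach, and the function names and thresholds are hypothetical.

```python
def window_means(depths: list[int], window: int) -> list[tuple[int, float]]:
    """Mean depth of consecutive non-overlapping windows, keyed by start position."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]  # last window may be shorter
        means.append((start, sum(chunk) / len(chunk)))
    return means


def flag_windows(depths: list[int], window: int,
                 low: float, high: float) -> list[tuple[int, float, str]]:
    """Return windows whose mean depth falls below `low` or rises above `high`."""
    flagged = []
    for start, mean_depth in window_means(depths, window):
        if mean_depth < low:
            flagged.append((start, mean_depth, "low"))
        elif mean_depth > high:
            flagged.append((start, mean_depth, "high"))
    return flagged


# Toy per-base depths: a normal stretch, a coverage dip, then a spike.
depths = [10] * 4 + [2] * 4 + [50] * 4
print(flag_windows(depths, window=4, low=5, high=40))
# [(4, 2.0, 'low'), (8, 50.0, 'high')]
```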
24
Plastome phylogeography in two African rain forest legume trees reveals that Dahomey Gap populations originate from the Cameroon volcanic line. Mol Phylogenet Evol 2020; 150:106854. [PMID: 32439485 DOI: 10.1016/j.ympev.2020.106854] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/02/2019] [Revised: 05/08/2020] [Accepted: 05/13/2020] [Indexed: 11/29/2022]
Abstract
Paleo-environmental data show that the distribution of African rain forests was affected by Quaternary climate changes. In particular, the Dahomey Gap (DG), a 200-km-wide savanna corridor currently separating the West African and Central African rain forest blocks and containing relict rain forest fragments, was forested during the mid-Holocene and possibly during previous interglacial periods, whereas it was dominated by open vegetation (savanna) during glacial periods. Genetic signatures of past population fragmentation and demographic changes have been found in some African forest plant species using nuclear markers, but such events appear not to have been synchronous or shared across species. To better understand the colonization history of the DG by rain forest trees through seed dispersal, the plastid genomes of two widespread African forest legume trees, Anthonotha macrophylla and Distemonanthus benthamianus, were sequenced in 47 individuals of each species, providing unprecedented phylogenetic resolution of their maternal lineages (857 and 115 SNPs, respectively). Both species exhibit distinct lineages separating three regions: (1) Upper Guinea (UG, i.e. the West African forest block), (2) the area ranging from the DG to the Cameroon volcanic line (CVL), and (3) Lower Guinea (LG, the western part of the Central African forest block), where three lineages co-occur. In both species, the DG populations (including southern Nigeria west of Cross River) exhibit much lower genetic diversity than UG and LG populations, and their plastid lineages originate from the CVL, confirming the role of the CVL as an ancient forest refuge. Despite the similar phylogeographic structures displayed by A. macrophylla and D. benthamianus, molecular dating indicates very contrasting ages of lineage divergence (UG diverged from LG c. 7 Ma and 0.7 Ma, respectively) and of DG colonization (probably following the Mid-Pleistocene Transition and the Last Glacial Maximum, respectively).
The stability of forest refuge areas and repeated similar forest shrinking/expanding events during successive glacial periods might explain why similar phylogeographic patterns can be generated over contrasting timescales.
25
CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Res 2020; 47:W65-W73. [PMID: 31066451 PMCID: PMC6602467 DOI: 10.1093/nar/gkz345] [Citation(s) in RCA: 547] [Impact Index Per Article: 136.8] [Received: 03/05/2019] [Revised: 04/15/2019] [Accepted: 04/24/2019] [Indexed: 12/21/2022]
Abstract
We previously developed a web server, CPGAVAS, for the annotation, visualization and GenBank submission of plastome sequences. Here, we upgrade the server into CPGAVAS2 to address the following challenges: (i) inaccurate annotation in the reference sequence, likely causing the propagation of errors; (ii) difficulty in annotating the small exons of genes petB, petD and rps16 and the trans-splicing gene rps12; (iii) lack of annotation and visualization for other genome features, such as repeat elements; and (iv) lack of modules for diversity analysis of plastomes. In particular, CPGAVAS2 provides two reference datasets for plastome annotation. The first dataset contains 43 plastomes whose annotations have been validated or corrected using RNA-seq data. The second contains 2544 plastomes curated with sequence alignment. Two new algorithms are also implemented to correctly annotate small exons and trans-splicing genes. Tandem and dispersed repeats are identified and displayed on a circular map together with the annotated genes. DNA-seq and RNA-seq data can be uploaded for the identification of single-nucleotide polymorphism sites and RNA-editing sites. The results of two case studies show that CPGAVAS2 annotates better than several other servers. CPGAVAS2 will likely become an indispensable tool for plastome research and can be accessed from http://www.herbalgenomics.org/cpgavas2.
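RNA-editing site detection rests on comparing genomic and transcript bases at the same position. In flowering-plant plastids, editing is predominantly C-to-U, which appears as T in RNA-seq reads over a genomic C. The Python sketch below flags such candidate sites; it is not CPGAVAS2's algorithm, it considers only the plus strand (a minus-strand C-to-U event would appear as G-to-A on the reference), and the depth and frequency thresholds are hypothetical.

```python
from collections import Counter


def candidate_c_to_u_sites(dna: str, rna_pileups: dict[int, str],
                           min_depth: int = 10,
                           min_freq: float = 0.1) -> list[tuple[int, float]]:
    """Flag positions where the DNA has C but RNA-seq reads show T.

    dna: plastome sequence (0-based positions); rna_pileups: position -> RNA read
    bases covering that position on the plus strand. Returns (position, T fraction).
    """
    sites = []
    for pos, bases in sorted(rna_pileups.items()):
        if dna[pos].upper() != "C" or len(bases) < min_depth:
            continue  # not a genomic C, or too few reads to judge
        t_fraction = Counter(bases.upper())["T"] / len(bases)
        if t_fraction >= min_freq:
            sites.append((pos, t_fraction))
    return sites


dna = "ACGTCCA"
pileups = {
    1: "TTTTTCCCCC",  # genomic C, half the reads show T: candidate
    3: "TTTTTTTTTT",  # genomic T, so not an editing candidate
    4: "CCCCCCCCCC",  # genomic C, no T reads
    5: "TTT",         # too shallow to judge
}
print(candidate_c_to_u_sites(dna, pileups))  # [(1, 0.5)]
```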
26
ECuADOR-Easy Curation of Angiosperm Duplicated Organellar Regions, a tool for cleaning and curating plastomes assembled from next generation sequencing pipelines. PeerJ 2020; 8:e8699. [PMID: 32292644 PMCID: PMC7147433 DOI: 10.7717/peerj.8699] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/28/2019] [Accepted: 02/06/2020] [Indexed: 11/25/2022]
Abstract
Background With the rapid increase in genomic resources offered by Next-Generation Sequencing (NGS) and the availability of free online genomic databases, efficient and standardized metadata curation approaches have become increasingly critical for the post-processing stages of biological data. Especially in organelle-based studies using circular chloroplast genome datasets, the assembly of the main structural regions in random order and orientation is a major obstacle to easily generating “ready-to-align” datasets for phylogenetic reconstruction at both small and large taxonomic scales. In addition, current practices discard the most variable regions of the genomes to facilitate the alignment of the remaining coding regions. Nevertheless, no software is currently available to perform curation to such a degree through simple detection, organization and positioning of the main plastome regions, which makes this a time-consuming and error-prone process. Here we introduce ECuADOR, a fast and user-friendly Perl script specifically designed to automate the detection and reorganization of newly assembled plastomes obtained from any available source (NGS, Sanger sequencing or assembler output). Methods ECuADOR uses a sliding-window approach to detect long repeated sequences in draft sequences, identifies the inverted repeat regions (IRs) even in the presence of artifactual breaks or sequencing errors, and automates the rearrangement of the sequence into the widely used LSC–IRb–SSC–IRa order. This facilitates rapid post-editing steps such as creating genome alignments, detecting variable regions, SNP detection and phylogenomic analyses. Results ECuADOR was successfully tested on plant families throughout the angiosperm phylogeny by curating 161 chloroplast datasets.
ECuADOR first identified and reordered the central regions (LSC–IRb–SSC–IRa) for each dataset and then produced a new annotation for the chloroplast sequences. The process took less than 20 min, with a maximum memory requirement of 150 MB and an accuracy of over 99%. Conclusions ECuADOR is the only de novo one-step recognition and reordering tool that facilitates post-processing analysis of extranuclear genomes from NGS data. The program is available at https://github.com/BiodivGenomic/ECuADOR/.
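The detection-and-reordering step can be sketched in a few lines. The Python below is not ECuADOR's sliding-window algorithm: as a simpler stand-in, it finds the inverted repeat pair as the longest common substring between a sequence and its reverse complement (quadratic time, so suitable only for toy inputs), then rotates the circular sequence so that the larger single-copy region comes first, giving LSC–IR–SSC–IR order. It assumes neither IR copy spans the linearization point and does not normalize SSC orientation; all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_inverted_repeat(seq: str) -> tuple[int, int, int]:
    """Return (start_a, start_b, length) with seq[a:a+L] == revcomp(seq[b:b+L]).

    Longest common substring of seq and its reverse complement, keeping only
    non-overlapping copies (O(n^2) dynamic programming with two rolling rows).
    """
    n = len(seq)
    rc = revcomp(seq)
    best = (0, 0, 0)
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if seq[i - 1] == rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                length = cur[j]
                # seq[i-length:i] matches rc[j-length:j], i.e. revcomp(seq[n-j:n-j+length])
                lo, hi = sorted((i - length, n - j))
                if lo + length <= hi and length > best[2]:
                    best = (lo, hi, length)
        prev = cur
    return best


def reorder_lsc_ir_ssc_ir(seq: str) -> str:
    """Rotate a circular plastome so the larger single-copy region comes first."""
    a, b, length = longest_inverted_repeat(seq)
    if length == 0:
        raise ValueError("no inverted repeat found")
    between = b - (a + length)             # region between the two IR copies
    wrapped = len(seq) - (b + length) + a  # region spanning the sequence origin
    start = a + length if between >= wrapped else b + length
    return seq[start:] + seq[:start]


ir = "GATTACAGGCTA"
lsc, ssc = "A" * 20, "C" * 6
expected = lsc + ir + ssc + revcomp(ir)
rotated = expected[10:] + expected[:10]  # origin moved into the LSC
print(reorder_lsc_ir_ssc_ir(rotated) == expected)  # True
```

Because a plastome is circular, the reordering is just a rotation: once the two IR copies are located, the larger of the two intervening regions is taken as the LSC and the sequence is rotated to start there.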
27
Complete Chloroplast Genome of Pinus densiflora Siebold & Zucc. and Comparative Analysis with Five Pine Trees. FORESTS 2019. [DOI: 10.3390/f10070600] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/17/2023]
Abstract
Pinus densiflora (Korean red pine) is widely distributed in East Asia and considered one of the most important species in Korea. In this study, the complete chloroplast genome of P. densiflora was sequenced by combining the advantages of Oxford Nanopore MinION and Illumina MiSeq. The sequenced genome was then compared with previously published conifer plastomes. The chloroplast genome was circular with a quadripartite structure and contained 113 genes, encoding 73 proteins, 36 tRNAs and 4 rRNAs. It had short inverted repeat regions and lacked the ndh gene family, consistent with other Pinaceae species. The gene content of P. densiflora was most similar to that of P. sylvestris. The newly attempted sequencing method could serve as an alternative for obtaining accurate genetic information, and the chloroplast genome sequence of P. densiflora reported in this study can be used in phylogenetic analyses of Pinus species.
28
Tension and Resolution: Dynamic, Evolving Populations of Organelle Genomes within Plant Cells. MOLECULAR PLANT 2019; 12:764-783. [PMID: 30445187 DOI: 10.1016/j.molp.2018.11.002] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/30/2018] [Revised: 10/25/2018] [Accepted: 11/07/2018] [Indexed: 06/09/2023]
Abstract
Mitochondria and plastids form dynamic, evolving populations physically embedded in the fluctuating environment of the plant cell. Their evolutionary heritage has shaped how the cell controls the genetic structure and the physical behavior of its organelle populations. While the specific genes involved in these processes are gradually being revealed, the governing principles underlying this controlled behavior remain poorly understood. As the genetic and physical dynamics of these organelles are central to bioenergetic performance and plant physiology, this challenges both fundamental biology and strategies to engineer better-performing plants. This article reviews current knowledge of the physical and genetic behavior of mitochondria and chloroplasts in plant cells. An overarching hypothesis is proposed whereby organelles face a tension between genetic robustness and individual control and responsiveness, and different species resolve this tension in different ways. As plants are immobile and thus subject to fluctuating environments, their organelles are proposed to favor individual responsiveness, sacrificing genetic robustness. Several notable features of plant organelles, including large genomes, mtDNA recombination, fragmented organelles, and plastid/mitochondrial differences may potentially be explained by this hypothesis. Finally, the ways that quantitative and systems biology can help shed light on the plethora of open questions in this field are highlighted.
29
icHET: interactive visualization of cytoplasmic heteroplasmy. Bioinformatics 2019; 35:4411-4412. [DOI: 10.1093/bioinformatics/btz300] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/11/2018] [Revised: 02/07/2019] [Accepted: 04/20/2019] [Indexed: 11/13/2022]
Abstract
Summary
Although heteroplasmy has been studied extensively in animal systems, there is a lack of tools for analyzing, exploring and visualizing heteroplasmy at the genome-wide level in other taxonomic systems. We introduce icHET, which is a computational workflow that produces an interactive visualization that facilitates the exploration, analysis and discovery of heteroplasmy across multiple genomic samples. icHET works on short reads from multiple samples from any organism with an organellar reference genome (mitochondrial or plastid) and a nuclear reference genome.
Availability and implementation
The software is available at https://github.com/vtphan/HeteroplasmyWorkflow.
Supplementary information
Supplementary data are available at Bioinformatics online.
30
Contrasting patterns of diversification between Amazonian and Atlantic forest clades of Neotropical lianas (Amphilophium, Bignonieae) inferred from plastid genomic data. Mol Phylogenet Evol 2019; 133:92-106. [DOI: 10.1016/j.ympev.2018.12.021] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/29/2018] [Revised: 11/20/2018] [Accepted: 12/16/2018] [Indexed: 01/23/2023]
31
Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case. BMC Genomics 2018; 19:977. [PMID: 30594129 PMCID: PMC6311037 DOI: 10.1186/s12864-018-5348-8] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 05/15/2018] [Accepted: 12/03/2018] [Indexed: 12/23/2022]
Abstract
BACKGROUND Chloroplasts are organelles that conduct photosynthesis in plant and algal cells. The information contained in the chloroplast genome is widely used in agriculture and in studies of evolution and ecology. Correctly assembling chloroplast genomes can be challenging because the chloroplast genome contains a pair of long inverted repeats (10-30 kb). Typically, it is simply assumed that the gross structure of the chloroplast genome matches the most commonly observed structure of two single-copy regions separated by a pair of inverted repeats. The advent of long-read sequencing technologies should remove the need for this assumption by providing sufficient information to completely span the inverted repeat regions. Yet long reads tend to have higher error rates than short reads, and relatively little is known about the best way to combine long and short reads to obtain the most accurate chloroplast genome assemblies. Using Eucalyptus pauciflora, the snow gum, as a test case, we evaluated the effect of multiple parameters, such as different coverages of long (Oxford Nanopore) and short (Illumina) reads, different long-read lengths and different assembly pipelines, with a view to determining the most accurate and efficient approach to chloroplast genome assembly. RESULTS Hybrid assemblies combining at least 20x coverage of both long and short reads generated a single contig spanning the entire chloroplast genome with few or no detectable errors. Short-read-only assemblies generated three contigs (the long single-copy, short single-copy and inverted repeat regions) of the chloroplast genome. These contigs contained few single-base errors but tended to exclude several bases at the beginning or end of each contig. Long-read-only assemblies tended to create multiple contigs with a much higher single-base error rate. The chloroplast genome of Eucalyptus pauciflora is 159,942 bp and contains 131 genes of known function.
CONCLUSIONS Our results suggest that very accurate assemblies of chloroplast genomes can be achieved using a combination of at least 20x coverage of long- and short-reads respectively, provided that the long-reads contain at least ~5x coverage of reads longer than the inverted repeat region. We show that further increases in coverage give little or no improvement in accuracy, and that hybrid assemblies are more accurate than long-read-only or short-read-only assemblies.
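The coverage rule in these conclusions can be turned into a simple pre-assembly check. This is a minimal Python sketch, assuming coverage is estimated as total read bases divided by genome size and that all reads are chloroplast-derived (in practice reads would first need to be filtered). The 159,942 bp genome size comes from the abstract; the 26,000 bp IR length, the function names and the read sets are hypothetical.

```python
def coverage(read_lengths: list[int], genome_size: int, min_length: int = 0) -> float:
    """Estimated fold coverage from reads strictly longer than min_length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(n for n in read_lengths if n > min_length) / genome_size


def meets_hybrid_guideline(long_read_lengths: list[int], short_read_bases: int,
                           genome_size: int, ir_length: int,
                           min_cov: float = 20.0,
                           min_spanning_cov: float = 5.0) -> bool:
    """At least ~20x long and short reads, and ~5x from long reads that exceed the IR."""
    long_cov = coverage(long_read_lengths, genome_size)
    spanning_cov = coverage(long_read_lengths, genome_size, min_length=ir_length)
    short_cov = short_read_bases / genome_size
    return long_cov >= min_cov and short_cov >= min_cov and spanning_cov >= min_spanning_cov


GENOME = 159_942  # E. pauciflora plastome size from the abstract
IR = 26_000       # hypothetical IR length within the 10-30 kb range

enough_long = [30_000] * 40 + [8_000] * 300   # ~7.5x from reads longer than the IR
too_few_long = [30_000] * 20 + [8_000] * 400  # ~3.8x from reads longer than the IR
print(meets_hybrid_guideline(enough_long, 4_000_000, GENOME, IR))   # True
print(meets_hybrid_guideline(too_few_long, 4_000_000, GENOME, IR))  # False
```

Both read sets exceed 20x long-read coverage overall; only the spanning-read requirement separates them, which is the point of the ~5x condition.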
32
Comparative assessment shows the reliability of chloroplast genome assembly using RNA-seq. Sci Rep 2018; 8:17404. [PMID: 30479362 PMCID: PMC6258696 DOI: 10.1038/s41598-018-35654-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 06/25/2018] [Accepted: 11/09/2018] [Indexed: 11/08/2022]
Abstract
Chloroplast genomes (cp genomes) are widely used in comparative genomics, population genetics, and phylogenetic studies. Obtaining chloroplast genomes from RNA-Seq data seems feasible because cpDNA is almost fully transcribed. However, the reliability of chloroplast genomes assembled from RNA-Seq instead of genomic DNA libraries remains to be thoroughly verified. In this study, we assembled chloroplast genomes for three Erysimum (Brassicaceae) species from three RNA-Seq replicates and from one genomic library of each species, using a streamlined bioinformatics protocol. We compared these assembled genomes, confirming that cp genomes assembled from RNA-Seq data were highly similar to each other and to those from genomic libraries in terms of overall structure, size, and composition. Although post-transcriptional modifications, such as RNA editing, may introduce variations in the RNA-seq data, the assembly of cp genomes from RNA-seq appeared to be reliable. Moreover, RNA-Seq assembly was less sensitive to sources of error such as the recovery of nuclear plastid DNAs (NUPTs). Although some precautions should be taken when producing reference genomes in non-model plants, we conclude that assembling cp genomes from RNA-Seq data is a fast, accurate, and reliable strategy.
33
The Rise and Fall of African Rice Cultivation Revealed by Analysis of 246 New Genomes. Curr Biol 2018; 28:2274-2282.e6. [PMID: 29983312 DOI: 10.1016/j.cub.2018.05.066] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 02/12/2018] [Revised: 04/10/2018] [Accepted: 05/24/2018] [Indexed: 12/23/2022]
Abstract
African rice (Oryza glaberrima) was domesticated independently from Asian rice. The geographical origin of its domestication remains elusive. Using 246 new whole-genome sequences, we inferred the cradle of its domestication to be in the Inner Niger Delta. Domestication was preceded by a sharp decline of most wild populations that started more than 10,000 years ago. The wild population collapse occurred during the drying of the Sahara. This finding supports the hypothesis that depletion of wild resources in the Sahara triggered African rice domestication. African rice cultivation strongly expanded 2,000 years ago. During the last 5 centuries, a sharp decline in its cultivation coincided with the introduction of Asian rice into Africa. A gene, PROG1, associated with an erect plant architecture phenotype, showed convergent selection in two cultivated rice species, Oryza glaberrima from Africa and Oryza sativa from Asia. In contrast, a shattering gene, SH5, showed a selection signature during African rice domestication, but not during Asian rice domestication. Overall, our genomic data revealed a complex history of African rice domestication influenced by important climatic changes in the Saharan area, by the expansion of African agricultural society, and by recent replacement by another domesticated species.
34
Analysis of heteroplasmy in bank voles inhabiting the Chernobyl exclusion zone: A commentary on Baker et al. (2017) "Elevated mitochondrial genome variation after 50 generations of radiation exposure in a wild rodent." Evol Appl 2018; 11:820-826. [PMID: 29875822 PMCID: PMC5978973 DOI: 10.1111/eva.12578] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/09/2017] [Accepted: 11/03/2017] [Indexed: 12/19/2022]
35
Evolution in the Amphi-Atlantic tropical genus Guibourtia (Fabaceae, Detarioideae), combining NGS phylogeny and morphology. Mol Phylogenet Evol 2017; 120:83-93. [PMID: 29222064 DOI: 10.1016/j.ympev.2017.11.026] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 11/29/2016] [Revised: 11/22/2017] [Accepted: 11/30/2017] [Indexed: 11/16/2022]
Abstract
Tropical rain forests support a remarkable diversity of tree species, raising the question of how and when this diversity arose. The genus Guibourtia (Fabaceae, Detarioideae), comprising two South American and 13 African tree species growing in various tropical biomes, is an interesting model for addressing the role of biogeographic processes and adaptation to contrasting environments in species diversification. Combining whole plastid genome sequencing and analysis of morphological characters, we studied the timing of speciation and diversification processes in Guibourtia through molecular dating and reconstruction of ancestral habitats. All species except G. demeusei and G. copallifera appear monophyletic. Dispersal from Africa to America across the Atlantic Ocean is the most plausible hypothesis to explain the occurrence of Neotropical Guibourtia species, which diverged ca. 11.8 Ma from their closest African relatives. The diversification of the three main clades of African Guibourtia is concomitant with Miocene global climate changes, highlighting pre-Quaternary speciation events. These clades differ in their reproductive characters, which validates the three previously described subgenera: Pseudocopaiva, Guibourtia and Gorskia. Within most monophyletic species, plastid lineages began diverging from each other during the Pliocene or early Pleistocene, suggesting that these species had already arisen by this period. The multiple transitions between rain forests and dry forests/savannahs inferred here from the plastid phylogeny in each Guibourtia subgenus thus raise new questions about the role of phylogenetic relationships in shaping ecological niches and morphological similarity among taxa.
36
Diversity of Treegourd (Crescentia cujete) Suggests Introduction and Prehistoric Dispersal Routes into Amazonia. Front Ecol Evol 2017. [DOI: 10.3389/fevo.2017.00150] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
37
The complete chloroplast genome of Primulina and two novel strategies for development of high polymorphic loci for population genetic and phylogenetic studies. BMC Evol Biol 2017; 17:224. [PMID: 29115917 PMCID: PMC5678776 DOI: 10.1186/s12862-017-1067-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 11/30/2016] [Accepted: 10/31/2017] [Indexed: 12/03/2022]
Abstract
Background Primulina Hance is an emerging model for studying evolutionary divergence, adaptation and speciation in the karst flora. However, phylogenetic relationships within the genus have not been resolved because of the low variation detected in cpDNA regions. Chloroplast genomes can provide important information for phylogenetic and population genetic studies. Recent advances in next-generation sequencing (NGS) techniques greatly facilitate sequencing whole chloroplast genomes for multiple individuals. Consequently, novel NGS-based strategies for developing highly polymorphic loci for population genetic and phylogenetic studies are needed. Methods Two novel strategies for developing highly polymorphic loci for population genetic and phylogenetic studies are proposed here. The first protocol develops lineage-specific highly variable markers from the truly highly variable regions (Con_Seas) across whole cp genomes, instead of from traditional noncoding regions. The pipeline has been integrated into a single Perl script named "Con_Sea_Identification_and_PIC_Calculation". The second method assembles chloroplast fragments (poTs) and sub-super-markers (CpContigs) through our "SACRing" pipeline. This approach can fundamentally alter the strategies used in phylogenetic and population genetic studies based on cp markers, facilitating a transition from traditional Sanger sequencing to RAD-Seq. Both scripts are available at https://github.com/scbgfengchao/. Results Three complete Primulina chloroplast genomes were assembled from genome survey data, and two novel strategies were then developed to yield highly polymorphic markers. For experimental evaluation of the first protocol, a set of Primulina species was used for PCR amplification. The results showed that these newly developed markers are more variable than traditional ones and seem to be a better choice for phylogenetic and population studies in Primulina. The second method was also successfully applied in population genetic studies of 21 individuals from three natural populations of Primulina. Conclusions These two novel strategies may provide a pathway for similar research in other non-model species. The highly polymorphic loci newly developed in this study will further promote phylogenetic and population genetic studies in Primulina and other genera of the family Gesneriaceae.
38
Molecular basis of African yam domestication: analyses of selection point to root development, starch biosynthesis, and photosynthesis related genes. BMC Genomics 2017; 18:782. [PMID: 29025393 PMCID: PMC5639766 DOI: 10.1186/s12864-017-4143-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 04/04/2017] [Accepted: 10/02/2017] [Indexed: 11/25/2022]
Abstract
BACKGROUND After cereals, root and tuber crops are the main source of starch in the human diet. Starch biosynthesis was certainly a significant target for selection during the domestication of these crops. However, domestication of these root and tuber crops is also associated with gigantism of storage organs and changes of habitat. RESULTS Here we studied the molecular basis of domestication in African yam, Dioscorea rotundata. Genomic diversity in the cultivated species is roughly 30% lower than in its wild relatives. Two percent of all the genes studied showed evidence of selection. Two genes associated with the earliest stages of starch biosynthesis and storage, sucrose synthase 4 and sucrose-phosphate synthase 1, showed evidence of selection. An adventitious root development gene, a SCARECROW-LIKE gene, was also selected during yam domestication. Significant selection on genes associated with photosynthesis and phototropism was associated with the change of habitat from wild to cultivated: whereas the wild species grow as vines in the shade of their supporting trees, cultivated yam grows in full light in open fields. CONCLUSIONS Major rewiring of aerial development and adaptation for efficient photosynthesis in full light characterized yam domestication.
39
Human management and hybridization shape treegourd fruits in the Brazilian Amazon Basin. Evol Appl 2017; 10:577-589. [PMID: 28616065 PMCID: PMC5469164 DOI: 10.1111/eva.12474] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/17/2016] [Accepted: 03/01/2017] [Indexed: 11/30/2022]
Abstract
Local people's perceptions of cultivated and wild agrobiodiversity, as well as their management of hybridization, are still understudied in Amazonia. Here we analyze domesticated treegourd (Crescentia cujete), whose versatile fruits have technological, symbolic, and medicinal uses. A wild relative of the cultivated species (C. amazonica) grows spontaneously in Amazonian flooded forests. Using whole chloroplast sequences and nuclear microsatellites, we demonstrated that the two species are strongly differentiated. Nonetheless, they hybridize readily throughout Amazonia, and the proportions of admixture correlate with fruit size variation of cultivated trees. Hybridization gives rise to new morphotypes, which are recognized by people and named as local varieties. Small hybrid fruits are used to make the important symbolic rattle (maracá), suggesting that management of hybrid trees is an ancient human practice in Amazonia. Effective conservation of Amazonian agrobiodiversity needs to incorporate this interaction between wild and cultivated populations that is managed by smallholder families. Beyond treegourd, our study clearly shows that hybridization plays an important role in tree crop phenotypic diversification and that integrating molecular analyses with farmers’ perceptions of diversity helps disentangle crop domestication history.
40
Correlated evolutionary rates across genomic compartments in Annonaceae. Mol Phylogenet Evol 2017; 114:63-72. [PMID: 28578201 DOI: 10.1016/j.ympev.2017.05.026] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 03/12/2017] [Revised: 05/29/2017] [Accepted: 05/29/2017] [Indexed: 11/28/2022]
Abstract
The molecular clock hypothesis is an important concept in biology. Deviations from a constant rate of nucleotide substitution have been found widely among lineages, genomes, genes and individual sites. Phylogenetic research can accommodate these differences by applying specific models of evolution. However, lineage-specific rate heterogeneity can generate bi- or multimodal distributions of substitution rates across the branches of a tree, and this may mislead phylogenetic inferences made with currently available models. The plant family Annonaceae is an excellent case for studying lineage-specific rate heterogeneity: its two major sister subfamilies, Annonoideae and Malmeoideae, show great discrepancies in branch lengths. We used high-throughput sequencing data of 72 genes, 99 spacers and 16 introns from 24 chloroplast genomes and nuclear ribosomal DNA of 23 species to study the molecular rate of evolution in Annonaceae. In all analyses, longer branch lengths and/or higher substitution rates were found for the Annonoideae than for the Malmeoideae. Chloroplast genome length varied widely in Annonaceae, from 175,684 to 201,723 bp in Annonoideae and from 152,357 to 170,985 bp in Malmeoideae, mostly reflecting variation in inverted-repeat length. The Annonoideae showed a higher GC content in the conserved parts of the chloroplast genome and higher omega (dN/dS) ratios than the Malmeoideae, which could indicate less stringent purifying selection, a pattern that has been found in groups with small population sizes. This study generates new insights into the processes causing lineage-specific rate heterogeneity, which could lead to improved phylogenetic methods.
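The omega (dN/dS) ratio mentioned in this abstract compares the rate of amino-acid-changing substitutions with the rate of silent ones. The standard textbook definition is given below as general background; it is not taken from the Annonaceae study, which may have used a specific estimation method not described in the abstract.

$$\omega = \frac{d_N}{d_S}$$

Here $d_N$ is the number of nonsynonymous substitutions per nonsynonymous site and $d_S$ is the number of synonymous substitutions per synonymous site. Values of $\omega < 1$ indicate purifying selection, $\omega \approx 1$ neutral evolution, and $\omega > 1$ positive selection. A higher $\omega$ in one lineage that still remains below 1 is therefore consistent with weaker (less stringent) purifying selection rather than with positive selection; the abstract does not report the actual $\omega$ values.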
41
Chloroplast sequence of treegourd (Crescentia cujete, Bignoniaceae) to study phylogeography and domestication. APPLICATIONS IN PLANT SCIENCES 2016; 4:apps1600048. [PMID: 27785381 PMCID: PMC5077280 DOI: 10.3732/apps.1600048] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 04/15/2016] [Accepted: 08/30/2016] [Indexed: 05/07/2023]
Abstract
PREMISE OF THE STUDY Crescentia cujete (Bignoniaceae) fruit rinds are traditionally used for storage vessels and handicrafts. We assembled its chloroplast genome and identified single-nucleotide polymorphisms (SNPs). METHODS AND RESULTS Using a genome skimming approach, the whole chloroplast genome of C. cujete was assembled from 3,106,928 sequence reads of 150 bp. The chloroplast genome is 154,662 bp in length and is structurally divided into a large single-copy region (84,788 bp), a small single-copy region (18,299 bp), and two inverted repeat regions (51,575 bp combined), with 88 genes annotated. By resequencing the whole chloroplast genome, we identified 66 SNPs in C. cujete (N = 30) and 68 SNPs in C. amazonica (N = 6). Nucleotide diversity was estimated at 1.1 × 10⁻³ and 3.5 × 10⁻³ for C. cujete and C. amazonica, respectively. CONCLUSIONS This broadened C. cujete genetic toolkit will be important for studying the origin, domestication, diversity, and phylogeography of treegourds in the Neotropics.
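This abstract reports nucleotide diversity (π) estimates for the two Crescentia species. As a rough illustration of what π measures, the sketch below computes Nei's π as the mean proportion of differing sites over all pairs of aligned sequences. It assumes a gap-free alignment of equal-length sequences and runs on a made-up toy alignment; the function name `nucleotide_diversity` is hypothetical, and this is not the study's own pipeline, which the abstract does not describe.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Nei's nucleotide diversity (pi) as the mean pairwise p-distance.

    Hypothetical helper for illustration only (not from the study).
    Assumes a gap-free alignment of equal-length sequences.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    pairs = list(combinations(seqs, 2))
    # p-distance for each unordered pair: differing sites / alignment length
    total = sum(sum(a != b for a, b in zip(s1, s2)) / length for s1, s2 in pairs)
    # Average over all unordered pairs of sequences
    return total / len(pairs)


# Toy alignment (hypothetical data, not from the study)
aln = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAC"]
print(round(nucleotide_diversity(aln), 4))  # prints 0.1333
```

Averaging p-distances over all unordered pairs is equivalent to Nei's π with the n/(n − 1) sample-size correction. Real analyses also have to deal with gaps, ambiguous bases, and missing data, which this toy version does not handle.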
42
Phylogeography of the genus Podococcus (Palmae/Arecaceae) in Central African rain forests: Climate stability predicts unique genetic diversity. Mol Phylogenet Evol 2016; 105:126-138. [PMID: 27521478 DOI: 10.1016/j.ympev.2016.08.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 10/24/2015] [Revised: 08/08/2016] [Accepted: 08/09/2016] [Indexed: 11/16/2022]
Abstract
The tropical rain forests of Central Africa contain high levels of species diversity. Paleovegetation and biodiversity patterns suggested successive contraction/expansion phases of this rain forest cover during the last glacial maximum (LGM). Consequently, the existence of refugia, i.e., areas of habitat stability that harbored populations during adverse climatic periods, has been hypothesized. Understory species are tightly associated with forest cover and are therefore ideal markers of forest dynamics. Here, we used two Central African rain forest understory species of the palm genus Podococcus to assess the role of past climate variation in shaping their distribution and genetic diversity. Species distribution modeling for the present and the LGM was used to estimate areas of climatic stability. Genetic diversity and phylogeography were estimated by sequencing near-complete plastomes for over 120 individuals. Areas of climatic stability were mainly located in mountainous areas such as the Monts de Cristal and Monts Doudou in Gabon, but also in lowland coastal forests in southeast Cameroon and northeast Gabon. Genetic diversity analyses show a clear North-South structure of genetic diversity within one species. This divide was estimated to have originated some 500,000 years ago. We show that, in Central Africa, high and unique genetic diversity is strongly correlated with inferred areas of climatic stability since the LGM. Our results further highlight the importance of coastal lowland rain forests in Central Africa as harboring not only high species diversity but also high levels of unique genetic diversity. In the context of strong human pressure on coastal land use and destruction, such unique diversity hotspots need to be considered in future conservation planning.