1
Zhou H, Su X, Song B. ACMGA: a reference-free multiple-genome alignment pipeline for plant species. BMC Genomics 2024; 25:515. [PMID: 38796435 PMCID: PMC11127342 DOI: 10.1186/s12864-024-10430-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 05/20/2024] [Indexed: 05/28/2024]
Abstract
BACKGROUND The short-read whole-genome sequencing (WGS) approach has been widely applied to investigate the genomic variation in the natural populations of many plant species. With the rapid advancements in long-read sequencing and genome assembly technologies, high-quality genome sequences are available for a group of varieties for many plant species. These genome sequences are expected to help researchers comprehensively investigate any type of genomic variants that are missed by the WGS technology. However, multiple genome alignment (MGA) tools designed by the human genome research community might be unsuitable for plant genomes. RESULTS To fill this gap, we developed the AnchorWave-Cactus Multiple Genome Alignment (ACMGA) pipeline, which improved the alignment of repeat elements and could identify long (> 50 bp) deletions or insertions (INDELs). We conducted MGA using ACMGA and Cactus for 8 Arabidopsis (Arabidopsis thaliana) and 26 Maize (Zea mays) de novo assembled genome sequences and compared them with the previously published short-read variant calling results. MGA identified more single nucleotide variants (SNVs) and long INDELs than did previously published WGS variant callings. Additionally, ACMGA detected significantly more SNVs and long INDELs in repetitive regions and the whole genome than did Cactus. Compared with the results of Cactus, the results of ACMGA were more similar to the previously published variants called using short-read. These two MGA pipelines identified numerous multi-allelic variants that were missed by the WGS variant calling pipeline. CONCLUSIONS Aligning de novo assembled genome sequences could identify more SNVs and INDELs than mapping short-read. ACMGA combines the advantages of AnchorWave and Cactus and offers a practical solution for plant MGA by integrating global alignment, a 2-piece-affine-gap cost strategy, and the progressive MGA algorithm.
Affiliation(s)
- Huafeng Zhou
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong, 261325, China
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, 266071, China.
- Baoxing Song
- National Key Laboratory of Wheat Improvement, Peking University Institute of Advanced Agricultural Sciences, Shandong Laboratory of Advanced Agriculture Sciences in Weifang, Weifang, Shandong, 261325, China.
- Key Laboratory of Maize Biology and Genetic Breeding in Arid Area of Northwest Region of the Ministry of Agriculture, College of Agronomy, Northwest A&F University, Yangling, Shaanxi, 712100, China.
2
Lian Q, Huettel B, Walkemeier B, Mayjonade B, Lopez-Roques C, Gil L, Roux F, Schneeberger K, Mercier R. A pan-genome of 69 Arabidopsis thaliana accessions reveals a conserved genome structure throughout the global species range. Nat Genet 2024; 56:982-991. [PMID: 38605175 PMCID: PMC11096106 DOI: 10.1038/s41588-024-01715-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 03/11/2024] [Indexed: 04/13/2024]
Abstract
Although originally primarily a system for functional biology, Arabidopsis thaliana has, owing to its broad geographical distribution and adaptation to diverse environments, developed into a powerful model in population genomics. Here we present chromosome-level genome assemblies of 69 accessions from a global species range. We found that genomic colinearity is very conserved, even among geographically and genetically distant accessions. Along chromosome arms, megabase-scale rearrangements are rare and typically present only in a single accession. This indicates that the karyotype is quasi-fixed and that rearrangements in chromosome arms are counter-selected. Centromeric regions display higher structural dynamics, and divergences in core centromeres account for most of the genome size variations. Pan-genome analyses uncovered 32,986 distinct gene families, 60% being present in all accessions and 40% appearing to be dispensable, including 18% private to a single accession, indicating unexplored genic diversity. These 69 new Arabidopsis thaliana genome assemblies will empower future genetic research.
Affiliation(s)
- Qichao Lian
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Bruno Huettel
- Max Planck-Genome-centre Cologne, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Birgit Walkemeier
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Baptiste Mayjonade
- Laboratoire des Interactions Plantes-Microbes-Environnement, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, CNRS, Université de Toulouse, Castanet-Tolosan, France
- Lisa Gil
- INRAE, GeT-PlaGe, Genotoul, Castanet-Tolosan, France
- Fabrice Roux
- Laboratoire des Interactions Plantes-Microbes-Environnement, Institut National de Recherche pour l'Agriculture, l'Alimentation et l'Environnement, CNRS, Université de Toulouse, Castanet-Tolosan, France
- Korbinian Schneeberger
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany.
- Faculty of Biology, Ludwig-Maximilians-University Munich, Planegg-Martinsried, Germany.
- Cluster of Excellence on Plant Sciences, Heinrich-Heine University, Düsseldorf, Germany.
- Raphael Mercier
- Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne, Germany.
- Cluster of Excellence on Plant Sciences, Heinrich-Heine University, Düsseldorf, Germany.
3
Thoben C, Pucker B. Automatic annotation of the bHLH gene family in plants. BMC Genomics 2023; 24:780. [PMID: 38102570 PMCID: PMC10722790 DOI: 10.1186/s12864-023-09877-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Accepted: 12/06/2023] [Indexed: 12/17/2023]
Abstract
BACKGROUND The bHLH transcription factor family is named after the basic helix-loop-helix (bHLH) domain that is a characteristic element of their members. Understanding the function and characteristics of this family is important for the examination of a wide range of functions. As the availability of genome sequences and transcriptome assemblies has increased significantly, the need for automated solutions that provide reliable functional annotations is emphasised. RESULTS A phylogenetic approach was adapted for the automatic identification and functional annotation of the bHLH transcription factor family. The bHLH_annotator, designed for the automated functional annotation of bHLHs, was implemented in Python3. Sequences of bHLHs described in literature were collected to represent the full diversity of bHLH sequences. Previously described orthologs form the basis for the functional annotation assignment to candidates which are also screened for bHLH-specific motifs. The pipeline was successfully deployed on the two Arabidopsis thaliana accessions Col-0 and Nd-1, the monocot species Dioscorea dumetorum, and a transcriptome assembly of Croton tiglium. Depending on the applied search parameters for the initial candidates in the pipeline, species-specific candidates or members of the bHLH family which experienced domain loss can be identified. CONCLUSIONS The bHLH_annotator allows a detailed and systematic investigation of the bHLH family in land plant species and classifies candidates based on bHLH-specific characteristics, which distinguishes the pipeline from other established functional annotation tools. This provides the basis for the functional annotation of the bHLH family in land plants and the systematic examination of a wide range of functions regulated by this transcription factor family.
Affiliation(s)
- Corinna Thoben
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Boas Pucker
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany.
4
Wolff K, Friedhoff R, Schwarzer F, Pucker B. Data literacy in genome research. J Integr Bioinform 2023; 20:jib-2023-0033. [PMID: 38047760 PMCID: PMC10777367 DOI: 10.1515/jib-2023-0033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Accepted: 11/15/2023] [Indexed: 12/05/2023]
Abstract
With an ever increasing amount of research data available, it becomes constantly more important to possess data literacy skills to benefit from this valuable resource. An integrative course was developed to teach students the fundamentals of data literacy through an engaging genome sequencing project. Each cohort of students performed planning of the experiment, DNA extraction, nanopore sequencing, genome sequence assembly, prediction of genes in the assembled sequence, and assignment of functional annotation terms to predicted genes. Students learned how to communicate science through writing a protocol in the form of a scientific paper, providing comments during a peer-review process, and presenting their findings as part of an international symposium. Many students enjoyed the opportunity to own a project and to work towards a meaningful objective.
Affiliation(s)
- Katharina Wolff
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Ronja Friedhoff
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Friderieke Schwarzer
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Boas Pucker
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
5
Rempel A, Choudhary N, Pucker B. KIPEs3: Automatic annotation of biosynthesis pathways. PLoS One 2023; 18:e0294342. [PMID: 37972102 PMCID: PMC10653506 DOI: 10.1371/journal.pone.0294342] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/24/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023]
Abstract
Flavonoids and carotenoids are pigments involved in stress mitigation and numerous other processes. Both pigment classes can contribute to flower and fruit coloration. Flavonoid aglycones and carotenoids are produced by a pathway that is largely conserved across land plants. Glycosylations, acylations, and methylations of the flavonoid aglycones can be species-specific and lead to a plethora of biochemically diverse flavonoids. We previously developed KIPEs for the automatic annotation of biosynthesis pathways and presented an application on the flavonoid aglycone biosynthesis. KIPEs3 is an improved version with additional features and the potential to identify not just the core biosynthesis players, but also candidates involved in the decoration steps and in the transport of flavonoids. Functionality of KIPEs3 is demonstrated through the analysis of the flavonoid biosynthesis in Arabidopsis thaliana Nd-1, Capsella grandiflora, and Dioscorea dumetorum. We demonstrate the applicability of KIPEs to other pathways by adding the carotenoid biosynthesis to the repertoire. As a technical proof of concept, the carotenoid biosynthesis was analyzed in the same species and Daucus carota. KIPEs3 is available as an online service to enable access without prior bioinformatics experience. KIPEs3 facilitates the automatic annotation and analysis of biosynthesis pathways with a consistent and high quality in a large number of plant species. Numerous genome sequencing projects are generating a huge amount of data sets that can be analyzed to identify evolutionary patterns and promising candidate genes for biotechnological and breeding applications.
Affiliation(s)
- Andreas Rempel
- Genome Informatics, Faculty of Technology & Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Graduate School “Digital Infrastructure for the Life Sciences” (DILS), Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, Bielefeld, Germany
- Nancy Choudhary
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
- Boas Pucker
- Plant Biotechnology and Bioinformatics, Institute of Plant Biology & BRICS, TU Braunschweig, Braunschweig, Germany
6
Buyukyoruk M, Henriques WS, Wiedenheft B. Clarifying CRISPR: Why Repeats Identified in the Human Genome Should Not Be Considered CRISPRs. CRISPR J 2023; 6:216-221. [PMID: 37042651 PMCID: PMC10277986 DOI: 10.1089/crispr.2022.0106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 02/24/2023] [Indexed: 04/13/2023]
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated genes (cas) are essential components of adaptive immune systems that protect bacteria and archaea from viral infection. CRISPR-Cas systems are found in about 40% of bacterial and 85% of archaeal genomes, but not in eukaryotic genomes. Recently, an article published in Communications Biology reported the identification of 12,572 putative CRISPRs in the human genome, which they call "hCRISPR." In this study, we attempt to reproduce this analysis and show that repetitive elements identified as putative CRISPR loci in the human genome contain neither the repeat-spacer-repeat architecture nor the cas genes characteristic of functional CRISPR systems.
Affiliation(s)
- Murat Buyukyoruk
- Department of Microbiology and Cell Biology, Montana State University, Bozeman, Montana, USA
- William S. Henriques
- Department of Microbiology and Cell Biology, Montana State University, Bozeman, Montana, USA
- Blake Wiedenheft
- Department of Microbiology and Cell Biology, Montana State University, Bozeman, Montana, USA
7
GALA: a computational framework for de novo chromosome-by-chromosome assembly with long reads. Nat Commun 2023; 14:204. [PMID: 36639368 PMCID: PMC9839709 DOI: 10.1038/s41467-022-35670-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 12/16/2022] [Indexed: 01/15/2023]
Abstract
High-quality genome assembly has wide applications in genetics and medical studies. However, it is still very challenging to achieve gap-free chromosome-scale assemblies using current workflows for long-read platforms. Here we report on GALA (Gap-free long-read Assembly tool), a computational framework for chromosome-based sequencing data separation and de novo assembly implemented through a multi-layer graph that identifies discordances within preliminary assemblies and partitions the data into chromosome-scale scaffolding groups. The subsequent independent assembly of each scaffolding group generates a gap-free assembly likely free from the mis-assembly errors which usually hamper existing workflows. This flexible framework also allows us to integrate data from various technologies, such as Hi-C, genetic maps, and even motif analyses to generate gap-free chromosome-scale assemblies. As a proof of principle we de novo assemble the C. elegans genome using combined PacBio and Nanopore sequencing data and a rice cultivar genome using Nanopore sequencing data from publicly available datasets. We also demonstrate the proposed method's applicability with a gap-free assembly of the human genome using PacBio high-fidelity (HiFi) long reads. Thus, our method enables straightforward assembly of genomes with multiple data sources and overcomes barriers that at present restrict the application of de novo genome assembly technology.
8
Redkar A, Cevik V, Bailey K, Zhao H, Kim DS, Zou Z, Furzer OJ, Fairhead S, Borhan MH, Holub EB, Jones JDG. The Arabidopsis WRR4A and WRR4B paralogous NLR proteins both confer recognition of multiple Albugo candida effectors. New Phytologist 2023; 237:532-547. [PMID: 35838065 PMCID: PMC10087428 DOI: 10.1111/nph.18378] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/06/2022] [Accepted: 07/05/2022] [Indexed: 05/26/2023]
Abstract
The oomycete Albugo candida causes white blister rust, an important disease of Brassica crops. Distinct races of A. candida are defined by their capacity to infect different host plant species. Each A. candida race encodes secreted proteins with a CX2CX5G ('CCG') motif that are polymorphic and show presence/absence variation, and are therefore candidate effectors. The White Rust Resistance 4 (WRR4) locus in Arabidopsis thaliana accession Col-0 contains three genes that encode intracellular nucleotide-binding domain leucine-rich repeat immune receptors. The Col-0 alleles of WRR4A and WRR4B confer resistance to multiple A. candida races, although both WRR4A and WRR4B can be overcome by the Col-0-virulent race 4 isolate AcEx1. Comparison of CCG candidate effectors in avirulent and virulent races, and transient co-expression of CCG effectors from four A. candida races in Nicotiana sp. or A. thaliana, revealed CCG effectors that trigger WRR4A- or WRR4B-dependent hypersensitive responses. We found eight WRR4A-recognised CCGs and four WRR4B-recognised CCGs, the first recognised proteins from A. candida for which the cognate immune receptors in A. thaliana are known. This multiple recognition capacity potentially explains the broad-spectrum resistance to several A. candida races conferred by WRR4 paralogues. We further show that of five tested CCGs, three confer enhanced disease susceptibility when expressed in planta, consistent with A. candida CCG proteins being effectors.
Affiliation(s)
- Amey Redkar
- The Sainsbury Laboratory, University of East Anglia, Norwich, NR4 7UH, UK
- Department of Botany, Savitribai Phule Pune University, Ganeshkhind, Pune, 411007, India
- Volkan Cevik
- The Sainsbury Laboratory, University of East Anglia, Norwich, NR4 7UH, UK
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Kate Bailey
- The Sainsbury Laboratory, University of East Anglia, Norwich, NR4 7UH, UK
- He Zhao
- The Sainsbury Laboratory, University of East Anglia, Norwich, NR4 7UH, UK
- Dae Sung Kim
- The Sainsbury Laboratory, University of East Anglia, Norwich, NR4 7UH, UK
- Present address: State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei University, Wuhan, 430062, China
- Zhou Zou
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Oliver J. Furzer
- The Sainsbury Laboratory, University of East Anglia, Norwich, NR4 7UH, UK
- Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA
- Sebastian Fairhead
- The Sainsbury Laboratory, University of East Anglia, Norwich, NR4 7UH, UK
- School of Life Sciences, Warwick Crop Centre, University of Warwick, Wellesbourne, CV35 9EF, UK
- M. Hossein Borhan
- Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada
- Eric B. Holub
- School of Life Sciences, Warwick Crop Centre, University of Warwick, Wellesbourne, CV35 9EF, UK
9
Rabanal FA, Gräff M, Lanz C, Fritschi K, Llaca V, Lang M, Carbonell-Bejerano P, Henderson I, Weigel D. Pushing the limits of HiFi assemblies reveals centromere diversity between two Arabidopsis thaliana genomes. Nucleic Acids Res 2022; 50:12309-12327. [PMID: 36453992 PMCID: PMC9757041 DOI: 10.1093/nar/gkac1115] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/18/2022] [Revised: 09/13/2022] [Accepted: 11/10/2022] [Indexed: 12/05/2022]
Abstract
Although long-read sequencing can often enable chromosome-level reconstruction of genomes, it is still unclear how one can routinely obtain gapless assemblies. In the model plant Arabidopsis thaliana, other than the reference accession Col-0, all other accessions de novo assembled with long-reads until now have used PacBio continuous long reads (CLR). Although these assemblies sometimes achieved chromosome-arm level contigs, they inevitably broke near the centromeres, excluding megabases of DNA from analysis in pan-genome projects. Since PacBio high-fidelity (HiFi) reads circumvent the high error rate of CLR technologies, albeit at the expense of read length, we compared a CLR assembly of accession Eyach15-2 to HiFi assemblies of the same sample. The use of five different assemblers starting from subsampled data allowed us to evaluate the impact of coverage and read length. We found that centromeres and rDNA clusters are responsible for 71% of contig breaks in the CLR scaffolds, while relatively short stretches of GA/TC repeats are at the core of >85% of the unfilled gaps in our best HiFi assemblies. Since the HiFi technology consistently enabled us to reconstruct gapless centromeres and 5S rDNA clusters, we demonstrate the value of the approach by comparing these previously inaccessible regions of the genome between the Eyach15-2 accession and the reference accession Col-0.
Affiliation(s)
- Christa Lanz
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Katrin Fritschi
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Victor Llaca
- Genomics Technologies, Corteva Agriscience, Johnston, IA 50131, USA
- Michelle Lang
- Genomics Technologies, Corteva Agriscience, Johnston, IA 50131, USA
- Pablo Carbonell-Bejerano
- Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Ian Henderson
- Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- Detlef Weigel
- Correspondence may also be addressed to Detlef Weigel. Tel: +49 7071 601 1410;
10
Canaguier A, Guilbaud R, Denis E, Magdelenat G, Belser C, Istace B, Cruaud C, Wincker P, Le Paslier MC, Faivre-Rampant P, Barbe V. Oxford Nanopore and Bionano Genomics technologies evaluation for plant structural variation detection. BMC Genomics 2022; 23:317. [PMID: 35448948 PMCID: PMC9026655 DOI: 10.1186/s12864-022-08499-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 03/17/2022] [Indexed: 11/10/2022]
Abstract
Background Structural Variations (SVs) are genomic rearrangements derived from duplication, deletion, insertion, inversion, and translocation events. In the past, SVs detection was limited to cytological approaches, then to Next-Generation Sequencing (NGS) short reads and partitioned assemblies. Nowadays, technologies such as DNA long read sequencing and optical mapping have revolutionized the understanding of SVs in genomes, due to the enhancement of the power of SVs detection. This study aims to investigate performance of two techniques, 1) long-read sequencing obtained with the MinION device (Oxford Nanopore Technologies) and 2) optical mapping obtained with Saphyr device (Bionano Genomics) to detect and characterize SVs in the genomes of the two ecotypes of Arabidopsis thaliana, Columbia-0 (Col-0) and Landsberg erecta 1 (Ler-1). Results We described the SVs detected from the alignment of the best ONT assembly and DLE-1 optical maps of A. thaliana Ler-1 against the public reference genome Col-0 TAIR10.1. After filtering (SV > 1 kb), 1184 and 591 Ler-1 SVs were retained from ONT and Bionano technologies respectively. A total of 948 Ler-1 ONT SVs (80.1%) corresponded to 563 Bionano SVs (95.3%) leading to 563 common locations. The specific locations were scrutinized to assess improvement in SV detection by either technology. The ONT SVs were mostly detected near TE and gene features, and resistance genes seemed particularly impacted. Conclusions Structural variations linked to ONT sequencing error were removed and false positives limited, with high quality Bionano SVs being conserved. When compared with the Col-0 TAIR10.1 reference genome, most of the detected SVs discovered by both technologies were found in the same locations. ONT assembly sequence leads to more specific SVs than Bionano one, the latter being more efficient to characterize large SVs. 
Even if both technologies are complementary approaches, ONT data appears to be more adapted to large scale populations studies, while Bionano performs better in improving assembly and describing specificity of a genome compared to a reference. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-022-08499-4.
Affiliation(s)
- Aurélie Canaguier
- Université Paris-Saclay, INRAE, Etude du Polymorphisme des Génomes Végétaux EPGV, 91000, Evry-Courcouronnes, France
- Romane Guilbaud
- Université Paris-Saclay, INRAE, Etude du Polymorphisme des Génomes Végétaux EPGV, 91000, Evry-Courcouronnes, France
- Erwan Denis
- Genoscope, Institut de biologie François-Jacob, Commissariat à l'Energie Atomique CEA, Université Paris-Saclay, Evry, France
- Ghislaine Magdelenat
- Genoscope, Institut de biologie François-Jacob, Commissariat à l'Energie Atomique CEA, Université Paris-Saclay, Evry, France
- Caroline Belser
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Benjamin Istace
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Corinne Cruaud
- Genoscope, Institut de biologie François-Jacob, Commissariat à l'Energie Atomique CEA, Université Paris-Saclay, Evry, France
- Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Marie-Christine Le Paslier
- Université Paris-Saclay, INRAE, Etude du Polymorphisme des Génomes Végétaux EPGV, 91000, Evry-Courcouronnes, France
- Patricia Faivre-Rampant
- Université Paris-Saclay, INRAE, Etude du Polymorphisme des Génomes Végétaux EPGV, 91000, Evry-Courcouronnes, France.
- Valérie Barbe
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
11
Fields PD, Waneka G, Naish M, Schatz MC, Henderson IR, Sloan DB. Complete sequence of a 641-kb insertion of mitochondrial DNA in the Arabidopsis thaliana nuclear genome. Genome Biol Evol 2022; 14:6572048. [PMID: 35446419 PMCID: PMC9071559 DOI: 10.1093/gbe/evac059] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Accepted: 04/20/2022] [Indexed: 11/14/2022]
Abstract
Intracellular transfers of mitochondrial DNA continue to shape nuclear genomes. Chromosome 2 of the model plant Arabidopsis thaliana contains one of the largest known nuclear insertions of mitochondrial DNA (numts). Estimated at over 600 kb in size, this numt is larger than the entire Arabidopsis mitochondrial genome. The primary Arabidopsis nuclear reference genome contains less than half of the numt because of its structural complexity and repetitiveness. Recent data sets generated with improved long-read sequencing technologies (PacBio HiFi) provide an opportunity to finally determine the accurate sequence and structure of this numt. We performed a de novo assembly using sequencing data from recent initiatives to span the Arabidopsis centromeres, producing a gap-free sequence of the Chromosome 2 numt, which is 641 kb in length and has 99.933% nucleotide sequence identity with the actual mitochondrial genome. The numt assembly is consistent with the repetitive structure previously predicted from fiber-based fluorescent in situ hybridization. Nanopore sequencing data indicate that the numt has high levels of cytosine methylation, helping to explain its biased spectrum of nucleotide sequence divergence and supporting previous inferences that it is transcriptionally inactive. The original numt insertion appears to have involved multiple mitochondrial DNA copies with alternative structures that subsequently underwent an additional duplication event within the nuclear genome. This work provides insights into numt evolution, addresses one of the last unresolved regions of the Arabidopsis reference genome, and represents a resource for distinguishing between highly similar numt and mitochondrial sequences in studies of transcription, epigenetic modifications, and de novo mutations.
Affiliation(s)
- Peter D Fields
- Department of Biology, Colorado State University, Fort Collins, CO, USA
- Department of Environmental Sciences, Zoology, University of Basel, Basel, Switzerland
- Gus Waneka
- Department of Biology, Colorado State University, Fort Collins, CO, USA
- Matthew Naish
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Ian R Henderson
- Department of Plant Sciences, University of Cambridge, Cambridge, UK
- Daniel B Sloan
- Department of Biology, Colorado State University, Fort Collins, CO, USA
12
Automatic identification and annotation of MYB gene family members in plants. BMC Genomics 2022; 23:220. [PMID: 35305581 PMCID: PMC8933966 DOI: 10.1186/s12864-022-08452-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 11/15/2021] [Accepted: 03/07/2022] [Indexed: 12/31/2022]
Abstract
BACKGROUND MYBs are among the largest transcription factor families in plants. Consequently, members of this family are involved in a plethora of processes including development and specialized metabolism. The MYB families of many plant species were investigated in the last two decades since the first investigation looked at Arabidopsis thaliana. This body of knowledge and characterized sequences provide the basis for the identification, classification, and functional annotation of candidate sequences in new genome and transcriptome assemblies. RESULTS A pipeline for the automatic identification and functional annotation of MYBs in a given sequence data set was implemented in Python. MYB candidates are identified, screened for the presence of a MYB domain and other motifs, and finally placed in a phylogenetic context with well characterized sequences. In addition to technical benchmarking based on existing annotation, the transcriptome assembly of Croton tiglium and the annotated genome sequence of Castanea crenata were screened for MYBs. Results of both analyses are presented in this study to illustrate the potential of this application. The analysis of one species takes only a few minutes depending on the number of predicted sequences and the size of the MYB gene family. This pipeline, the required bait sequences, and reference sequences for a classification are freely available on github: https://github.com/bpucker/MYB_annotator . CONCLUSIONS This automatic annotation of the MYB gene family in novel assemblies makes genome-wide investigations consistent and paves the way for comparative studies in the future. Candidate genes for in-depth analyses are presented based on their orthology to previously characterized sequences which allows the functional annotation of the newly identified MYBs with high confidence. The identification of orthologs can also be harnessed to detect duplication and deletion events.
|
13
|
Pucker B, Irisarri I, de Vries J, Xu B. Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. Quantitative Plant Biology 2022; 3:e5. [PMID: 37077982 PMCID: PMC10095996 DOI: 10.1017/qpb.2021.18] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/29/2021] [Revised: 11/24/2021] [Accepted: 12/21/2021] [Indexed: 05/03/2023]
Abstract
Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences offer competing long-read sequencing technologies that enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups, and sequences of smaller plant genomes can be completed within days. This has also led to increased investigation of genomes from multiple species at large scale to address fundamental questions associated with the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges are accurately resolving diploid or polyploid genome sequences and better accounting for intra-specific diversity by switching from the use of single reference genome sequences to a pangenome graph.
Affiliation(s)
- Boas Pucker
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Author for correspondence: Boas Pucker E-mail:
| | - Iker Irisarri
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
| | - Jan de Vries
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
- Department of Applied Bioinformatics, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Göttingen, Germany
| | - Bo Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
|
14
|
Yuan Y, Bayer PE, Batley J, Edwards D. Current status of structural variation studies in plants. Plant Biotechnology Journal 2021; 19:2153-2163. [PMID: 34101329 PMCID: PMC8541774 DOI: 10.1111/pbi.13646] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 02/18/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 05/23/2023]
Abstract
Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever-greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Affiliation(s)
- Yuxuan Yuan
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
- School of Life Sciences and State Key Laboratory for Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
| | - Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
| | - David Edwards
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, WA, Australia
|
15
|
Bornowski N, Michel KJ, Hamilton JP, Ou S, Seetharam AS, Jenkins J, Grimwood J, Plott C, Shu S, Talag J, Kennedy M, Hundley H, Singan VR, Barry K, Daum C, Yoshinaga Y, Schmutz J, Hirsch CN, Hufford MB, de Leon N, Kaeppler SM, Buell CR. Genomic variation within the maize stiff-stalk heterotic germplasm pool. The Plant Genome 2021; 14:e20114. [PMID: 34275202 DOI: 10.1002/tpg2.20114] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/14/2021] [Accepted: 05/06/2021] [Indexed: 05/28/2023]
Abstract
The stiff-stalk heterotic group in Maize (Zea mays L.) is an important source of inbreds used in U.S. commercial hybrid production. Founder inbreds B14, B37, B73, and, to a lesser extent, B84, are found in the pedigrees of a majority of commercial seed parent inbred lines. We created high-quality genome assemblies of B84 and four expired Plant Variety Protection (ex-PVP) lines LH145 representing B14, NKH8431 of mixed descent, PHB47 representing B37, and PHJ40, which is a Pioneer Hi-Bred International (PHI) early stiff-stalk type. Sequence was generated using long-read sequencing achieving highly contiguous assemblies of 2.13-2.18 Gbp with N50 scaffold lengths >200 Mbp. Inbred-specific gene annotations were generated using a core five-tissue gene expression atlas, whereas transposable element (TE) annotation was conducted using de novo and homology-directed methodologies. Compared with the reference inbred B73, synteny analyses revealed extensive collinearity across the five stiff-stalk genomes, although unique components of the maize pangenome were detected. Comparison of this set of stiff-stalk inbreds with the original Iowa Stiff Stalk Synthetic breeding population revealed that these inbreds represent only a proportion of variation in the original stiff-stalk pool and there are highly conserved haplotypes in released public and ex-Plant Variety Protection inbreds. Despite the reduction in variation from the original stiff-stalk population, substantial genetic and genomic variation was identified supporting the potential for continued breeding success in this pool. The assemblies described here represent stiff-stalk inbreds that have historical and commercial relevance and provide further insight into the emerging maize pangenome.
Affiliation(s)
- Nolan Bornowski
- Dep. of Plant Biology, Michigan State Univ., 612 Wilson Road, East Lansing, MI, 48824, USA
| | - Kathryn J Michel
- Dep. of Agronomy, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
| | - John P Hamilton
- Dep. of Plant Biology, Michigan State Univ., 612 Wilson Road, East Lansing, MI, 48824, USA
| | - Shujun Ou
- Dep. of Ecology, Evolution, and Organismal Biology, Iowa State Univ., 2200 Osborn Drive, Ames, IA, 50011, USA
| | - Arun S Seetharam
- Dep. of Ecology, Evolution, and Organismal Biology, Iowa State Univ., 2200 Osborn Drive, Ames, IA, 50011, USA
| | - Jerry Jenkins
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL, 35806, USA
| | - Jane Grimwood
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL, 35806, USA
| | - Chris Plott
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL, 35806, USA
| | - Shengqiang Shu
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Jayson Talag
- Arizona Genomics Institute, School of Plant Sciences, Univ. of Arizona, 1657 E Helen Street, Tucson, AZ, 85721, USA
| | - Megan Kennedy
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Hope Hundley
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Vasanth R Singan
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Kerrie Barry
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Chris Daum
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Yuko Yoshinaga
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Jeremy Schmutz
- HudsonAlpha Institute for Biotechnology, 601 Genome Way Northwest, Huntsville, AL, 35806, USA
- U.S. Dep. of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA
| | - Candice N Hirsch
- Dep. of Agronomy and Plant Genetics, Univ. of Minnesota, 1991 Upper Buford Circle, Saint Paul, MN, 55108, USA
| | - Matthew B Hufford
- Dep. of Ecology, Evolution, and Organismal Biology, Iowa State Univ., 2200 Osborn Drive, Ames, IA, 50011, USA
| | - Natalia de Leon
- Dep. of Agronomy, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Dep. of Energy, Great Lakes Bioenergy Research Center, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
| | - Shawn M Kaeppler
- Dep. of Agronomy, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Dep. of Energy, Great Lakes Bioenergy Research Center, Univ. of Wisconsin - Madison, 1575 Linden Drive, Madison, WI, 53706, USA
- Wisconsin Crop Innovation Center, Univ. of Wisconsin - Madison, 8520 University Green, Middleton, WI, 53562, USA
| | - C Robin Buell
- Dep. of Plant Biology, Michigan State Univ., 612 Wilson Road, East Lansing, MI, 48824, USA
- Dep. of Energy, Great Lakes Bioenergy Research Center, Michigan State Univ., 612 Wilson Road, East Lansing, MI, 48824, USA
|
16
|
Pucker B, Kleinbölting N, Weisshaar B. Large scale genomic rearrangements in selected Arabidopsis thaliana T-DNA lines are caused by T-DNA insertion mutagenesis. BMC Genomics 2021; 22:599. [PMID: 34362298 PMCID: PMC8348815 DOI: 10.1186/s12864-021-07877-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 03/07/2021] [Accepted: 07/06/2021] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Experimental proof of gene function assignments in plants is based on mutant analyses. T-DNA insertion lines provided an invaluable resource of mutants and enabled systematic reverse genetics-based investigation of the functions of Arabidopsis thaliana genes during the last decades. RESULTS We sequenced the genomes of 14 A. thaliana GABI-Kat T-DNA insertion lines, which eluded flanking sequence tag-based attempts to characterize their insertion loci, with Oxford Nanopore Technologies (ONT) long reads. Complex T-DNA insertions were resolved and 11 previously unknown T-DNA loci identified, resulting in about 2 T-DNA insertions per line and suggesting that this number was previously underestimated. T-DNA mutagenesis caused fusions of chromosomes along with compensating translocations to keep the gene set complete throughout meiosis. Also, an inverted duplication of 800 kbp was detected. About 10 % of GABI-Kat lines might be affected by chromosomal rearrangements, some of which do not involve T-DNA. Local assembly of selected reads was shown to be a computationally effective method to resolve the structure of T-DNA insertion loci. We developed an automated workflow to support investigation of long read data from T-DNA insertion lines. All steps from DNA extraction to assembly of T-DNA loci can be completed within days. CONCLUSIONS Long read sequencing was demonstrated to be an effective way to resolve complex T-DNA insertions and chromosome fusions. Many T-DNA insertions comprise not just a single T-DNA, but complex arrays of multiple T-DNAs. It is becoming obvious that T-DNA insertion alleles must be characterized by exact identification of both T-DNA::genome junctions to generate clear genotype-to-phenotype relations.
Affiliation(s)
- Boas Pucker
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, Germany
- Evolution and Diversity, Department of Plant Sciences, University of Cambridge, Cambridge, UK
| | - Nils Kleinbölting
- Bioinformatics Resource Facility, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, Germany
| | - Bernd Weisshaar
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, Germany
|
17
|
Sielemann K, Weisshaar B, Pucker B. Reference-based QUantification Of gene Dispensability (QUOD). Plant Methods 2021; 17:18. [PMID: 33563309 PMCID: PMC7871624 DOI: 10.1186/s13007-021-00718-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Accepted: 02/03/2021] [Indexed: 05/03/2023]
Abstract
BACKGROUND Dispensability of genes in a phylogenetic lineage, e.g. a species, genus, or higher-level clade, is gaining relevance as most genome sequencing projects move to a pangenome level. Most analyses classify genes as core genes, which are present in all investigated individual genomes, and dispensable genes, which only occur in a single or a few investigated genomes. The binary classification as 'core' or 'dispensable' is often based on arbitrary cutoffs of presence/absence in the analysed genomes. Even when extended to 'conditionally dispensable', this concept still requires the assignment of genes to distinct groups. RESULTS Here, we present a new method which overcomes this distinct classification by quantifying gene dispensability and present a dedicated tool for reference-based QUantification Of gene Dispensability (QUOD). As a proof of concept, sequence data of 966 Arabidopsis thaliana accessions (Ath-966) were processed to calculate a gene-specific dispensability score for each gene based on normalised coverage in read mappings. We validated this score by comparison of highly conserved Benchmarking Universal Single Copy Orthologs (BUSCOs) to all other genes. The average scores of BUSCOs were significantly lower than the scores of non-BUSCOs. Analysis of variation demonstrated lower variation values between replicates of a single accession than between iteratively, randomly selected accessions from the whole dataset Ath-966. Functional investigations revealed defense and antimicrobial response genes among the genes with high-dispensability scores. CONCLUSIONS Instead of classifying a gene as core or dispensable, QUOD assigns a dispensability score to each gene. Hence, QUOD facilitates the identification of candidate dispensable genes, associated with high dispensability scores, which often underlie lineage-specific adaptation to varying environmental conditions.
Affiliation(s)
- Katharina Sielemann
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615 Bielefeld, Germany
| | - Bernd Weisshaar
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
| | - Boas Pucker
- Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Evolution and Diversity, Department of Plant Sciences, University of Cambridge, Cambridge, UK
|
18
|
Formation and diversification of a paradigm biosynthetic gene cluster in plants. Nat Commun 2020; 11:5354. [PMID: 33097700 PMCID: PMC7584637 DOI: 10.1038/s41467-020-19153-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 06/25/2020] [Accepted: 09/29/2020] [Indexed: 12/31/2022] Open
Abstract
Numerous examples of biosynthetic gene clusters (BGCs), including for compounds of agricultural and medicinal importance, have now been discovered in plant genomes. However, little is known about how these complex traits are assembled and diversified. Here, we examine a large number of variants within and between species for a paradigm BGC (the thalianol cluster), which has evolved recently in a common ancestor of the Arabidopsis genus. Comparisons at the species level reveal differences in BGC organization and involvement of auxiliary genes, resulting in production of species-specific triterpenes. Within species, the thalianol cluster is primarily fixed, showing a low frequency of deleterious haplotypes. We further identify chromosomal inversion as a molecular mechanism that may shuffle more distant genes into the cluster, so enabling cluster compaction. Antagonistic natural selection pressures are likely involved in shaping the occurrence and maintenance of this BGC. Our work sheds light on the birth, life and death of complex genetic and metabolic traits in plants.
|
19
|
Zou P, Duan L, Zhang S, Bai X, Liu Z, Jin F, Sun H, Xu W, Chen R. Target Specificity of the CRISPR-Cas9 System in Arabidopsis thaliana, Oryza sativa, and Glycine max Genomes. J Comput Biol 2020; 27:1544-1552. [PMID: 32298599 DOI: 10.1089/cmb.2019.0453] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022] Open
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR), a class of immune-associated sequences in bacteria, have been developed in recent years into a powerful tool for editing eukaryotic genomes in diverse cells and organisms. The CRISPR-Cas9 system can recognize the 20 nucleotides (guide sequence) immediately upstream of the protospacer-adjacent motif site and trigger double-stranded DNA cleavage as well as DNA repair mechanisms, which eventually result in knockout, knockin, or site-specific mutagenesis. However, the off-target effect caused by guide sequence misrecognition is a major drawback that restricts its widespread application. In this study, a global analysis of the specificities of all guide sequences in Arabidopsis thaliana, Oryza sativa (rice), and Glycine max (soybean) was performed. As a result, a simple pipeline and three genome-wide databases were established and shared with the scientific community. For each target site of CRISPR-Cas9, a specificity score and an off-target number were calculated and evaluated. The mean values of off-target numbers for A. thaliana, rice, and soybean were determined as 27.5, 57.3, and 174.7, respectively. Comparative analysis among these plants suggested that the frequency of off-target effects was correlated with genome size, chromosomal locus, gene density, and guanine-cytosine (GC) content. Our results contribute to a better understanding of the CRISPR-Cas9 system in plants and should help to minimize off-target effects in future applications.
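The abstract does not describe how target sites and off-target numbers were computed, so the following is only a minimal sketch of the general idea, not the authors' pipeline. It assumes an NGG protospacer-adjacent motif (the usual motif for Streptococcus pyogenes Cas9), scans the forward strand only, and counts as an off-target any other site whose guide differs by at most `max_mismatches` bases. The function names and the mismatch threshold are hypothetical.

```python
import re

GUIDE_LEN = 20  # guide length stated in the abstract

# Assumption: NGG PAM directly downstream of the guide, forward strand only.
# The lookahead lets overlapping sites be reported as well.
SITE_PATTERN = re.compile("(?=([ACGT]{%d})[ACGT]GG)" % GUIDE_LEN)


def find_targets(genome):
    """Return (start, guide) pairs for every guide followed by an NGG PAM."""
    genome = genome.upper()
    return [(m.start(), m.group(1)) for m in SITE_PATTERN.finditer(genome)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def off_target_counts(genome, max_mismatches=3):
    """Map each target start to the number of other, similar target sites.

    Hypothetical scoring: a site counts as an off-target of a guide when
    its own guide is within `max_mismatches` of it. All pairs are compared,
    which is quadratic and only practical for small sequences.
    """
    targets = find_targets(genome)
    counts = {}
    for i, (start, guide) in enumerate(targets):
        counts[start] = sum(
            1
            for j, (_, other) in enumerate(targets)
            if j != i and hamming(guide, other) <= max_mismatches
        )
    return counts
```

A genome-scale analysis would need both strands and an index-based search rather than this all-against-all comparison.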
Affiliation(s)
- Pan Zou
- Tianjin Institute of Agricultural Quality Standard and Testing Technology, Tianjin Academy of Agricultural Sciences, Tianjin, China
| | - Lijin Duan
- Tianjin Institute of Agricultural Quality Standard and Testing Technology, Tianjin Academy of Agricultural Sciences, Tianjin, China
| | - Shasha Zhang
- Tianjin Institute of Agricultural Quality Standard and Testing Technology, Tianjin Academy of Agricultural Sciences, Tianjin, China
- College of Life Sciences and Food Engineering, Hebei University of Engineering, Handan, China
| | - Xue Bai
- Tianjin Institute of Agricultural Quality Standard and Testing Technology, Tianjin Academy of Agricultural Sciences, Tianjin, China
| | - Zhenghui Liu
- Tianjin Institute of Agricultural Quality Standard and Testing Technology, Tianjin Academy of Agricultural Sciences, Tianjin, China
| | - Fengmei Jin
- Tianjin Research Center of Agricultural Biotechnology, Tianjin Academy of Agricultural Sciences, Tianjin, China
| | - Haibo Sun
- Tianjin Research Center of Agricultural Biotechnology, Tianjin Academy of Agricultural Sciences, Tianjin, China
| | - Wentao Xu
- Key Laboratory of Assessment of Genetically Modified Organism (Food Safety) (Ministry of Agriculture and Rural Affairs), China Agricultural University, Beijing, China
| | - Rui Chen
- Tianjin Institute of Agricultural Quality Standard and Testing Technology, Tianjin Academy of Agricultural Sciences, Tianjin, China
|
20
|
Genome Sequences of Both Organelles of the Grapevine Rootstock Cultivar 'Börner'. Microbiol Resour Announc 2020; 9:9/15/e01471-19. [PMID: 32273371 PMCID: PMC7380517 DOI: 10.1128/mra.01471-19] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/31/2023] Open
Abstract
Genomic long reads of the interspecific grapevine rootstock cultivar ‘Börner’ (Vitis riparia GM183 × Vitis cinerea Arnold) were used to assemble its chloroplast and mitochondrion genome sequences. We annotated 133 chloroplast and 172 mitochondrial genes, including the RNA editing sites. The organelle genomes in ‘Börner’ were maternally inherited from Vitis riparia.
|
21
|
Schilbert HM, Rempel A, Pucker B. Comparison of Read Mapping and Variant Calling Tools for the Analysis of Plant NGS Data. Plants (Basel, Switzerland) 2020; 9:E439. [PMID: 32252268 PMCID: PMC7238416 DOI: 10.3390/plants9040439] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/15/2020] [Revised: 03/28/2020] [Accepted: 03/30/2020] [Indexed: 12/30/2022]
Abstract
High-throughput sequencing technologies have rapidly developed during the past years and have become an essential tool in plant sciences. However, the analysis of genomic data remains challenging and relies mostly on the performance of automatic pipelines. Frequently applied pipelines involve the alignment of sequence reads against a reference sequence and the identification of sequence variants. Since most benchmarking studies of bioinformatics tools for this purpose have been conducted on human datasets, there is a lack of benchmarking studies in plant sciences. In this study, we evaluated the performance of 50 different variant calling pipelines, including five read mappers and ten variant callers, on six real plant datasets of the model organism Arabidopsis thaliana. Sets of variants were evaluated based on various parameters including sensitivity and specificity. We found that all investigated tools are suitable for analysis of NGS data in plant research. When looking at different performance metrics, BWA-MEM and Novoalign were the best mappers and GATK returned the best results in the variant calling step.
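Sensitivity and specificity, the two metrics named in the abstract, have standard textbook definitions; the sketch below shows them for a called variant set compared against a trusted set. How the study defined its true negatives is not stated in the abstract, so the `universe` of assessed positions and all function names here are assumptions made for illustration.

```python
def confusion_counts(called, truth, universe):
    """Return (tp, fp, fn, tn) for called variants versus a trusted set.

    `universe` is the set of all positions that were assessed; it is an
    assumption needed to define true negatives.
    """
    called, truth, universe = set(called), set(truth), set(universe)
    tp = len(called & truth)             # called and real
    fp = len(called - truth)             # called but not real
    fn = len(truth - called)             # real but missed
    tn = len(universe - called - truth)  # neither called nor real
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-variant positions left uncalled: TN / (TN + FP)."""
    return tn / (tn + fp)


if __name__ == "__main__":
    tp, fp, fn, tn = confusion_counts({3, 4, 5}, {1, 2, 3, 4}, range(10))
    print(sensitivity(tp, fn), specificity(tn, fp))
```

Because non-variant positions vastly outnumber variants in a genome, specificity computed this way tends to sit very close to 1, which is one reason benchmarks usually report several metrics together.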
Affiliation(s)
- Hanna Marie Schilbert
- Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
| | - Andreas Rempel
- Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany
| | - Boas Pucker
- Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Molecular Genetics and Physiology of Plants, Faculty of Biology and Biotechnology, Ruhr-University Bochum, 44801 Bochum, Germany
|
22
|
Siadjeu C, Pucker B, Viehöver P, Albach DC, Weisshaar B. High Contiguity De Novo Genome Sequence Assembly of Trifoliate Yam (Dioscorea dumetorum) Using Long Read Sequencing. Genes (Basel) 2020; 11:E274. [PMID: 32143301 PMCID: PMC7140821 DOI: 10.3390/genes11030274] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 01/31/2020] [Revised: 02/25/2020] [Accepted: 02/29/2020] [Indexed: 12/17/2022] Open
Abstract
Trifoliate yam (Dioscorea dumetorum) is one example of an orphan crop, not traded internationally. Post-harvest hardening of the tubers of this species starts within 24 h after harvesting and renders the tubers inedible. Genomic resources are required for D. dumetorum to improve breeding for non-hardening varieties as well as for other traits. We sequenced the D. dumetorum genome and generated the corresponding annotation. The two haplophases of this highly heterozygous genome were separated to a large extent. The assembly represents 485 Mbp of the genome with an N50 of over 3.2 Mbp. A total of 35,269 protein-encoding gene models as well as 9941 non-coding RNA genes were predicted, and functional annotations were assigned.
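The N50 quoted above is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch of that definition follows; the function name and the toy lengths are illustrative and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

In the toy example the two contigs of length 8 already cover 16 of 32 bases, so the N50 is 8.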
Affiliation(s)
- Christian Siadjeu
- Institute for Biology and Environmental Sciences, Biodiversity and Evolution of Plants, Carl-von-Ossietzky University Oldenburg, Carl-von-Ossietzky Str. 9-11, 26111 Oldenburg, Germany; (C.S.); (D.C.A.)
- Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany; (B.P.); (P.V.)
| | - Boas Pucker
- Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany; (B.P.); (P.V.)
- Molecular Genetics and Physiology of Plants, Faculty of Biology and Biotechnology, Ruhr-University Bochum, Universitätsstraße 150, 44801 Bochum, Germany
| | - Prisca Viehöver
- Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany; (B.P.); (P.V.)
| | - Dirk C. Albach
- Institute for Biology and Environmental Sciences, Biodiversity and Evolution of Plants, Carl-von-Ossietzky University Oldenburg, Carl-von-Ossietzky Str. 9-11, 26111 Oldenburg, Germany; (C.S.); (D.C.A.)
| | - Bernd Weisshaar
- Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany; (B.P.); (P.V.)
|
23
|
Jiao WB, Schneeberger K. Chromosome-level assemblies of multiple Arabidopsis genomes reveal hotspots of rearrangements with altered evolutionary dynamics. Nat Commun 2020; 11:989. [PMID: 32080174 PMCID: PMC7033125 DOI: 10.1038/s41467-020-14779-y] [Citation(s) in RCA: 97] [Impact Index Per Article: 24.3] [Received: 10/09/2019] [Accepted: 01/31/2020] [Indexed: 12/12/2022] Open
Abstract
Despite hundreds of sequenced Arabidopsis genomes, very little is known about the degree of genomic collinearity within single species, due to the low number of chromosome-level assemblies. Here, we report chromosome-level reference-quality assemblies of seven Arabidopsis thaliana accessions selected across its global range. Each genome reveals between 13–17 Mb rearranged, and 5–6 Mb non-reference sequences introducing copy-number changes in ~5000 genes, including ~1900 non-reference genes. Quantifying the collinearity between the genomes reveals ~350 euchromatic regions, where accession-specific tandem duplications destroy the collinearity between the genomes. These hotspots of rearrangements are characterized by reduced meiotic recombination in hybrids and genes implicated in biotic stress response. This suggests that hotspots of rearrangements undergo altered evolutionary dynamics, as compared to the rest of the genome, which are mostly based on the accumulation of new mutations and not on the recombination of existing variation, and thereby enable a quick response to the biotic stress. Despite tremendous genomic resources in the Arabidopsis community, only a few whole genome de novo assemblies are available. Here, the authors report chromosome-level reference-quality assemblies of seven A. thaliana accessions and reveal hotspots of rearrangements with altered evolutionary dynamics.
Affiliation(s)
- Wen-Biao Jiao
- Max Planck Institute for Plant Breeding Research, Department of Chromosome Biology, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
| | - Korbinian Schneeberger
- Max Planck Institute for Plant Breeding Research, Department of Chromosome Biology, Carl-von-Linné-Weg 10, 50829, Cologne, Germany
- Faculty of Biology, LMU Munich, Großhaderner Str. 2, 82152, Planegg-Martinsried, Germany
|
24
|
Sasaki E, Kawakatsu T, Ecker JR, Nordborg M. Common alleles of CMT2 and NRPE1 are major determinants of CHH methylation variation in Arabidopsis thaliana. PLoS Genet 2019; 15:e1008492. [PMID: 31887137 PMCID: PMC6953882 DOI: 10.1371/journal.pgen.1008492] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 10/22/2019] [Revised: 01/10/2020] [Accepted: 12/03/2019] [Indexed: 01/05/2023] Open
Abstract
DNA cytosine methylation is an epigenetic mark associated with silencing of transposable elements (TEs) and heterochromatin formation. In plants, it occurs in three sequence contexts: CG, CHG, and CHH (where H is A, T, or C). The latter does not allow direct inheritance of methylation during DNA replication due to lack of symmetry, and methylation must therefore be re-established every cell generation. Genome-wide association studies (GWAS) have previously shown that CMT2 and NRPE1 are major determinants of genome-wide patterns of TE CHH methylation. Here we instead focus on CHH methylation of individual TEs and TE-families, allowing us to identify the pathways involved in CHH methylation simply from natural variation and confirm the associations by comparing them with mutant phenotypes. Methylation at TEs targeted by the RNA-directed DNA methylation (RdDM) pathway is unaffected by CMT2 variation, but is strongly affected by variation at NRPE1, which is largely responsible for the longitudinal cline in this phenotype. In contrast, CMT2-targeted TEs are affected by both loci, which jointly explain 7.3% of the phenotypic variation (13.2% of total genetic effects). There is no longitudinal pattern for this phenotype, however, because the geographic patterns appear to compensate for each other in a pattern suggestive of stabilizing selection.

DNA methylation is a major component of transposon silencing, and essential for genomic integrity. Recent studies revealed large-scale geographic variation as well as the existence of major trans-acting polymorphisms that partly explained this variation. In this study, we re-analyze previously published data (The 1001 Epigenomes), focusing on CHH methylation patterns of individual TEs and TE families rather than on genome-wide averages (as was done in previous studies). GWAS of the patterns reveals the underlying regulatory networks, and allows us to comprehensively characterize trans-regulation of CHH methylation and its role in the striking geographic pattern for this phenotype.
Affiliation(s)
- Eriko Sasaki
  - Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
- Taiji Kawakatsu
  - Plant Biology Laboratory, Salk Institute for Biological Studies, La Jolla, California, United States of America
  - Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, California, United States of America
  - Institute of Agrobiological Sciences, National Agriculture and Food Research Organization, Tsukuba, Ibaraki, Japan
- Joseph R. Ecker
  - Plant Biology Laboratory, Salk Institute for Biological Studies, La Jolla, California, United States of America
  - Genomic Analysis Laboratory, Salk Institute for Biological Studies, La Jolla, California, United States of America
  - Howard Hughes Medical Institute, Salk Institute for Biological Studies, La Jolla, California, United States of America
- Magnus Nordborg
  - Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria
25
Pucker B, Rückert C, Stracke R, Viehöver P, Kalinowski J, Weisshaar B. Twenty-Five Years of Propagation in Suspension Cell Culture Results in Substantial Alterations of the Arabidopsis thaliana Genome. Genes (Basel) 2019; 10:E671. [PMID: 31480756 PMCID: PMC6770967 DOI: 10.3390/genes10090671] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/26/2019] [Revised: 08/23/2019] [Accepted: 08/29/2019] [Indexed: 01/16/2023]
Abstract
Arabidopsis thaliana is one of the best studied plant model organisms. Besides cultivation in greenhouses, cells of this plant can also be propagated in suspension cell culture. At7 is one such cell line that was established about 25 years ago. Here, we report the sequencing and the analysis of the At7 genome. Large scale duplications and deletions compared to the Columbia-0 (Col-0) reference sequence were detected. The number of deletions exceeds the number of insertions, thus indicating that a haploid genome size reduction is ongoing. Patterns of small sequence variants differ from the ones observed between A. thaliana accessions, e.g., the number of single nucleotide variants matches the number of insertions/deletions. RNA-Seq analysis reveals that disrupted alleles are less frequent in the transcriptome than the native ones.
Affiliation(s)
- Boas Pucker
  - Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
- Christian Rückert
  - Microbial Genomics and Biotechnology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
- Ralf Stracke
  - Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
- Prisca Viehöver
  - Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
- Jörn Kalinowski
  - Microbial Genomics and Biotechnology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
- Bernd Weisshaar
  - Genetics and Genomics of Plants, Faculty of Biology, Center for Biotechnology (CeBiTec), Bielefeld University, Sequenz 1, 33615 Bielefeld, NRW, Germany
26
Pucker B, Schilbert HM, Schumacher SF. Integrating Molecular Biology and Bioinformatics Education. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2019-0005/jib-2019-0005.xml. [PMID: 31145692 PMCID: PMC6798849 DOI: 10.1515/jib-2019-0005] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/04/2019] [Accepted: 04/15/2019] [Indexed: 02/01/2023]
Abstract
Combined awareness of the power and limitations of bioinformatics and molecular biology enables advanced research based on high-throughput data. Despite an increasing demand for scientists with a combined background in both fields, the teaching of dry and wet lab subjects is often still separated. This work describes an example of integrated education with a focus on genomics and transcriptomics. Participants learned computational and molecular biology methods in the same practical course. Peer review was applied as a teaching method to foster cooperative learning among students with heterogeneous backgrounds. The positive evaluation results indicate that this approach was accepted by the participants and would likely be suitable for wider-scale application.
Affiliation(s)
- Boas Pucker
  - Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, Bielefeld, Germany
- Hanna Marie Schilbert
  - Genetics and Genomics of Plants, CeBiTec and Faculty of Biology, Bielefeld University, Bielefeld, Germany