201
Great majority of recombination events in Arabidopsis are gene conversion events. Proc Natl Acad Sci U S A 2012; 109:20992-7. [PMID: 23213238 DOI: 10.1073/pnas.1211827110] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022] Open
Abstract
The evolutionary importance of meiosis may not solely be associated with allelic shuffling caused by crossing-over but may also have to do with its more immediate effects such as gene conversion. Although estimates of the crossing-over rate are often well resolved, the gene conversion rate is much less clear. In Arabidopsis, for example, next-generation sequencing approaches suggest that the two rates are about the same, which contrasts with indirect measures that suggest an excess of gene conversion. Here, we provide analysis of this problem by sequencing 40 F2 Arabidopsis plants and their parents. Small gene conversion tracts, with biased gene conversion content, represent over 90% (probably nearer 99%) of all recombination events. The rate of alteration of protein sequence caused by gene conversion is over 600 times that caused by mutation. Finally, our analysis reveals recombination hot spots and unexpectedly high recombination rates near centromeres. This may be responsible for the previously unexplained pattern of high genetic diversity near Arabidopsis centromeres.
202
Hartwig B, James GV, Konrad K, Schneeberger K, Turck F. Fast isogenic mapping-by-sequencing of ethyl methanesulfonate-induced mutant bulks. PLANT PHYSIOLOGY 2012; 160:591-600. [PMID: 22837357 PMCID: PMC3461541 DOI: 10.1104/pp.112.200311] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Received: 05/11/2012] [Accepted: 07/25/2012] [Indexed: 05/18/2023]
Abstract
Mapping-by-sequencing (or SHOREmapping) has revitalized the powerful concept of forward genetic screens in plants. However, as in conventional genetic mapping approaches, mapping-by-sequencing requires phenotyping of mapping populations established from crosses between two diverged accessions. In addition to the segregation of the focal phenotype, this introduces natural phenotypic variation, which can interfere with the recognition of quantitative phenotypes. Here, we demonstrate how mapping-by-sequencing and candidate gene identification can be performed within the same genetic background using only mutagen-induced changes as segregating markers. Using a previously unknown suppressor of mutants of like heterochromatin protein1 (lhp1), which in its functional form is involved in chromatin-mediated gene repression, we identified three closely linked ethyl methanesulfonate-induced changes as putative candidates. In order to assess allele frequency differences between such closely linked mutations, we introduced deep candidate resequencing using the new Ion Torrent Personal Genome Machine sequencing platform to our mutant identification pipeline and thereby reduced the number of causal candidate mutations to only one. Genetic analysis of two independent additional alleles confirmed that this mutation was causal for the suppression of lhp1.
203
Langley CH, Stevens K, Cardeno C, Lee YCG, Schrider DR, Pool JE, Langley SA, Suarez C, Corbett-Detig RB, Kolaczkowski B, Fang S, Nista PM, Holloway AK, Kern AD, Dewey CN, Song YS, Hahn MW, Begun DJ. Genomic variation in natural populations of Drosophila melanogaster. Genetics 2012; 192:533-98. [PMID: 22673804 PMCID: PMC3454882 DOI: 10.1534/genetics.112.142018] [Citation(s) in RCA: 250] [Impact Index Per Article: 19.2] [Received: 12/22/2011] [Accepted: 05/24/2012] [Indexed: 02/07/2023] Open
Abstract
This report of independent genome sequences of two natural populations of Drosophila melanogaster (37 from North America and 6 from Africa) provides unique insight into forces shaping genomic polymorphism and divergence. Evidence of interactions between natural selection and genetic linkage is abundant not only in centromere- and telomere-proximal regions, but also throughout the euchromatic arms. Linkage disequilibrium, which decays within 1 kbp, exhibits a strong bias toward coupling of the more frequent alleles and provides a high-resolution map of recombination rate. The juxtaposition of population genetics statistics in small genomic windows with gene structures and chromatin states yields a rich, high-resolution annotation, including the following: (1) 5'- and 3'-UTRs are enriched for regions of reduced polymorphism relative to lineage-specific divergence; (2) exons overlap with windows of excess relative polymorphism; (3) epigenetic marks associated with active transcription initiation sites overlap with regions of reduced relative polymorphism and relatively reduced estimates of the rate of recombination; (4) the rate of adaptive nonsynonymous fixation increases with the rate of crossing over per base pair; and (5) both duplications and deletions are enriched near origins of replication and their density correlates negatively with the rate of crossing over. Available demographic models of X and autosome descent cannot account for the increased divergence on the X and loss of diversity associated with the out-of-Africa migration. Comparison of the variation among these genomes to variation among genomes from D. simulans suggests that many targets of directional selection are shared between these species.
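The linkage disequilibrium referred to in this abstract is commonly summarized as r² between pairs of biallelic sites. The following is a minimal Python sketch of that statistic, assuming phased haplotypes coded as 0/1; the function name `ld_r2` and the toy inputs are illustrative assumptions and are not taken from the paper.

```python
def ld_r2(site_a, site_b):
    """Return r^2 between two biallelic sites.

    site_a and site_b are equal-length lists of 0/1 alleles, one entry
    per phased haplotype (hypothetical input format for this sketch).
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must be non-empty and of equal length")
    n = len(site_a)
    p_a = sum(site_a) / n                      # frequency of allele 1 at site A
    p_b = sum(site_b) / n                      # frequency of allele 1 at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n  # 1-1 haplotype frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    return d * d / denom
```

Computing this value for site pairs binned by physical distance is one way to visualize a decay of linkage disequilibrium with distance.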
Affiliation(s)
- Charles H Langley
- Department of Evolution and Ecology, University of California, Davis, CA 95616, USA.
204
Deschamps S, Llaca V, May GD. Genotyping-by-Sequencing in Plants. BIOLOGY 2012; 1:460-83. [PMID: 24832503 PMCID: PMC4009820 DOI: 10.3390/biology1030460] [Citation(s) in RCA: 161] [Impact Index Per Article: 12.4] [Received: 08/06/2012] [Revised: 08/07/2012] [Accepted: 09/13/2012] [Indexed: 12/12/2022]
Abstract
The advent of next-generation DNA sequencing (NGS) technologies has led to the development of rapid genome-wide Single Nucleotide Polymorphism (SNP) detection applications in various plant species. Recent improvements in sequencing throughput, combined with an overall decrease in costs per gigabase of sequence, are allowing NGS to be applied not only to the evaluation of small subsets of parental inbred lines, but also to the mapping and characterization of traits of interest in much larger populations. Such an approach, where sequences are used simultaneously to detect and score SNPs, therefore bypassing the entire marker assay development stage, is known as genotyping-by-sequencing (GBS). This review will summarize the current state of GBS in plants and the promises it holds as a genome-wide genotyping application.
Affiliation(s)
- Stéphane Deschamps
- DuPont Agricultural Biotechnology, Experimental Station, PO Box 80353, 200 Powder Mill Road, Wilmington, DE 19880-0353, USA.
- Victor Llaca
- DuPont Agricultural Biotechnology, Experimental Station, PO Box 80353, 200 Powder Mill Road, Wilmington, DE 19880-0353, USA.
- Gregory D May
- DuPont Pioneer, 7300 NW 62nd Ave., P.O. Box 1004, Johnston, IA 50131-1004, USA.
205
Why assembling plant genome sequences is so challenging. BIOLOGY 2012; 1:439-59. [PMID: 24832233 PMCID: PMC4009782 DOI: 10.3390/biology1020439] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 07/16/2012] [Revised: 09/05/2012] [Accepted: 09/06/2012] [Indexed: 12/16/2022]
Abstract
In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed.
206
McCooke JK, Appels R, Barrero RA, Ding A, Ozimek-Kulik JE, Bellgard MI, Morahan G, Phillips JK. A novel mutation causing nephronophthisis in the Lewis polycystic kidney rat localises to a conserved RCC1 domain in Nek8. BMC Genomics 2012; 13:393. [PMID: 22899815 PMCID: PMC3441220 DOI: 10.1186/1471-2164-13-393] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 03/01/2012] [Accepted: 08/06/2012] [Indexed: 01/03/2023] Open
Abstract
Background: Nephronophthisis (NPHP) as a cause of cystic kidney disease is the most common genetic cause of progressive renal failure in children and young adults. NPHP is characterized by abnormal and/or loss of function of proteins associated with primary cilia. Previously, we characterized an autosomal recessive phenotype of cystic kidney disease in the Lewis Polycystic Kidney (LPK) rat. Results: In this study, quantitative trait locus analysis was used to define a ~1.6 Mbp region on rat chromosome 10q25 harbouring the lpk mutation. Targeted genome capture and next-generation sequencing of this region identified a non-synonymous mutation R650C in the NIMA (never in mitosis gene a)-related kinase 8 (Nek8) gene. This is a novel Nek8 mutation that occurs within the regulator of chromosome condensation 1 (RCC1)-like region of the protein. Specifically, the R650C substitution is located within a G[QRC]LG repeat motif of the predicted seven bladed beta-propeller structure of the RCC1 domain. The rat Nek8 gene is located in a region syntenic to portions of human chromosome 17 and mouse 11. Scanning electron microscopy confirmed abnormally long cilia on LPK kidney epithelial cells, and fluorescence immunohistochemistry for Nek8 protein revealed altered cilia localisation. Conclusions: When assessed relative to other Nek8 NPHP mutations, our results indicate the whole propeller structure of the RCC1 domain is important, as the different mutations cause comparable phenotypes. This study establishes the LPK rat as a novel model system for NPHP and further consolidates the link between cystic kidney disease and cilia proteins.
Affiliation(s)
- John K McCooke
- Centre for Comparative Genomics, Murdoch University, Perth, WA 6150, Australia
207
Kulemzina I, Schumacher MR, Verma V, Reiter J, Metzler J, Failla AV, Lanz C, Sreedharan VT, Rätsch G, Ivanov D. Cohesin rings devoid of Scc3 and Pds5 maintain their stable association with the DNA. PLoS Genet 2012; 8:e1002856. [PMID: 22912589 PMCID: PMC3415457 DOI: 10.1371/journal.pgen.1002856] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Received: 11/24/2011] [Accepted: 06/11/2012] [Indexed: 01/01/2023] Open
Abstract
Cohesin is a protein complex that forms a ring around sister chromatids thus holding them together. The ring is composed of three proteins: Smc1, Smc3 and Scc1. The roles of three additional proteins that associate with the ring, Scc3, Pds5 and Wpl1, are not well understood. It has been proposed that these three factors form a complex that stabilizes the ring and prevents it from opening. This activity promotes sister chromatid cohesion but at the same time poses an obstacle for the initial entrapment of sister DNAs. This hindrance to cohesion establishment is overcome during DNA replication via acetylation of the Smc3 subunit by the Eco1 acetyltransferase. However, the full mechanistic consequences of Smc3 acetylation remain unknown. In the current work, we test the requirement of Scc3 and Pds5 for the stable association of cohesin with DNA. We investigated the consequences of Scc3 and Pds5 depletion in vivo using degron tagging in budding yeast. The previously described DHFR-based N-terminal degron as well as a novel Eco1-derived C-terminal degron were employed in our study. Scc3 and Pds5 associate with cohesin complexes independently of each other and require the Scc1 "core" subunit for their association with chromosomes. Contrary to previous data for Scc1 downregulation, depletion of either Scc3 or Pds5 had a strong effect on sister chromatid cohesion but not on cohesin binding to DNA. Quantity, stability and genome-wide distribution of cohesin complexes remained mostly unchanged after the depletion of Scc3 and Pds5. Our findings are inconsistent with a previously proposed model that Scc3 and Pds5 are cohesin maintenance factors required for cohesin ring stability or for maintaining its association with DNA. We propose that Scc3 and Pds5 specifically function during cohesion establishment in S phase.
Affiliation(s)
- Irina Kulemzina
- Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Vikash Verma
- Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Jochen Reiter
- Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Janina Metzler
- Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Christa Lanz
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Gunnar Rätsch
- Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Dmitri Ivanov
- Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
208
Independent FLC mutations as causes of flowering-time variation in Arabidopsis thaliana and Capsella rubella. Genetics 2012; 192:729-39. [PMID: 22865739 PMCID: PMC3454893 DOI: 10.1534/genetics.112.143958] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 11/25/2022] Open
Abstract
Capsella rubella is an inbreeding annual forb closely related to Arabidopsis thaliana, a model species widely used for studying natural variation in adaptive traits such as flowering time. Although mutations in dozens of genes can affect flowering of A. thaliana in the laboratory, only a handful of such genes vary in natural populations. Chief among these are FRIGIDA (FRI) and FLOWERING LOCUS C (FLC). Common and rare FRI mutations along with rare FLC mutations explain a large fraction of flowering-time variation in A. thaliana. Here we document flowering time under different conditions in 20 C. rubella accessions from across the species’ range. Similar to A. thaliana, vernalization, long photoperiods and elevated ambient temperature generally promote flowering. In this collection of C. rubella accessions, we did not find any obvious loss-of-function FRI alleles. Using mapping-by-sequencing with two strains that have contrasting flowering behaviors, we identified a splice-site mutation in FLC as the likely cause of early flowering in accession 1408. However, other similarly early C. rubella accessions did not share this mutation. We conclude that the genetic basis of flowering-time variation in C. rubella is complex, despite this very young species having undergone an extreme genetic bottleneck when it split from C. grandiflora a few tens of thousands of years ago.
209
Lai K, Duran C, Berkman PJ, Lorenc MT, Stiller J, Manoli S, Hayden MJ, Forrest KL, Fleury D, Baumann U, Zander M, Mason AS, Batley J, Edwards D. Single nucleotide polymorphism discovery from wheat next-generation sequence data. PLANT BIOTECHNOLOGY JOURNAL 2012; 10:743-9. [PMID: 22748104 DOI: 10.1111/j.1467-7652.2012.00718.x] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 05/08/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant type of molecular genetic marker and can be used for producing high-resolution genetic maps, marker-trait association studies and marker-assisted breeding. Large polyploid genomes such as wheat present a challenge for SNP discovery because of the potential presence of multiple homoeologs for each gene. AutoSNPdb has been successfully applied to identify SNPs from Sanger sequence data for several species, including barley, rice and Brassica, but the volume of data required to accurately call SNPs in the complex genome of wheat has prevented its application to this important crop. DNA sequencing technology has been revolutionized by the introduction of next-generation sequencing, and it is now possible to generate several million sequence reads in a timely and cost-effective manner. We have produced wheat transcriptome sequence data using 454 sequencing technology and applied this for SNP discovery using a modified autoSNPdb method, which integrates SNP and gene annotation information with a graphical viewer. A total of 4,694,141 sequence reads from three bread wheat varieties were assembled to identify a total of 38 928 candidate SNPs. Each SNP is within an assembly complete with annotation, enabling the selection of polymorphism within genes of interest.
Affiliation(s)
- Kaitao Lai
- School of Agriculture and Food Science, University of Queensland, Brisbane, QLD, Australia
210
Galvão VC, Nordström KJV, Lanz C, Sulz P, Mathieu J, Posé D, Schmid M, Weigel D, Schneeberger K. Synteny-based mapping-by-sequencing enabled by targeted enrichment. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2012; 71:517-526. [PMID: 22409706 DOI: 10.1111/j.1365-313x.2012.04993.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 05/26/2023]
Abstract
Mapping-by-sequencing, as implemented in SHOREmap ('SHOREmapping'), is greatly accelerating the identification of causal mutations. The original SHOREmap approach based on resequencing of bulked segregants required a highly accurate and complete reference sequence. However, current whole-genome or transcriptome assemblies from next-generation sequencing data of non-model organisms do not produce chromosome-length scaffolds. We have therefore developed a method that exploits synteny with a related genome for genetic mapping. We first demonstrate how mapping-by-sequencing can be performed using a reduced number of markers, and how the associated decrease in the number of markers can be compensated for by enrichment of marker sequences. As proof of concept, we apply this method to Arabidopsis thaliana gene models ordered by synteny with the genome sequence of the distant relative Brassica rapa, whose genome has several large-scale rearrangements relative to A. thaliana. Our approach provides an alternative method for high-resolution genetic mapping in species that lack finished genome reference sequences or for which only RNA-seq assemblies are available. Finally, for improved identification of causal mutations by fine-mapping, we introduce a new likelihood ratio test statistic, transforming local allele frequency estimations into a confidence interval similar to conventional mapping intervals.
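The idea of turning local allele frequency estimates into a mapping interval can be illustrated with a simple pooled-count scan. The Python sketch below is an assumption-laden illustration only: it is not the SHOREmap implementation and not the likelihood ratio statistic described in the abstract. The function name `peak_window`, the tuple layout `(position, alt_reads, ref_reads)`, and the example counts are all hypothetical.

```python
def peak_window(markers, size):
    """Find the run of `size` consecutive markers with the highest
    pooled alternate-allele frequency.

    markers: list of (position, alt_reads, ref_reads), sorted by position.
    Returns (start_position, end_position, pooled_frequency).
    """
    if size < 1 or size > len(markers):
        raise ValueError("size must be between 1 and the number of markers")
    best = None
    for i in range(len(markers) - size + 1):
        window = markers[i:i + size]
        alt = sum(m[1] for m in window)
        total = sum(m[1] + m[2] for m in window)
        if total == 0:
            continue  # no read coverage in this window
        freq = alt / total
        if best is None or freq > best[2]:
            best = (window[0][0], window[-1][0], freq)
    if best is None:
        raise ValueError("no window has read coverage")
    return best


# Toy bulked-segregant counts (made up for illustration).
markers = [(100, 5, 5), (200, 6, 4), (300, 9, 1), (400, 10, 0), (500, 7, 3)]
start, end, freq = peak_window(markers, 2)
```

Pooling read counts across neighbouring markers, rather than averaging per-marker frequencies, weights each marker by its coverage, which is one reasonable choice when the number of markers is small.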
Affiliation(s)
- Vinicius C Galvão
- Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
211
Single Nucleotide Polymorphism (SNP) Detection and Genotype Calling from Massively Parallel Sequencing (MPS) Data. STATISTICS IN BIOSCIENCES 2012; 5:3-25. [PMID: 24489615 DOI: 10.1007/s12561-012-9067-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
Abstract
Massively parallel sequencing (MPS), since its debut in 2005, has transformed the field of genomic studies. These new sequencing technologies have resulted in the successful identification of causal variants for several rare Mendelian disorders. They have also begun to deliver on their promise to explain some of the missing heritability from genome-wide association studies (GWAS) of complex traits. We anticipate a rapidly growing number of MPS-based studies for a diverse range of applications in the near future. One crucial and nearly inevitable step is to detect SNPs and call genotypes at the detected polymorphic sites from the sequencing data. Here, we review statistical methods that have been proposed in the past five years for this purpose. In addition, we discuss emerging issues and future directions related to SNP detection and genotype calling from MPS data.
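As a concrete illustration of genotype calling at a single site, the Python sketch below uses a basic binomial read-count model. It is a generic textbook-style example, not a method attributed to this review. The function name `call_genotype`, the genotype labels, and the default per-read error rate of 0.01 are assumptions made for illustration.

```python
import math


def call_genotype(ref_reads, alt_reads, error=0.01):
    """Return the maximum-likelihood diploid genotype at one site.

    Model (an assumption of this sketch): each read independently shows
    the alternate allele with probability `error` for genotype RR,
    0.5 for RA, and 1 - error for AA.
    """
    if ref_reads < 0 or alt_reads < 0 or ref_reads + alt_reads == 0:
        raise ValueError("need at least one read and non-negative counts")
    if not 0 < error < 0.5:
        raise ValueError("error must be in (0, 0.5)")
    p_alt = {"RR": error, "RA": 0.5, "AA": 1.0 - error}
    log_lik = {
        g: alt_reads * math.log(p) + ref_reads * math.log(1.0 - p)
        for g, p in p_alt.items()
    }
    # The binomial coefficient is identical for all genotypes, so it is omitted.
    return max(log_lik, key=log_lik.get)
```

For example, under this model ten reference reads and no alternate reads favour `"RR"`, while five of each favour `"RA"`. Practical callers typically also weight reads by base and mapping quality and apply a prior over genotypes, which this sketch leaves out.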
212
Pacheco-Villalobos D, Hardtke CS. Natural genetic variation of root system architecture from Arabidopsis to Brachypodium: towards adaptive value. Philos Trans R Soc Lond B Biol Sci 2012; 367:1552-8. [PMID: 22527398 PMCID: PMC3321687 DOI: 10.1098/rstb.2011.0237] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022] Open
Abstract
Root system architecture is a trait that displays considerable plasticity because of its sensitivity to environmental stimuli. Nevertheless, to a significant degree it is genetically constrained as suggested by surveys of its natural genetic variation. A few regulators of root system architecture have been isolated as quantitative trait loci through the natural variation approach in the dicotyledon model, Arabidopsis. This provides proof of principle that allelic variation for root system architecture traits exists, is genetically tractable, and might be exploited for crop breeding. Beyond Arabidopsis, Brachypodium could serve as both a credible and experimentally accessible model for root system architecture variation in monocotyledons, as suggested by first glimpses of the different root morphologies of Brachypodium accessions. Whether a direct knowledge transfer gained from molecular model system studies will work in practice remains unclear however, because of a lack of comprehensive understanding of root system physiology in the native context. For instance, apart from a few notable exceptions, the adaptive value of genetic variation in root system modulators is unknown. Future studies should thus aim at comprehensive characterization of the role of genetic players in root system architecture variation by taking into account the native environmental conditions, in particular soil characteristics.
Affiliation(s)
- Christian S. Hardtke
- Department of Plant Molecular Biology, University of Lausanne, Biophore Building, 1015 Lausanne, Switzerland
213
Bocker MT, Tuorto F, Raddatz G, Musch T, Yang FC, Xu M, Lyko F, Breiling A. Hydroxylation of 5-methylcytosine by TET2 maintains the active state of the mammalian HOXA cluster. Nat Commun 2012; 3:818. [PMID: 22569366 DOI: 10.1038/ncomms1826] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Received: 11/03/2011] [Accepted: 04/05/2012] [Indexed: 12/22/2022] Open
Abstract
Differentiation is accompanied by extensive epigenomic reprogramming, leading to the repression of stemness factors and the transcriptional maintenance of activated lineage-specific genes. Here we use the mammalian Hoxa cluster of developmental genes as a model system to follow changes in DNA modification patterns during retinoic acid-induced differentiation. We find the inactive cluster to be marked by defined patterns of 5-methylcytosine (5mC). Upon the induction of differentiation, the active anterior part of the cluster becomes increasingly enriched in 5-hydroxymethylcytosine (5hmC), following closely the colinear activation pattern of the gene array, which is paralleled by the reduction of 5mC. Depletion of the 5hmC generating dioxygenase Tet2 impairs the maintenance of Hoxa activity and partially restores 5mC levels. Our results indicate that gene-specific 5mC-5hmC conversion by Tet2 is crucial for the maintenance of active chromatin states at lineage-specific loci.
Affiliation(s)
- Michael T Bocker
- Division of Epigenetics, DKFZ-ZMBH Alliance, German Cancer Research Center, Im Neuenheimer Feld 580, 69120 Heidelberg, Germany
214
Li H. Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics 2012; 28:1838-44. [PMID: 22569178 DOI: 10.1093/bioinformatics/bts280] [Citation(s) in RCA: 251] [Impact Index Per Article: 19.3] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION: Eugene Myers in his string graph paper suggested that in a string graph or equivalently a unitig graph, any path spells a valid assembly. As a string/unitig graph also encodes every valid assembly of reads, such a graph, provided that it can be constructed correctly, is in fact a lossless representation of reads. In principle, every analysis based on whole-genome shotgun sequencing (WGS) data, such as SNP and insertion/deletion (INDEL) calling, can also be achieved with unitigs. RESULTS: To explore the feasibility of using de novo assembly in the context of resequencing, we developed a de novo assembler, fermi, that assembles Illumina short reads into unitigs while preserving most of the information of the input reads. SNPs and INDELs can be called by mapping the unitigs against a reference genome. By applying the method to 35-fold human resequencing data, we showed that in comparison to the standard pipeline, our approach yields similar accuracy for SNP calling and better results for INDEL calling. It has higher sensitivity than other de novo assembly based methods for variant calling. Our work suggests that variant calling with de novo assembly can be a beneficial complement to the standard variant calling pipeline for whole-genome resequencing. On the methodological side, we propose the FMD-index for forward-backward extension of DNA sequences, a fast algorithm for finding all super-maximal exact matches, and one-pass construction of unitigs from an FMD-index. AVAILABILITY: http://github.com/lh3/fermi
Affiliation(s)
- Heng Li
- Medical Population Genetics Program, Broad Institute, 7 Cambridge Center, MA 02142, USA.
215
Iorizzo M, Senalik D, Szklarczyk M, Grzebelus D, Spooner D, Simon P. De novo assembly of the carrot mitochondrial genome using next generation sequencing of whole genomic DNA provides first evidence of DNA transfer into an angiosperm plastid genome. BMC PLANT BIOLOGY 2012; 12:61. [PMID: 22548759 PMCID: PMC3413510 DOI: 10.1186/1471-2229-12-61] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Received: 12/23/2011] [Accepted: 05/01/2012] [Indexed: 05/02/2023]
Abstract
BACKGROUND: Sequence analysis of organelle genomes has revealed important aspects of plant cell evolution. The scope of this study was to develop an approach for de novo assembly of the carrot mitochondrial genome using next generation sequence data from total genomic DNA. RESULTS: Sequencing data from a carrot 454 whole genome library were used to develop a de novo assembly of the mitochondrial genome. Development of a new bioinformatic tool allowed visualizing contig connections and elucidation of the de novo assembly. Southern hybridization demonstrated recombination across two large repeats. Genome annotation allowed identification of 44 protein coding genes, three rRNA and 17 tRNA. Identification of the plastid genome sequence allowed organelle genome comparison. Mitochondrial intergenic sequence analysis allowed detection of a fragment of DNA specific to the carrot plastid genome. PCR amplification and sequence analysis across different Apiaceae species revealed consistent conservation of this fragment in the mitochondrial genomes and an insertion in Daucus plastid genomes, giving evidence of a mitochondrial to plastid transfer of DNA. Sequence similarity with a retrotransposon element suggests a possibility that a transposon-like event transferred this sequence into the plastid genome. CONCLUSIONS: This study confirmed that whole genome sequencing is a practical approach for de novo assembly of higher plant mitochondrial genomes. In addition, a new aspect of intercompartmental genome interaction was reported providing the first evidence for DNA transfer into an angiosperm plastid genome. The approach used here could be used more broadly to sequence and assemble mitochondrial genomes of diverse species. This information will allow us to better understand intercompartmental interactions and cell evolution.
Affiliation(s)
- Massimo Iorizzo
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI 53706, USA
- Douglas Senalik
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI 53706, USA
- USDA-Agricultural Research Service, Vegetable Crops Research Unit, University of Wisconsin, 1575 Linden Drive, Madison, WI 53706, USA
- Marek Szklarczyk
- Department of Genetics, Plant Breeding and Seed Science, University of Agriculture Krakow, Al. 29 Listopada 54, 31-425, Krakow, Poland
- Dariusz Grzebelus
- Department of Genetics, Plant Breeding and Seed Science, University of Agriculture Krakow, Al. 29 Listopada 54, 31-425, Krakow, Poland
- David Spooner
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI 53706, USA
- USDA-Agricultural Research Service, Vegetable Crops Research Unit, University of Wisconsin, 1575 Linden Drive, Madison, WI 53706, USA
- Philipp Simon
- Department of Horticulture, University of Wisconsin-Madison, 1575 Linden Drive, Madison, WI 53706, USA
- USDA-Agricultural Research Service, Vegetable Crops Research Unit, University of Wisconsin, 1575 Linden Drive, Madison, WI 53706, USA
216
Păcurar DI, Păcurar ML, Street N, Bussell JD, Pop TI, Gutierrez L, Bellini C. A collection of INDEL markers for map-based cloning in seven Arabidopsis accessions. JOURNAL OF EXPERIMENTAL BOTANY 2012; 63:2491-501. [PMID: 22282537 PMCID: PMC3346218 DOI: 10.1093/jxb/err422] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Indexed: 05/05/2023]
Abstract
The availability of a comprehensive set of resources including an entire annotated reference genome, sequenced alternative accessions, and a multitude of marker systems makes Arabidopsis thaliana an ideal platform for genetic mapping. PCR markers based on INsertions/DELetions (INDELs) are currently the most frequently used polymorphisms. For the most commonly used mapping combination, Columbia×Landsberg erecta (Col-0×Ler-0), the Cereon polymorphism database is a valuable resource for the generation of polymorphic markers. However, because the number of markers available in public databases for accessions other than Col-0 and Ler-0 is extremely low, mapping using other accessions is far from straightforward. This issue arose while cloning mutations in the Wassilewskija (Ws-4) background. In this work, approaches are described for marker generation in Ws-4×Col-0. Complementary strategies were employed to generate 229 INDEL markers. Firstly, existing Col-0/Ler-0 Cereon predicted polymorphisms were mined for transferability to Ws-4. Secondly, Ws-0 ecotype Illumina sequence data were analyzed to identify INDELs that could be used for the development of PCR-based markers for Col-0 and Ws-4. Finally, shotgun sequencing allowed the identification of INDELs directly between Col-0 and Ws-4. The polymorphism of the 229 markers was assessed in seven widely used Arabidopsis accessions, and PCR markers that allow a clear distinction between the diverged Ws-0 and Ws-4 accessions are detailed. The utility of the markers was demonstrated by mapping more than 35 mutations in a Col-0×Ws-4 combination, an example of which is presented here. The potential contribution of next generation sequencing technologies to more traditional map-based cloning is discussed.
Affiliation(s)
- Daniel Ioan Păcurar
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, SE-90183 Umeå, Sweden.
|
217
|
Childs LH, Lisec J, Walther D. Matapax: an online high-throughput genome-wide association study pipeline. PLANT PHYSIOLOGY 2012; 158:1534-41. [PMID: 22353578 PMCID: PMC3343729 DOI: 10.1104/pp.112.194027] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/13/2012] [Accepted: 02/20/2012] [Indexed: 05/24/2023]
Abstract
High-throughput sequencing and genotyping methods are dramatically increasing the number of observable genetic intraspecies differences that can be exploited as genetic markers. In addition, automated phenotyping platforms and "omics" profiling technologies further enlarge the set of quantifiable macroscopic and molecular traits at an ever-increasing pace. Combined, both lines of technological advances create unparalleled opportunities to identify candidate gene regions and, ideally, even single genes responsible for observed variations in a particular trait via association studies. However, as of yet, this new potential is not sufficiently matched by enabling software solutions to easily exploit this wealth of genotype/phenotype information. We have developed Matapax, a Web-based platform to address this need. Initially, we built the infrastructure to support association studies in Arabidopsis (Arabidopsis thaliana) based on several genotyping efforts covering up to 1,375 Arabidopsis accessions. Based on the user-supplied trait information, associated single-nucleotide polymorphism markers and single-nucleotide polymorphism-harboring or -neighboring genes are identified using both the GAPIT and EMMA libraries developed for R. Additional interrogation is facilitated by displaying candidate regions and genes in a genome browser and by providing relevant annotation information. In the future, we plan to broaden the scope of organisms to other plant species as more genotype/phenotype information becomes available. Matapax is freely available at http://matapax.mpimp-golm.mpg.de and can be accessed using any internet browser.
Affiliation(s)
- Liam H Childs
- Max-Planck Institute for Molecular Plant Physiology, Golm 14476, Germany.
|
218
|
Hamilton JP, Buell CR. Advances in plant genome sequencing. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2012; 70:177-90. [PMID: 22449051 DOI: 10.1111/j.1365-313x.2012.04894.x] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
The study of plant biology in the 21st century is, and will continue to be, vastly different from that in the 20th century. One driver for this has been the use of genomics methods to reveal the genetic blueprints for not one but dozens of plant species, as well as resolving genome differences in thousands of individuals at the population level. Genomics technology has advanced substantially since publication of the first plant genome sequence, that of Arabidopsis thaliana, in 2000. Plant genomics researchers have readily embraced new algorithms, technologies and approaches to generate genome, transcriptome and epigenome datasets for model and crop species that have permitted deep inferences into plant biology. Challenges in sequencing any genome include ploidy, heterozygosity and paralogy, all of which are amplified in plant genomes compared to animal genomes due to the large genome sizes, high repetitive sequence content, and rampant whole- or segmental genome duplication. The ability to generate de novo transcriptome assemblies provides an alternative approach to bypass these complex genomes and access the gene space of these recalcitrant species. The field of genomics is driven by technological improvements in sequencing platforms; however, software and algorithm development has lagged behind reductions in sequencing costs, improved throughput, and quality improvements. It is anticipated that sequencing platforms will continue to improve the length and quality of output, and that the complementary algorithms and bioinformatic software needed to handle large, repetitive genomes will improve. The future is bright for an exponential improvement in our understanding of plant biology.
Affiliation(s)
- John P Hamilton
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
|
219
|
Alcázar R, Pecinka A, Aarts MGM, Fransz PF, Koornneef M. Signals of speciation within Arabidopsis thaliana in comparison with its relatives. CURRENT OPINION IN PLANT BIOLOGY 2012; 15:205-211. [PMID: 22265228 DOI: 10.1016/j.pbi.2012.01.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/17/2011] [Revised: 12/06/2011] [Accepted: 01/03/2012] [Indexed: 05/31/2023]
Abstract
The species within the now well-defined Arabidopsis genus provide biological materials suitable to investigate speciation and the development of reproductive isolation barriers between related species. Even within the model species A. thaliana, genetic differentiation between populations due to environmental adaptation or demographic history can lead to cases where hybrids between accessions are non-viable. Experimental evidence supports the importance of genome duplications and genetic epistatic interactions in the occurrence of reproductive isolation. Other examples of adaptation to specific environments can be found in Arabidopsis relatives where hybridization and chromosome doubling lead to new amphidiploid species. Molecular signals of speciation found in the Arabidopsis genus should provide a better understanding of speciation processes in plants from a genetic, molecular and evolutionary perspective.
Affiliation(s)
- Rubén Alcázar
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Carl-von-Linné-Weg 10, 50829 Cologne, Germany
|
220
|
Joshi HJ, Christiansen KM, Fitz J, Cao J, Lipzen A, Martin J, Smith-Moritz AM, Pennacchio LA, Schackwitz WS, Weigel D, Heazlewood JL. 1001 Proteomes: a functional proteomics portal for the analysis of Arabidopsis thaliana accessions. Bioinformatics 2012; 28:1303-6. [PMID: 22451271 DOI: 10.1093/bioinformatics/bts133] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The sequencing of over a thousand natural strains of the model plant Arabidopsis thaliana is producing unparalleled information at the genetic level for plant researchers. To enable the rapid exploitation of these data for functional proteomics studies, we have created a resource for the visualization of protein information and proteomic datasets for sequenced natural strains of A. thaliana. RESULTS: The 1001 Proteomes portal can be used to visualize amino acid substitutions or non-synonymous single-nucleotide polymorphisms in individual proteins of A. thaliana based on the reference genome Col-0. We have used the available processed sequence information to analyze the conservation of known residues subject to protein phosphorylation among these natural strains. The substitution of amino acids in A. thaliana natural strains is heavily constrained and is likely a result of the conservation of functional attributes within proteins. At a practical level, we demonstrate that this information can be used to clarify ambiguously defined phosphorylation sites from phosphoproteomic studies. Protein sets of the natural variants are available for download to enable proteomic studies on these accessions. Together this information can be used to uncover the possible roles of specific amino acids in determining the structure and function of proteins in the model plant A. thaliana. An online portal to enable the community to exploit these data can be accessed at http://1001proteomes.masc-proteomics.org/
Affiliation(s)
- Hiren J Joshi
- Joint BioEnergy Institute and Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
221
|
Hansey CN, Vaillancourt B, Sekhon RS, de Leon N, Kaeppler SM, Buell CR. Maize (Zea mays L.) genome diversity as revealed by RNA-sequencing. PLoS One 2012; 7:e33071. [PMID: 22438891 PMCID: PMC3306378 DOI: 10.1371/journal.pone.0033071] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2011] [Accepted: 02/08/2012] [Indexed: 11/18/2022] Open
Abstract
Maize is rich in genetic and phenotypic diversity. Understanding the sequence, structural, and expression variation that contributes to phenotypic diversity would facilitate more efficient varietal improvement. RNA based sequencing (RNA-seq) is a powerful approach for transcriptional analysis, assessing sequence variation, and identifying novel transcript sequences, particularly in large, complex, repetitive genomes such as maize. In this study, we sequenced RNA from whole seedlings of 21 maize inbred lines representing diverse North American and exotic germplasm. Single nucleotide polymorphism (SNP) detection identified 351,710 polymorphic loci distributed throughout the genome covering 22,830 annotated genes. Tight clustering of two distinct heterotic groups and exotic lines was evident using these SNPs as genetic markers. Transcript abundance analysis revealed minimal variation in the total number of genes expressed across these 21 lines (57.1% to 66.0%). However, the transcribed gene set among the 21 lines varied, with 48.7% expressed in all of the lines, 27.9% expressed in one to 20 lines, and 23.4% expressed in none of the lines. De novo assembly of RNA-seq reads that did not map to the reference B73 genome sequence revealed 1,321 high confidence novel transcripts, of which, 564 loci were present in all 21 lines, including B73, and 757 loci were restricted to a subset of the lines. RT-PCR validation demonstrated 87.5% concordance with the computational prediction of these expressed novel transcripts. Intriguingly, 145 of the novel de novo assembled loci were present in lines from only one of the two heterotic groups consistent with the hypothesis that, in addition to sequence polymorphisms and transcript abundance, transcript presence/absence variation is present and, thereby, may be a mechanism contributing to the genetic basis of heterosis.
Affiliation(s)
- Candice N. Hansey
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America
- Brieanne Vaillancourt
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America
- Rajandeep S. Sekhon
- Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Natalia de Leon
- Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Shawn M. Kaeppler
- Department of Agronomy, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- C. Robin Buell
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America
|
222
|
Simon UK, Trajanoski S, Kroneis T, Sedlmayr P, Guelly C, Guttenberger H. Accession-Specific Haplotypes of the Internal Transcribed Spacer Region in Arabidopsis thaliana--A Means for Barcoding Populations. Mol Biol Evol 2012; 29:2231-9. [DOI: 10.1093/molbev/mss093] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022] Open
|
223
|
Schmitz RJ, Ecker JR. Epigenetic and epigenomic variation in Arabidopsis thaliana. TRENDS IN PLANT SCIENCE 2012; 17:149-54. [PMID: 22342533 PMCID: PMC3645451 DOI: 10.1016/j.tplants.2012.01.001] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/14/2011] [Revised: 12/23/2011] [Accepted: 01/04/2012] [Indexed: 05/04/2023]
Abstract
Arabidopsis thaliana (Arabidopsis) is ideally suited for studies of natural phenotypic variation. This species has also provided an unparalleled experimental system to explore the mechanistic link between genetic and epigenetic variation, especially with regard to cytosine methylation. Using high-throughput sequencing methods, genotype to epigenotype to phenotype observations can now be extended to plant populations. We review the evidence for induced and spontaneous epigenetic variants that have been identified in Arabidopsis in the laboratory and discuss how these experimental observations could explain existing variation in the wild.
Affiliation(s)
- Robert J Schmitz
- Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
|
224
|
Lu P, Han X, Qi J, Yang J, Wijeratne AJ, Li T, Ma H. Analysis of Arabidopsis genome-wide variations before and after meiosis and meiotic recombination by resequencing Landsberg erecta and all four products of a single meiosis. Genome Res 2012; 22:508-18. [PMID: 22106370 PMCID: PMC3290786 DOI: 10.1101/gr.127522.111] [Citation(s) in RCA: 94] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2011] [Accepted: 11/17/2011] [Indexed: 11/24/2022]
Abstract
Meiotic recombination, including crossovers (COs) and gene conversions (GCs), impacts natural variation and is an important evolutionary force. COs increase genetic diversity by redistributing existing variation, whereas GCs can alter allelic frequency. Here, we sequenced Arabidopsis Landsberg erecta (Ler) and two sets of all four meiotic products from a Columbia (Col)/Ler hybrid to investigate genome-wide variation and meiotic recombination at nucleotide resolution. Comparing Ler and Col sequences uncovered 349,171 Single Nucleotide Polymorphisms (SNPs), 58,085 small and 2315 large insertions/deletions (indels), with highly correlated genome-wide distributions of SNPs, and small indels. A total of 443 genes have at least 10 nonsynonymous substitutions in protein-coding regions, with enrichment for disease-resistance genes. Another 316 genes are affected by large indels, including 130 genes with complete deletion of coding regions in Ler. Using the Arabidopsis qrt1 mutant, two sets of four meiotic products were generated and analyzed by sequencing for meiotic recombination, representing the first tetrad analysis with whole-genome sequencing in a nonfungal species. We detected 18 COs, six of which had an associated GC event, and four GCs without COs (NCOs), and revealed that Arabidopsis GCs are likely fewer and with shorter tracts than those in yeast. Meiotic recombination and chromosome assortment events dramatically redistributed genome variation in meiotic products, contributing to population diversity. In particular, meiosis provides a rapid mechanism to generate copy-number variation (CNV) of sequences that have different chromosomal positions in Col and Ler.
Affiliation(s)
- Pingli Lu
- Department of Biology and the Huck Institutes of the Life Sciences, the Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Xinwei Han
- Department of Biology and the Huck Institutes of the Life Sciences, the Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Intercollege Graduate Program in Genetics, the Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Ji Qi
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
- Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
- Jiange Yang
- Department of Biology and the Huck Institutes of the Life Sciences, the Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Asela J. Wijeratne
- Department of Biology and the Huck Institutes of the Life Sciences, the Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Intercollege Graduate Program in Plant Biology, the Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Tao Li
- Institute of Hydrobiology, Chinese Academy of Science, Wuhan 430072, China
- Hong Ma
- State Key Laboratory of Genetic Engineering, Institute of Plant Biology, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai 200433, China
- Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
|
225
|
Plackett AR, Powers SJ, Fernandez-Garcia N, Urbanova T, Takebayashi Y, Seo M, Jikumaru Y, Benlloch R, Nilsson O, Ruiz-Rivero O, Phillips AL, Wilson ZA, Thomas SG, Hedden P. Analysis of the developmental roles of the Arabidopsis gibberellin 20-oxidases demonstrates that GA20ox1, -2, and -3 are the dominant paralogs. THE PLANT CELL 2012; 24:941-60. [PMID: 22427334 PMCID: PMC3336139 DOI: 10.1105/tpc.111.095109] [Citation(s) in RCA: 141] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/21/2011] [Revised: 02/16/2012] [Accepted: 02/27/2012] [Indexed: 05/18/2023]
Abstract
Gibberellin (GA) biosynthesis is necessary for normal plant development, with later GA biosynthetic stages being governed by multigene families. Arabidopsis thaliana contains five GA 20-oxidase (GA20ox) genes, and past work has demonstrated the importance of GA20ox1 and -2 for growth and fertility. Here, we show through systematic mutant analysis that GA20ox1, -2, and -3 are the dominant paralogs; their absence results in severe dwarfism and almost complete loss of fertility. In vitro analysis revealed that GA20ox4 has full GA20ox activity, but GA20ox5 catalyzes only the first two reactions of the sequence by which GA(12) is converted to GA(9). GA20ox3 functions almost entirely redundantly with GA20ox1 and -2 at most developmental stages, including the floral transition, while GA20ox4 and -5 have very minor roles. These results are supported by analysis of the gene expression patterns in promoter:β-glucuronidase reporter lines. We demonstrate that fertility is highly sensitive to GA concentration, that GA20ox1, -2, and -3 have significant effects on floral organ growth and anther development, and that both GA deficiency and overdose impact on fertility. Loss of GA20ox activity causes anther developmental arrest, with the tapetum failing to degrade. Some phenotypic recovery of late flowers in GA-deficient mutants, including ga1-3, indicated the involvement of non-GA pathways in floral development.
Affiliation(s)
- Andrew R.G. Plackett
- Plant Science Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, United Kingdom
- Stephen J. Powers
- Biomathematics and Bioinformatics Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, United Kingdom
- Nieves Fernandez-Garcia
- Plant Science Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, United Kingdom
- Terezie Urbanova
- Plant Science Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, United Kingdom
- Mitsunori Seo
- RIKEN Plant Science Center, Yokohama, Kanagawa 230-0045, Japan
- Yusuke Jikumaru
- RIKEN Plant Science Center, Yokohama, Kanagawa 230-0045, Japan
- Reyes Benlloch
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, S-90183 Umea, Sweden
- Ove Nilsson
- Umeå Plant Science Centre, Department of Forest Genetics and Plant Physiology, Swedish University of Agricultural Sciences, S-90183 Umea, Sweden
- Omar Ruiz-Rivero
- Plant Science Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, United Kingdom
- Andrew L. Phillips
- Plant Science Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, United Kingdom
- Zoe A. Wilson
- School of Biosciences, University of Nottingham, Loughborough, Leicestershire LE12 5RD, United Kingdom
- Stephen G. Thomas
- Plant Science Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, United Kingdom
- Peter Hedden
- Plant Science Department, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, United Kingdom
|
226
|
Zhang G, Fedyunin I, Kirchner S, Xiao C, Valleriani A, Ignatova Z. FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads. Nucleic Acids Res 2012; 40:e83. [PMID: 22379138 PMCID: PMC3367211 DOI: 10.1093/nar/gks196] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and the ability to handle indels to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm which uses the whole read information for mapping, and high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses a hotspot score to prioritize the processing of highly possible matches and implements a modified Smith–Waterman refinement with a reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs, which is important for quantitative processing of RNA-Seq datasets.
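For readers unfamiliar with the refinement step named in this abstract, the following is a minimal sketch of the standard Smith–Waterman local alignment score, not the modified variant used by FANSe. The function name and the match, mismatch, and gap values are illustrative assumptions and do not come from the paper.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (textbook version, linear gap penalty).

    Scoring values are assumptions chosen for illustration only.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores are floored at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

With these assumed scores, a read that shares an exact four-base stretch with the reference scores 8, and two sequences with no bases in common score 0.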
Affiliation(s)
- Gong Zhang
- Biochemistry, Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14467 Potsdam, Germany.
|
227
|
Yant L. Genome-wide mapping of transcription factor binding reveals developmental process integration and a fresh look at evolutionary dynamics. AMERICAN JOURNAL OF BOTANY 2012; 99:277-90. [PMID: 22268222 DOI: 10.3732/ajb.1100333] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/23/2023]
Abstract
How does evolution forge adaptive responses? Are many changes required or few? Just how complex are the transcriptional networks that control development? Diverse questions like these are being newly addressed by next-generation sequencing-based techniques. Facilitating a mechanistic understanding, these approaches reveal the direct in vivo interactions between transcription factors and their physical targets, combined with genome-scale readouts to comprehensively map adaptive gene regulatory networks (GRNs). Here I focus on pioneering work from the last 3 years that has leveraged these data to investigate diverse aspects of GRN circuitry controlling the reproductive transition in plants. These approaches have revealed surprising new functions for long-investigated key players in developmental programs and laid bare the basis for pleiotropy in many others, suggesting widespread process integration at the transcriptional level. Evolutionary questions begged by the recent deluge of GRN mapping data are being assessed anew, both by emerging work outside Arabidopsis thaliana and novel analyses within. These studies have swiftly exposed the distinctive power and adaptability of genome-wide GRN mapping and illustrate that this unique data type holds tremendous promise for plant biology.
Affiliation(s)
- Levi Yant
- Department of Organismic and Evolutionary Biology, Harvard University, 22 Oxford Street, Cambridge, Massachusetts 02138, USA.
|
228
|
Sahu BB, Sumit R, Srivastava SK, Bhattacharyya MK. Sequence based polymorphic (SBP) marker technology for targeted genomic regions: its application in generating a molecular map of the Arabidopsis thaliana genome. BMC Genomics 2012; 13:20. [PMID: 22244314 PMCID: PMC3323429 DOI: 10.1186/1471-2164-13-20] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2011] [Accepted: 01/13/2012] [Indexed: 08/30/2023] Open
Abstract
Background: Molecular markers facilitate both genotype identification, essential for modern animal and plant breeding, and the isolation of genes based on their map positions. Advancements in sequencing technology have made possible the identification of single nucleotide polymorphisms (SNPs) for any genomic regions. Here a sequence based polymorphic (SBP) marker technology for generating molecular markers for targeted genomic regions in Arabidopsis is described. Results: A ~3X genome coverage sequence of the Arabidopsis thaliana ecotype, Niederzenz (Nd-0) was obtained by applying Illumina's sequencing by synthesis (Solexa) technology. Comparison of the Nd-0 genome sequence with the assembled Columbia-0 (Col-0) genome sequence identified putative single nucleotide polymorphisms (SNPs) throughout the entire genome. Multiple 75 base pair Nd-0 sequence reads containing SNPs and originating from individual genomic DNA molecules were the basis for developing co-dominant SBP markers. SNP-containing Col-0 sequences, supported by transcript sequences or sequences from multiple BAC clones, were compared to the respective Nd-0 sequences to identify possible restriction endonuclease enzyme site variations. Small amplicons, PCR amplified from both ecotypes, were digested with suitable restriction enzymes and resolved on a gel to reveal the sequence based polymorphisms. By applying this technology, 21 SBP markers for the marker-poor regions of the Arabidopsis map representing polymorphisms between Col-0 and Nd-0 ecotypes were generated. Conclusions: The SBP marker technology described here allowed the development of molecular markers for targeted genomic regions of Arabidopsis. It should facilitate isolation of co-dominant molecular markers for targeted genomic regions of any animal or plant species, whose genomic sequences have been assembled. This technology will particularly facilitate the development of high density molecular marker maps, essential for cloning genes based on their genetic map positions and identifying tightly linked molecular markers for selecting desirable genotypes in animal and plant breeding experiments.
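The restriction-site comparison described in this abstract can be illustrated with a short sketch. It assumes two hypothetical amplicon sequences and a small enzyme table, and reports which enzymes would give different fragment sizes for the two alleles. The recognition sites and cut positions for EcoRI, HindIII and BamHI are the standard ones; the function names and example sequences are invented for illustration and are not taken from the paper.

```python
# Recognition site and cut offset (bases from the site start on the top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
    "BamHI": ("GGATCC", 1),    # G^GATCC
}

def fragment_sizes(seq: str, site: str, cut_offset: int) -> list[int]:
    """Fragment lengths after complete digestion of a linear amplicon."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]

def diagnostic_enzymes(allele_a: str, allele_b: str) -> dict:
    """Enzymes whose digestion pattern differs between the two alleles."""
    result = {}
    for name, (site, offset) in ENZYMES.items():
        frags_a = fragment_sizes(allele_a, site, offset)
        frags_b = fragment_sizes(allele_b, site, offset)
        if frags_a != frags_b:
            result[name] = (frags_a, frags_b)
    return result

# Hypothetical amplicons: a single A-to-G SNP removes the EcoRI site.
col_like = "ATGCCGAATTCTTAGC"
nd_like = "ATGCCGAGTTCTTAGC"
print(diagnostic_enzymes(col_like, nd_like))
```

Because one allele is cut and the other is not, a heterozygote would show both patterns on a gel, which is what makes such a marker co-dominant.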
Affiliation(s)
- Binod B Sahu
- Department of Agronomy, Iowa State University, Ames, Iowa 50011, USA
|
229
|
Saeed F, Perez-Rathke A, Gwarnicki J, Berger-Wolf T, Khokhar A. High Performance Multiple Sequence Alignment System for Pyrosequencing Reads from Multiple Reference Genomes. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 2012; 72:83-93. [PMID: 23125479 PMCID: PMC3486434 DOI: 10.1016/j.jpdc.2011.08.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Genome resequencing with short reads generated from pyrosequencing generally relies on mapping the short reads against a single reference genome. However, mapping of reads from multiple reference genomes is not possible using a pairwise mapping algorithm. In order to align the reads with respect to each other and the reference genomes, existing multiple sequence alignment (MSA) methods cannot be used because they do not take into account the position of these short reads with respect to the genome, and are highly inefficient for large numbers of sequences. In this paper, we develop a highly scalable parallel algorithm based on domain decomposition, referred to as P-Pyro-Align, to align such large numbers of reads from single or multiple reference genomes. The proposed alignment algorithm accurately aligns the erroneous reads, and has been implemented on a cluster of workstations using the MPI library. Experimental results for different problem sizes are analyzed in terms of execution time, quality of the alignments, and the ability of the algorithm to handle reads from multiple haplotypes. We report high quality multiple alignments of up to 0.5 million reads. The algorithm is shown to be highly scalable and exhibits super-linear speedups with increasing number of processors.
Affiliation(s)
- Fahad Saeed
- Department of Computer Science, University of Illinois at Chicago, IL USA
|
230
|
Weigel D. Natural variation in Arabidopsis: from molecular genetics to ecological genomics. PLANT PHYSIOLOGY 2012; 158:2-22. [PMID: 22147517 PMCID: PMC3252104 DOI: 10.1104/pp.111.189845] [Citation(s) in RCA: 242] [Impact Index Per Article: 18.6] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/24/2011] [Accepted: 12/05/2011] [Indexed: 05/18/2023]
Affiliation(s)
- Detlef Weigel
- Max Planck Institute for Developmental Biology, 72076 Tuebingen, Germany.
|
231
|
Abstract
Allelic variation within species provides fundamental insights into the evolution and ecology of organisms, and information about this variation is becoming increasingly available in sequence datasets of multiple and/or outbred individuals. Unfortunately, identifying true allelic variants poses a number of challenges, given the presence of both sequencing errors and alleles from other closely related loci. We outline the key considerations involved in this process, including assessing the accuracy of allele resolution in sequence assembly, clustering of alleles within and among individuals, and identifying clusters that are most likely to correspond to true allelic variants of a single locus. Our focus is particularly on the case where alleles must be identified without a fully resolved reference genome, and where sequence depth information cannot be used to infer the putative number of loci sharing a sequence, such as in transcriptome or post-assembly datasets. Throughout, we provide information about publicly available tools to aid allele identification in such cases.
Affiliation(s)
- Katrina M Dlugosch
- Department of Ecology & Evolutionary Biology, University of Arizona, Tucson, AZ, USA.
|
232
|
Kumar S, Banks TW, Cloutier S. SNP Discovery through Next-Generation Sequencing and Its Applications. INTERNATIONAL JOURNAL OF PLANT GENOMICS 2012; 2012:831460. [PMID: 23227038 PMCID: PMC3512287 DOI: 10.1155/2012/831460] [Citation(s) in RCA: 150] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/03/2012] [Accepted: 10/08/2012] [Indexed: 05/08/2023]
Abstract
The decreasing cost along with rapid progress in next-generation sequencing and related bioinformatics computing resources has facilitated large-scale discovery of SNPs in various model and nonmodel plant species. Large numbers and genome-wide availability of SNPs make them the marker of choice in partially or completely sequenced genomes. Although excellent reviews have been published on next-generation sequencing, its associated bioinformatics challenges, and the applications of SNPs in genetic studies, a comprehensive review connecting these three intertwined research areas is needed. This paper touches upon various aspects of SNP discovery, highlighting key points in availability and selection of appropriate sequencing platforms, bioinformatics pipelines, SNP filtering criteria, and applications of SNPs in genetic analyses. The use of next-generation sequencing methodologies in many non-model crops leading to discovery and implementation of SNPs in various genetic studies is discussed. Development and improvement of bioinformatics software that are open source and freely available have accelerated the SNP discovery while reducing the associated cost. Key considerations for SNP filtering and associated pipelines are discussed in specific topics. A list of commonly used software and their sources is compiled for easy access and reference.
Affiliation(s)
- Santosh Kumar
- Department of Plant Science, University of Manitoba, Winnipeg, MB, Canada R3T 2N2
- Travis W. Banks
- Department of Applied Genomics, Vineland Research and Innovation Centre, Vineland Station, ON, Canada L0R 2E0
- Sylvie Cloutier
- Department of Plant Science, University of Manitoba, Winnipeg, MB, Canada R3T 2N2
- Cereal Research Centre, Agriculture and Agri-Food Canada, Winnipeg, MB, Canada R3T 2M9
|
233
|
Abstract
Legumes are the third-largest family of angiosperms, the second-most-important crop family, and a key source of biological nitrogen in agriculture. Recently, the genome sequences of Glycine max (soybean), Medicago truncatula, and Lotus japonicus were substantially completed. Comparisons among legume genomes reveal a key role for duplication, especially a whole-genome duplication event approximately 58 Mya that is shared by most agriculturally important legumes. A second and more recent genome duplication occurred only in the lineage leading to soybean. Outcomes of genome duplication, including gene fractionation and sub- and neofunctionalization, have played key roles in shaping legume genomes and in the evolution of legume-specific traits. Analysis of legume genome sequences also enables the discovery of legume-specific gene families and provides a framework for genome-wide association mapping that will target phenotypes of special importance in legumes. Translating genomic resources from sequenced species to less studied but still important "orphan" legumes will enhance prospects for world food production.
Affiliation(s)
- Nevin D Young
- Department of Plant Pathology and Department of Plant Biology, University of Minnesota, St. Paul, MN 55108, USA.
|
234
|
Huang X, Zhao Y, Wei X, Li C, Wang A, Zhao Q, Li W, Guo Y, Deng L, Zhu C, Fan D, Lu Y, Weng Q, Liu K, Zhou T, Jing Y, Si L, Dong G, Huang T, Lu T, Feng Q, Qian Q, Li J, Han B. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm. Nat Genet 2011; 44:32-9. [PMID: 22138690 DOI: 10.1038/ng.1018] [Citation(s) in RCA: 615] [Impact Index Per Article: 43.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2011] [Accepted: 11/02/2011] [Indexed: 12/20/2022]
Abstract
A high-density haplotype map recently enabled a genome-wide association study (GWAS) in a population of indica subspecies of Chinese rice landraces. Here we extend this methodology to a larger and more diverse sample of 950 worldwide rice varieties, including the Oryza sativa indica and Oryza sativa japonica subspecies, to perform an additional GWAS. We identified a total of 32 new loci associated with flowering time and with ten grain-related traits, indicating that the larger sample increased the power to detect trait-associated variants using GWAS. To characterize various alleles and complex genetic variation, we developed an analytical framework for haplotype-based de novo assembly of the low-coverage sequencing data in rice. We identified candidate genes for 18 associated loci through detailed annotation. This study shows that the integrated approach of sequence-based GWAS and functional genome annotation has the potential to match complex traits to their causal polymorphisms in rice.
Affiliation(s)
- Xuehui Huang
- National Center for Gene Research, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
235
|
Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 2011; 40:D1202-10. [PMID: 22140109 PMCID: PMC3245047 DOI: 10.1093/nar/gkr1090] [Citation(s) in RCA: 1541] [Impact Index Per Article: 110.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
Abstract
The Arabidopsis Information Resource (TAIR, http://arabidopsis.org) is a genome database for Arabidopsis thaliana, an important reference organism for many fundamental aspects of biology as well as basic and applied plant biology research. TAIR serves as a central access point for Arabidopsis data, annotates gene function and expression patterns using controlled vocabulary terms, and maintains and updates the A. thaliana genome assembly and annotation. TAIR also provides researchers with an extensive set of visualization and analysis tools. Recent developments include several new genome releases (TAIR8, TAIR9 and TAIR10) in which the A. thaliana assembly was updated, pseudogenes and transposon genes were re-annotated, and new data from proteomics and next generation transcriptome sequencing were incorporated into gene models and splice variants. Other highlights include progress on functional annotation of the genome and the release of several new tools including Textpresso for Arabidopsis which provides the capability to carry out full text searches on a large body of research literature.
Affiliation(s)
- Philippe Lamesch
- Department of Plant Biology, Carnegie Institution, 260 Panama St, Stanford, CA 94305, USA
|
236
|
Zheng LY, Guo XS, He B, Sun LJ, Peng Y, Dong SS, Liu TF, Jiang S, Ramachandran S, Liu CM, Jing HC. Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor). Genome Biol 2011; 12:R114. [PMID: 22104744 PMCID: PMC3334600 DOI: 10.1186/gb-2011-12-11-r114] [Citation(s) in RCA: 167] [Impact Index Per Article: 11.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2011] [Revised: 11/04/2011] [Accepted: 11/21/2011] [Indexed: 01/22/2023] Open
Abstract
Background: Sorghum (Sorghum bicolor) is globally produced as a source of food, feed, fiber and fuel. Grain and sweet sorghums differ in a number of important traits, including stem sugar and juice accumulation, plant height as well as grain and biomass production. The first whole genome sequence of a grain sorghum is available, but additional genome sequences are required to study genome-wide and intraspecific variation for dissecting the genetic basis of these important traits and for tailor-designed breeding of this important C4 crop. Results: We resequenced two sweet and one grain sorghum inbred lines, and identified a set of nearly 1,500 genes differentiating sweet and grain sorghum. These genes fall into ten major metabolic pathways involved in sugar and starch metabolisms, lignin and coumarin biosynthesis, nucleic acid metabolism, stress responses and DNA damage repair. In addition, we uncovered 1,057,018 SNPs, 99,948 indels of 1 to 10 bp in length and 16,487 presence/absence variations as well as 17,111 copy number variations. The majority of the large-effect SNPs, indels and presence/absence variations resided in the genes containing leucine rich repeats, PPR repeats and disease resistance R genes possessing diverse biological functions or under diversifying selection, but were absent in genes that are essential for life. Conclusions: This is a first report of the identification of genome-wide patterns of genetic variation in sorghum. High-density SNP and indel markers reported here will be a valuable resource for future gene-phenotype studies and the molecular breeding of this important crop and related species.
Affiliation(s)
- Lei-Ying Zheng
- Institute of Botany, Chinese Academy of Sciences, Beijing, China.
|
237
|
Jiménez-Gómez JM. Next generation quantitative genetics in plants. FRONTIERS IN PLANT SCIENCE 2011; 2:77. [PMID: 22645550 PMCID: PMC3355736 DOI: 10.3389/fpls.2011.00077] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/29/2011] [Accepted: 10/23/2011] [Indexed: 05/31/2023]
Abstract
Most characteristics in living organisms show continuous variation, which suggests that they are controlled by multiple genes. Quantitative trait loci (QTL) analysis can identify the genes underlying continuous traits by establishing associations between genetic markers and observed phenotypic variation in a segregating population. The new high-throughput sequencing (HTS) technologies greatly facilitate QTL analysis by providing genetic markers at genome-wide resolution in any species without previous knowledge of its genome. In addition HTS serves to quantify molecular phenotypes, which aids to identify the loci responsible for QTLs and to understand the mechanisms underlying diversity. The constant improvements in price, experimental protocols, computational pipelines, and statistical frameworks are making feasible the use of HTS for any research group interested in quantitative genetics. In this review I discuss the application of HTS for molecular marker discovery, population genotyping, and expression profiling in QTL analysis.
Affiliation(s)
- José M. Jiménez-Gómez
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, Köln, Germany
|
238
|
Bolle C, Schneider A, Leister D. Perspectives on Systematic Analyses of Gene Function in Arabidopsis thaliana: New Tools, Topics and Trends. Curr Genomics 2011; 12:1-14. [PMID: 21886450 PMCID: PMC3129038 DOI: 10.2174/138920211794520187] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2010] [Revised: 10/28/2010] [Accepted: 11/23/2010] [Indexed: 11/22/2022] Open
Abstract
Since the sequencing of the nuclear genome of Arabidopsis thaliana ten years ago, various large-scale analyses of gene function have been performed in this model species. In particular, the availability of collections of lines harbouring random T-DNA or transposon insertions, which include mutants for almost all of the ~27,000 A. thaliana genes, has been crucial for the success of forward and reverse genetic approaches. In the foreseeable future, genome-wide phenotypic data from mutant analyses will become available for Arabidopsis, and will stimulate a flood of novel in-depth gene-function analyses. In this review, we consider the present status of resources and concepts for systematic studies of gene function in A. thaliana. Current perspectives on the utility of loss-of-function and gain-of-function mutants will be discussed in light of the genetic and functional redundancy of many A. thaliana genes.
Affiliation(s)
- C Bolle
- Lehrstuhl für Molekularbiologie der Pflanzen (Botanik), Department Biologie I, Ludwig-Maximilians-Universität München, Großhaderner Str. 2, D-82152 Planegg-Martinsried, Germany
|
239
|
Muralidharan O, Natsoulis G, Bell J, Newburger D, Xu H, Kela I, Ji H, Zhang N. A cross-sample statistical model for SNP detection in short-read sequencing data. Nucleic Acids Res 2011; 40:e5. [PMID: 22064853 PMCID: PMC3245949 DOI: 10.1093/nar/gkr851] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open
Abstract
Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling information across samples. Although many studies prepare and sequence multiple samples with the same protocol, most existing SNP callers ignore cross-sample information. In contrast, we propose an empirical Bayes method that uses cross-sample information to learn the error properties of the data. This error information lets us call SNPs with a lower false discovery rate than existing methods.
Affiliation(s)
- Omkar Muralidharan
- Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA, 94305, USA
|
240
|
Chen H, He H, Zou Y, Chen W, Yu R, Liu X, Yang Y, Gao YM, Xu JL, Fan LM, Li Y, Li ZK, Deng XW. Development and application of a set of breeder-friendly SNP markers for genetic analyses and molecular breeding of rice (Oryza sativa L.). TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2011; 123:869-79. [PMID: 21681488 DOI: 10.1007/s00122-011-1633-5] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/07/2011] [Accepted: 06/01/2011] [Indexed: 05/20/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are the most abundant DNA markers in plant genomes. In this study, based on 54,465 SNPs between the genomes of two Indica varieties, Minghui 63 (MH63) and Zhenshan 97 (ZS97) and additional 20,705 SNPs between the MH63 and Nipponbare genomes, we identified and confirmed 1,633 well-distributed SNPs by PCR and Sanger sequencing. From these, a set of 372 SNPs were further selected to analyze the patterns of genetic diversity in 300 representative rice inbred lines from 22 rice growing countries worldwide. Using this set of SNPs, we were able to uncover the well-known Indica-Japonica subspecific differentiation and geographic differentiations within Indica and Japonica. Furthermore, our SNP results revealed some common and contrasting patterns of the haplotype diversity along different rice chromosomes in the Indica and Japonica accessions, which suggest different evolutionary forces possibly acting in specific regions of the rice genome during domestication and evolution of rice. Our results demonstrated that this set of SNPs can be used as anchor SNPs for large scale genotyping in rice molecular breeding research involving Indica-Japonica and Indica-Indica crosses.
Affiliation(s)
- Haodong Chen
- Peking-Yale Joint Center for Plant Molecular Genetics and Agro-biotechnology, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, 100871 Beijing, China
|
241
|
Guo YL, Fitz J, Schneeberger K, Ossowski S, Cao J, Weigel D. Genome-wide comparison of nucleotide-binding site-leucine-rich repeat-encoding genes in Arabidopsis. PLANT PHYSIOLOGY 2011; 157:757-69. [PMID: 21810963 PMCID: PMC3192553 DOI: 10.1104/pp.111.181990] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/20/2011] [Accepted: 08/01/2011] [Indexed: 05/18/2023]
Abstract
Plants, like animals, use several lines of defense against pathogen attack. Prominent among genes that confer disease resistance are those encoding nucleotide-binding site-leucine-rich repeat (NB-LRR) proteins. Likely due to selection pressures caused by pathogens, NB-LRR genes are the most variable gene family in plants, but there appear to be species-specific limits to the number of NB-LRR genes in a genome. Allelic diversity within an individual is also increased by obligatory outcrossing, which leads to genome-wide heterozygosity. In this study, we compared the NB-LRR gene complement of the selfer Arabidopsis thaliana and its outcrossing close relative Arabidopsis lyrata. We then complemented and contrasted the interspecific patterns with studies of NB-LRR diversity within A. thaliana. Three important insights are as follows: (1) that both species have similar numbers of NB-LRR genes; (2) that loci with single NB-LRR genes are less variable than tandem arrays; and (3) that presence-absence polymorphisms within A. thaliana are not strongly correlated with the presence or absence of orthologs in A. lyrata. Although A. thaliana individuals are mostly homozygous and thus potentially less likely to suffer from aberrant interaction of NB-LRR proteins with newly introduced alleles, the number of NB-LRR genes is similar to that in A. lyrata. In intraspecific and interspecific comparisons, NB-LRR genes are also more variable than receptor-like protein genes. Finally, in contrast to Drosophila, there is a clearly positive relationship between interspecific divergence and intraspecific polymorphisms.
|
242
|
Double-strand break repair processes drive evolution of the mitochondrial genome in Arabidopsis. BMC Biol 2011; 9:64. [PMID: 21951689 PMCID: PMC3193812 DOI: 10.1186/1741-7007-9-64] [Citation(s) in RCA: 165] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2011] [Accepted: 09/27/2011] [Indexed: 11/12/2022] Open
Abstract
Background: The mitochondrial genome of higher plants is unusually dynamic, with recombination and nonhomologous end-joining (NHEJ) activities producing variability in size and organization. Plant mitochondrial DNA also generally displays much lower nucleotide substitution rates than mammalian or yeast systems. Arabidopsis displays these features and expedites characterization of the mitochondrial recombination surveillance gene MSH1 (MutS 1 homolog), lending itself to detailed study of de novo mitochondrial genome activity. In the present study, we investigated the underlying basis for unusual plant features as they contribute to rapid mitochondrial genome evolution. Results: We obtained evidence of double-strand break (DSB) repair, including NHEJ, sequence deletions and mitochondrial asymmetric recombination activity in Arabidopsis wild-type and msh1 mutants on the basis of data generated by Illumina deep sequencing and confirmed by DNA gel blot analysis. On a larger scale, with mitochondrial comparisons across 72 Arabidopsis ecotypes, similar evidence of DSB repair activity differentiated ecotypes. Forty-seven repeat pairs were active in DNA exchange in the msh1 mutant. Recombination sites showed asymmetrical DNA exchange within lengths of 50- to 556-bp sharing sequence identity as low as 85%. De novo asymmetrical recombination involved heteroduplex formation, gene conversion and mismatch repair activities. Substoichiometric shifting by asymmetrical exchange created the appearance of rapid sequence gain and loss in association with particular repeat classes. Conclusions: Extensive mitochondrial genomic variation within a single plant species derives largely from DSB activity and its repair. Observed gene conversion and mismatch repair activity contribute to the low nucleotide substitution rates seen in these genomes. On a phenotypic level, these patterns of rearrangement likely contribute to the reproductive versatility of higher plants.
|
243
|
Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 2011; 480:245-9. [PMID: 22057020 DOI: 10.1038/nature10555] [Citation(s) in RCA: 483] [Impact Index Per Article: 34.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2011] [Accepted: 09/13/2011] [Indexed: 11/09/2022]
Abstract
Heritable epigenetic polymorphisms, such as differential cytosine methylation, can underlie phenotypic variation. Moreover, wild strains of the plant Arabidopsis thaliana differ in many epialleles, and these can influence the expression of nearby genes. However, to understand their role in evolution, it is imperative to ascertain the emergence rate and stability of epialleles, including those that are not due to structural variation. We have compared genome-wide DNA methylation among 10 A. thaliana lines, derived 30 generations ago from a common ancestor. Epimutations at individual positions were easily detected, and close to 30,000 cytosines in each strain were differentially methylated. In contrast, larger regions of contiguous methylation were much more stable, and the frequency of changes was in the same low range as that of DNA mutations. Like individual positions, the same regions were often affected by differential methylation in independent lines, with evidence for recurrent cycles of forward and reverse mutations. Transposable elements and short interfering RNAs have been causally linked to DNA methylation. In agreement, differentially methylated sites were farther from transposable elements and showed less association with short interfering RNA expression than invariant positions. The biased distribution and frequent reversion of epimutations have important implications for the potential contribution of sequence-independent epialleles to plant evolution.
|
244
|
Johnson DBF, Xu J, Shen Z, Takimoto JK, Schultz MD, Schmitz RJ, Xiang Z, Ecker JR, Briggs SP, Wang L. RF1 knockout allows ribosomal incorporation of unnatural amino acids at multiple sites. Nat Chem Biol 2011; 7:779-86. [PMID: 21926996 PMCID: PMC3201715 DOI: 10.1038/nchembio.657] [Citation(s) in RCA: 278] [Impact Index Per Article: 19.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2010] [Accepted: 07/18/2011] [Indexed: 11/09/2022]
Abstract
Stop codons have been exploited for genetic incorporation of unnatural amino acids (Uaas) in live cells, but the efficiency is low possibly due to competition from release factors, limiting the power and scope of this technology. Here we show that the reportedly essential release factor 1 can be knocked out from Escherichia coli by fixing release factor 2. The resultant strain JX33 is stable and independent, and reassigns UAG from a stop signal to an amino acid when a UAG-decoding tRNA/synthetase pair is introduced. Uaas were efficiently incorporated at multiple UAG sites in the same gene without translational termination in JX33. We also found that amino acid incorporation at endogenous UAG codons is dependent on RF1 and mRNA context, which explains why E. coli tolerates apparent global suppression of UAG. JX33 affords a unique autonomous host for synthesizing and evolving novel protein functions by enabling Uaa incorporation at multiple sites.
Affiliation(s)
- David B F Johnson
- The Jack H. Skirball Center for Chemical Biology and Proteomics, The Salk Institute for Biological Studies, La Jolla, California, USA
|
245
|
Slotte T, Bataillon T, Hansen TT, St Onge K, Wright SI, Schierup MH. Genomic determinants of protein evolution and polymorphism in Arabidopsis. Genome Biol Evol 2011; 3:1210-9. [PMID: 21926095 PMCID: PMC3296466 DOI: 10.1093/gbe/evr094] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
Recent results from Drosophila suggest that positive selection has a substantial impact on genomic patterns of polymorphism and divergence. However, species with smaller population sizes and/or stronger population structure may not be expected to exhibit Drosophila-like patterns of sequence variation. We test this prediction and identify determinants of levels of polymorphism and rates of protein evolution using genomic data from Arabidopsis thaliana and the recently sequenced Arabidopsis lyrata genome. We find that, in contrast to Drosophila, there is no negative relationship between nonsynonymous divergence and silent polymorphism at any spatial scale examined. Instead, synonymous divergence is a major predictor of silent polymorphism, which suggests variation in mutation rate as the main determinant of silent variation. Variation in rates of protein divergence is mainly correlated with gene expression level and breadth, consistent with results for a broad range of taxa, and map-based estimates of recombination rate are only weakly correlated with nonsynonymous divergence. Variation in mutation rates and the strength of purifying selection seem to be major drivers of patterns of polymorphism and divergence in Arabidopsis. Nevertheless, a model allowing for varying negative and positive selection by functional gene category explains the data better than a homogeneous model, implying the action of positive selection on a subset of genes. Genes involved in disease resistance and abiotic stress display high proportions of adaptive substitution. Our results are important for a general understanding of the determinants of rates of protein evolution and the impact of selection on patterns of polymorphism and divergence.
Affiliation(s)
- Tanja Slotte
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Sweden.
|
246
|
Haiminen N, Kuhn DN, Parida L, Rigoutsos I. Evaluation of methods for de novo genome assembly from high-throughput sequencing reads reveals dependencies that affect the quality of the results. PLoS One 2011; 6:e24182. [PMID: 21915294 PMCID: PMC3168497 DOI: 10.1371/journal.pone.0024182] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2011] [Accepted: 08/01/2011] [Indexed: 12/19/2022] Open
Abstract
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole-genome assembly an appealing target application. In this paper we evaluate the feasibility of de novo genome assembly from short reads (≤100 nucleotides) through a detailed study involving genomic sequences of various lengths and origin, in conjunction with several of the currently popular assembly programs. Our extensive analysis demonstrates that, in addition to sequencing coverage, attributes such as the architecture of the target genome, the identity of the used assembly program, the average read length and the observed sequencing error rates are powerful variables that affect the best achievable assembly of the target sequence in terms of size and correctness.
Affiliation(s)
- Niina Haiminen
- Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
- David N. Kuhn
- Subtropical Horticulture Research Station, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Miami, Florida, United States of America
- Laxmi Parida
- Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
- Isidore Rigoutsos
- Computational Biology Center, IBM Thomas J. Watson Research Center, Yorktown Heights, New York, United States of America
|
247
|
Delker C, Quint M. Expression level polymorphisms: heritable traits shaping natural variation. TRENDS IN PLANT SCIENCE 2011; 16:481-488. [PMID: 21700486 DOI: 10.1016/j.tplants.2011.05.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2011] [Revised: 05/12/2011] [Accepted: 05/18/2011] [Indexed: 05/31/2023]
Abstract
Natural accessions of many species harbor a wealth of genetic variation visible in a large array of phenotypes. Although expression level polymorphisms (ELPs) in several genes have been shown to contribute to variation in diverse traits, their general impact on adaptive variation has likely been underestimated. At present, ELPs have predominantly been correlated to quantitative trait loci (eQTLs) that occupy central hubs in signaling networks, which pleiotropically affect numerous traits. To increase the sensitivity of detecting minor effect eQTLs or those that act in a trait-specific manner, we emphasize the need for more systematic approaches. This requires, but is not limited to, refining experimental designs such as reduction of tissue complexity and combinatorial methods including a priori defined networks.
Affiliation(s)
- Carolin Delker
- Leibniz Institute of Plant Biochemistry, Independent Junior Research Group, Department of Molecular Signal Processing, Weinberg 3, 06120 Halle (Saale), Germany
|
248
|
Gan X, Stegle O, Behr J, Steffen JG, Drewe P, Hildebrand KL, Lyngsoe R, Schultheiss SJ, Osborne EJ, Sreedharan VT, Kahles A, Bohnert R, Jean G, Derwent P, Kersey P, Belfield EJ, Harberd NP, Kemen E, Toomajian C, Kover PX, Clark RM, Rätsch G, Mott R. Multiple reference genomes and transcriptomes for Arabidopsis thaliana. Nature 2011; 477:419-23. [PMID: 21874022 PMCID: PMC4856438 DOI: 10.1038/nature10414] [Citation(s) in RCA: 469] [Impact Index Per Article: 33.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2011] [Accepted: 08/05/2011] [Indexed: 01/07/2023]
Abstract
Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions.
Affiliation(s)
- Xiangchao Gan
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
|
249
|
Whole-genome sequencing of multiple Arabidopsis thaliana populations. Nat Genet 2011; 43:956-63. [PMID: 21874002 DOI: 10.1038/ng.911] [Citation(s) in RCA: 643] [Impact Index Per Article: 45.9] [Received: 03/08/2011] [Accepted: 07/26/2011] [Indexed: 12/20/2022]
Abstract
The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia. As a foundation for identifying genetic variation contributing to adaptation to diverse environments, a 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout the species' native range. We describe the majority of common small-scale polymorphisms as well as many larger insertions and deletions in the A. thaliana pan-genome, their effects on gene function, and the patterns of local and global linkage among these variants. The action of processes other than spontaneous mutation is identified by comparing the spectrum of mutations that have accumulated since A. thaliana diverged from its closest relative 10 million years ago with the spectrum observed in the laboratory. Recent species-wide selective sweeps are rare, and potentially deleterious mutations are more common in marginal populations.
|
250
|
Brotman Y, Riewe D, Lisec J, Meyer RC, Willmitzer L, Altmann T. Identification of enzymatic and regulatory genes of plant metabolism through QTL analysis in Arabidopsis. JOURNAL OF PLANT PHYSIOLOGY 2011; 168:1387-94. [PMID: 21536339 DOI: 10.1016/j.jplph.2011.03.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/01/2010] [Revised: 03/20/2011] [Accepted: 03/21/2011] [Indexed: 05/04/2023]
Abstract
The biochemical diversity in the plant kingdom is estimated to well exceed 100,000 distinct compounds (Weckwerth, 2003) and 4000 to 20,000 metabolites per species seem likely (Fernie et al., 2004). In recent years extensive progress has been made towards the identification of enzymes and regulatory genes working in a complex network to generate this large arsenal of metabolites. Genetic loci influencing quantitative traits, e.g. metabolites or biomass, may be mapped to associated molecular markers, a method called quantitative trait locus mapping (QTL mapping), which may facilitate the identification of novel genes in biochemical pathways. Arabidopsis thaliana, as a model organism for seed plants, is a suitable target for metabolic QTL (mQTL) studies due to the availability of highly developed molecular and genetic tools, and the extensive knowledge accumulated on the metabolite profile. While intensely studied, in particular since the availability of its complete sequence, the genome of Arabidopsis still comprises a large proportion of genes with only tentative function based on sequence homology. From a total number of 33,518 genes currently listed (TAIR 9, http://www.arabidopsis.org), only about 25% have direct experimental evidence for their molecular function and biological process, while for more than 30% no biological data are available. Modern metabolomics approaches together with continually extended genomic resources will facilitate the task of assigning functions to those genes. In our previous study we reported on the identification of mQTL (Lisec et al., 2008). In this paper, we summarize the current status of mQTL analyses and causal gene identification in Arabidopsis and present evidence that a candidate gene located within the confidence interval of a fumarate mQTL (AT5G50950) encoding a putative fumarase is likely to be the causal gene of this QTL. 
The total number of genes molecularly identified based on mQTL studies is still limited, but the advent of multi-parallel analysis techniques for measurement of gene expression, as well as protein and metabolite abundances and for rapid gene identification will assist in the important task of assigning enzymes and regulatory genes to the growing network of known metabolic reactions.
Affiliation(s)
- Yariv Brotman
- Department of Molecular Physiology, Max-Planck-Institute of Molecular Plant Physiology, Am Muehlenberg 1, Potsdam-Golm, Germany
|