301
|
Koornneef M, Meinke D. The development of Arabidopsis as a model plant. The Plant Journal: for Cell and Molecular Biology 2010; 61:909-21. [PMID: 20409266 DOI: 10.1111/j.1365-313x.2009.04086.x] [Citation(s) in RCA: 227] [Impact Index Per Article: 15.1] [Indexed: 05/21/2023]
Abstract
Twenty-five years ago, Arabidopsis thaliana emerged as the model organism of choice for research in plant biology. A consensus was reached about the need to focus on a single organism to integrate the classical disciplines of plant science with the expanding fields of genetics and molecular biology. Ten years after publication of its genome sequence, Arabidopsis remains the standard reference plant for all of biology. We reflect here on the major advances and shared resources that led to the extraordinary growth of the Arabidopsis research community. We also underscore the importance of continuing to expand and refine our detailed knowledge of Arabidopsis while seeking to appreciate the remarkable diversity that characterizes the plant kingdom.
Affiliation(s)
- Maarten Koornneef
- Department of Plant Breeding and Genetics at the Max Planck Institute for Plant Breeding Research, Carl-von Linné Weg 10, Cologne, Germany.
|
302
|
Rounsley SD, Last RL. Shotguns and SNPs: how fast and cheap sequencing is revolutionizing plant biology. The Plant Journal: for Cell and Molecular Biology 2010; 61:922-7. [PMID: 20409267 DOI: 10.1111/j.1365-313x.2009.04030.x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 05/08/2023]
Abstract
In 1998 Cereon Genomics LLC, a subsidiary of Monsanto Co., performed a shotgun sequencing of the Arabidopsis thaliana Landsberg erecta genome to a depth of twofold coverage using 'classic' Sanger sequencing. This sequence was assembled and aligned to the Columbia ecotype sequence produced by the Arabidopsis Genome Initiative. The analysis provided tens of thousands of high-confidence predictions of polymorphisms between these two varieties of A. thaliana, and the predicted polymorphisms and Landsberg erecta sequence were subsequently made available to the not-for-profit research community by Monsanto. These data have been used for a wide variety of published studies, including map-based gene identification from forward genetic screens, studies of recombination and organelle genetics, and gene expression studies. The combination of resequencing approaches with next-generation sequencing technology has led to an increasing number of similar studies of genome-wide genetic diversity in A. thaliana, including the 1001 genomes project (http://1001genomes.org). Similar approaches are becoming possible in any number of crop species as DNA sequencing costs plummet and throughput rapidly increases, promising to lay the groundwork for revolutionizing our understanding of the relationship between genotype and phenotype in plants.
Affiliation(s)
- Steven D Rounsley
- School of Plant Sciences, University of Arizona, Tucson, AZ 85721, USA
|
303
|
O'Malley RC, Ecker JR. Linking genotype to phenotype using the Arabidopsis unimutant collection. The Plant Journal: for Cell and Molecular Biology 2010; 61:928-40. [PMID: 20409268 DOI: 10.1111/j.1365-313x.2010.04119.x] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.8] [Indexed: 05/18/2023]
Abstract
The large collections of Arabidopsis thaliana sequence-indexed T-DNA insertion mutants are among the most important resources to emerge from the sequencing of the genome. Several laboratories around the world have used the Arabidopsis reference genome sequence to map T-DNA flanking sequence tags (FST) for over 325,000 T-DNA insertion lines. Over the past decade, phenotypes identified with T-DNA-induced mutants have played a critical role in advancing both basic and applied plant research. These widely used mutants are an invaluable tool for direct interrogation of gene function. However, most lines are hemizygous for the insertion, necessitating a genotyping step to identify homozygous plants for the quantification of phenotypes. This situation has limited the application of these collections for genome-wide screens. Isolating multiple homozygous insert lines for every gene in the genome would make it possible to systematically test the phenotypic consequence of gene loss under a wide variety of conditions. One major obstacle to achieving this goal is that 12% of genes have no insertion and 8% are only represented by a single allele. Generation of additional mutations to achieve full genome coverage has been slow and expensive since each insertion is sequenced one at a time. Recent advances in high-throughput sequencing technology open up a potentially faster and cost-effective means to create new, very large insertion mutant populations for plants or animals. With the combination of new tools for genome-wide studies and emerging phenotyping platforms, these sequence-indexed mutant collections are poised to have a larger impact on our understanding of gene function.
Affiliation(s)
- Ronan C O'Malley
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92307, USA
|
304
|
Remarkably ancient balanced polymorphisms in a multi-locus gene network. Nature 2010; 464:54-8. [PMID: 20164837 PMCID: PMC2834422 DOI: 10.1038/nature08791] [Citation(s) in RCA: 117] [Impact Index Per Article: 7.8] [Received: 05/19/2009] [Accepted: 12/15/2009] [Indexed: 11/25/2022]
Abstract
Local adaptations within species are often governed by several interacting genes scattered throughout the genome. Single-locus models of selection cannot explain the maintenance of such complex variation because recombination separates co-adapted alleles. Here we report a novel type of intraspecific multi-locus genetic variation that has been maintained over a vast period of time. The galactose (GAL) utilization gene network of the brewer’s yeast relative Saccharomyces kudriavzevii exists in two distinct states: a functional gene network in Portuguese strains and, in Japanese strains, a non-functional gene network of allelic pseudogenes. Genome sequencing of all available S. kudriavzevii strains revealed that none of the functional GAL genes were acquired from other species. Rather, these polymorphisms have been maintained for nearly the entire history of the species, despite more recent gene flow genome-wide. Experimental evidence suggests that inactivation of the GAL3 and GAL80 regulatory genes facilitated the origin and long-term maintenance of the two gene network states. This striking example of a balanced unlinked gene network polymorphism introduces a remarkable type of intraspecific variation that may be widespread.
|
305
|
Krawitz P, Rödelsperger C, Jäger M, Jostins L, Bauer S, Robinson PN. Microindel detection in short-read sequence data. Bioinformatics 2010; 26:722-9. [PMID: 20144947 DOI: 10.1093/bioinformatics/btq027] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Several recent studies have demonstrated the effectiveness of resequencing and single nucleotide variant (SNV) detection by deep short-read sequencing platforms. While several reliable algorithms are available for automated SNV detection, the automated detection of microindels in deep short-read data presents a new bioinformatics challenge.
RESULTS: We systematically analyzed how the short-read mapping tools MAQ, Bowtie, Burrows-Wheeler alignment tool (BWA), Novoalign and RazerS perform on simulated datasets that contain indels and evaluated how indels affect error rates in SNV detection. We implemented a simple algorithm to compute the equivalent indel region (eir), which can be used to process the alignments produced by the mapping tools in order to perform indel calling. Using simulated data that contain indels, we demonstrate that indel detection works well on short-read data: the detection rate for microindels (<4 bp) is >90%. Our study provides insights into systematic errors in SNV detection that is based on ungapped short sequence read alignments. Gapped alignments of short sequence reads can be used to reduce this error and to detect microindels in simulated short-read data. A comparison with microindels automatically identified on the ABI Sanger and Roche 454 platform indicates that microindel detection from short sequence reads identifies both overlapping and distinct indels.
CONTACT: peter.krawitz@googlemail.com; peter.robinson@charite.de
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
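The abstract does not spell out how the equivalent indel region is computed. As a rough illustration of the underlying idea only (not the authors' implementation), the sketch below handles the deletion case: it slides a deletion left and right along a reference for as long as the resulting sequence stays identical. The function name and the 0-based coordinate convention are assumptions made for this example.

```python
def equivalent_deletion_region(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) 0-based start positions at which deleting
    `length` bases from `ref` gives the same sequence as deleting at `start`.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    if length <= 0 or start < 0 or start + length > len(ref):
        raise ValueError("deletion must lie inside the reference")

    # Shifting left by one is neutral when the base entering the deleted
    # window on the left equals the base leaving it on the right.
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1

    # Shifting right by one is neutral when the base leaving the window on
    # the left equals the base entering it on the right.
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1

    return left, right


def apply_deletion(ref: str, start: int, length: int) -> str:
    """Delete `length` bases from `ref` beginning at 0-based `start`."""
    return ref[:start] + ref[start + length:]


if __name__ == "__main__":
    # Deleting any single T from the homopolymer run is indistinguishable.
    print(equivalent_deletion_region("GATTTTC", 3, 1))
```

Every start position in the returned interval describes the same edited sequence, which is why calls reported at different coordinates by different mappers can refer to one and the same indel.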
Affiliation(s)
- Peter Krawitz
- Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, 13353 Berlin.
|
306
|
Out AA, van Minderhout IJHM, Goeman JJ, Ariyurek Y, Ossowski S, Schneeberger K, Weigel D, van Galen M, Taschner PEM, Tops CMJ, Breuning MH, van Ommen GJB, den Dunnen JT, Devilee P, Hes FJ. Deep sequencing to reveal new variants in pooled DNA samples. Hum Mutat 2010; 30:1703-12. [PMID: 19842214 DOI: 10.1002/humu.21122] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Indexed: 11/06/2022]
Abstract
We evaluated massive parallel sequencing and long-range PCR (LRP) for rare variant detection and allele frequency estimation in pooled DNA samples. Exons 2 to 16 of the MUTYH gene were analyzed in breast cancer patients with Illumina's (Solexa) technology. From a pool of 287 genomic DNA samples we generated a single LRP product, while the same LRP was performed on 88 individual samples and the resulting products then pooled. Concentrations of constituent samples were measured with fluorimetry for genomic DNA and high-resolution melting curve analysis (HR-MCA) for LRP products. Illumina sequencing results were compared to Sanger sequencing data of individual samples. Correlation between allele frequencies detected by both methods was poor in the first pool, presumably because the genomic samples amplified unequally in the LRP, due to DNA quality variability. In contrast, allele frequencies correlated well in the second pool, in which all expected alleles at a frequency of 1% and higher were reliably detected, plus the majority of singletons (0.6% allele frequency). We describe custom bioinformatics and statistics to optimize detection of rare variants and to estimate required sequencing depth. Our results provide directions for designing high-throughput analyses of candidate genes.
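The 0.6% singleton frequency quoted above is consistent with one variant allele among 88 diploid samples (1 in 176 chromosomes). The sketch below reproduces that arithmetic and adds a simple binomial estimate of how likely a given read depth is to yield at least a chosen number of variant reads. The binomial model ignores sequencing error and unequal amplification and is an assumption of this example, not the custom statistics the authors describe; the function names are hypothetical.

```python
from math import comb


def singleton_frequency(n_individuals: int, ploidy: int = 2) -> float:
    """Allele frequency of a variant carried on exactly one chromosome in the pool."""
    return 1.0 / (n_individuals * ploidy)


def prob_at_least(k: int, depth: int, freq: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, freq): chance of seeing at least
    k variant reads at a site covered by `depth` reads (idealized model)."""
    below = sum(
        comb(depth, i) * freq**i * (1.0 - freq) ** (depth - i) for i in range(k)
    )
    return 1.0 - below


if __name__ == "__main__":
    f = singleton_frequency(88)
    print(f"singleton frequency: {100 * f:.2f}%")
    for depth in (500, 1000, 5000):
        print(depth, round(prob_at_least(5, depth, f), 3))
```

Under this idealized model, the probability of observing a fixed minimum number of variant reads rises with depth, which is the intuition behind estimating a required sequencing depth for a target allele frequency.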
Affiliation(s)
- Astrid A Out
- Department of Clinical Genetics, Leiden University Medical Center, Leiden, The Netherlands.
|
307
|
Cridland JM, Thornton KR. Validation of rearrangement break points identified by paired-end sequencing in natural populations of Drosophila melanogaster. Genome Biol Evol 2010; 2:83-101. [PMID: 20333226 PMCID: PMC2839345 DOI: 10.1093/gbe/evq001] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Accepted: 01/08/2010] [Indexed: 01/17/2023] Open
Abstract
Several recent studies have focused on the evolution of recently duplicated genes in Drosophila. Currently, however, little is known about the evolutionary forces acting upon duplications that are segregating in natural populations. We used a high-throughput, paired-end sequencing platform (Illumina) to identify structural variants in a population sample of African D. melanogaster. Polymerase chain reaction and sequencing confirmation of duplications detected by multiple, independent paired-ends showed that paired-end sequencing reliably uncovered the break points of structural rearrangements and allowed us to identify a number of tandem duplications segregating within a natural population. Our confirmation experiments show that rates of confirmation are very high, even at modest coverage. Our results also compare well with previous studies using microarrays (Emerson J, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural selection shapes genome wide patterns of copy-number polymorphism in Drosophila melanogaster. Science. 320:1629-1631. and Dopman EB, Hartl DL. 2007. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc Natl Acad Sci U S A. 104:19920-19925.), which both gives us confidence in the results of this study and confirms the previous microarray results. We were also able to identify whole-gene duplications, such as a novel duplication of Or22a, an olfactory receptor, and identify copy-number differences in genes previously known to be under positive selection, like Cyp6g1, which confers resistance to dichlorodiphenyltrichloroethane. Several "hot spots" of duplications were detected in this study, which indicate that particular regions of the genome may be more prone to generating duplications. Finally, population frequency analysis of confirmed events also showed an excess of rare variants in our population, which indicates that duplications segregating in the population may be deleterious and ultimately destined to be lost from the population.
Affiliation(s)
- Julie M Cridland
- Department of Ecology and Evolutionary Biology, University of California, Irvine, USA
|
308
|
Santuari L, Pradervand S, Amiguet-Vercher AM, Thomas J, Dorcey E, Harshman K, Xenarios I, Juenger TE, Hardtke CS. Substantial deletion overlap among divergent Arabidopsis genomes revealed by intersection of short reads and tiling arrays. Genome Biol 2010; 11:R4. [PMID: 20067627 PMCID: PMC2847716 DOI: 10.1186/gb-2010-11-1-r4] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Received: 11/30/2009] [Revised: 01/05/2010] [Accepted: 01/12/2010] [Indexed: 01/06/2023] Open
Abstract
A new approach to detect deletions in divergent genomes combines short read sequencing and tiling array data. Its utility is demonstrated on Arabidopsis strains. Identification of small polymorphisms from next generation sequencing short read data is relatively easy, but detection of larger deletions is less straightforward. Here, we analyzed four divergent Arabidopsis accessions and found that intersection of absent short read coverage with weak tiling array hybridization signal reliably flags deletions. Interestingly, individual deletions were frequently observed in two or more of the accessions examined, suggesting that variation in gene content partly reflects a common history of deletion events.
Affiliation(s)
- Luca Santuari
- Department of Plant Molecular Biology, University of Lausanne, Biophore Building, CH-1015 Lausanne, Switzerland.
|
309
|
Pool JE, Hellmann I, Jensen JD, Nielsen R. Population genetic inference from genomic sequence variation. Genome Res 2010; 20:291-300. [PMID: 20067940 DOI: 10.1101/gr.079509.108] [Citation(s) in RCA: 149] [Impact Index Per Article: 9.9] [Indexed: 01/23/2023]
Abstract
Population genetics has evolved from a theory-driven field with little empirical data into a data-driven discipline in which genome-scale data sets test the limits of available models and computational analysis methods. In humans and a few model organisms, analyses of whole-genome sequence polymorphism data are currently under way. And in light of the falling costs of next-generation sequencing technologies, such studies will soon become common in many other organisms as well. Here, we assess the challenges to analyzing whole-genome sequence polymorphism data, and we discuss the potential of these data to yield new insights concerning population history and the genomic prevalence of natural selection.
Affiliation(s)
- John E Pool
- Department of Integrative Biology, University of California, Berkeley, Berkeley, California 94720, USA
|
310
|
Hall D, Tegstrom C, Ingvarsson PK. Using association mapping to dissect the genetic basis of complex traits in plants. Brief Funct Genomics 2010; 9:157-65. [DOI: 10.1093/bfgp/elp048] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 11/12/2022] Open
|
311
|
Meyer RC, Kusterer B, Lisec J, Steinfath M, Becher M, Scharr H, Melchinger AE, Selbig J, Schurr U, Willmitzer L, Altmann T. QTL analysis of early stage heterosis for biomass in Arabidopsis. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2010; 120:227-37. [PMID: 19504257 PMCID: PMC2793381 DOI: 10.1007/s00122-009-1074-6] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 02/04/2009] [Accepted: 05/18/2009] [Indexed: 05/19/2023]
Abstract
The main objective of this study was to identify genomic regions involved in biomass heterosis using QTL, generation means, and mode-of-inheritance classification analyses. In a modified North Carolina Design III we backcrossed 429 recombinant inbred line and 140 introgression line populations to the two parental accessions, C24 and Col-0, whose F1 hybrid exhibited 44% heterosis for biomass. Mid-parent heterosis in the RILs ranged from -31 to 99% for dry weight and from -58 to 143% for leaf area. We detected ten genomic positions involved in biomass heterosis at an early developmental stage, individually explaining between 2.4 and 15.7% of the phenotypic variation. While overdominant gene action was prevalent in heterotic QTL, our results suggest that a combination of dominance, overdominance and epistasis is involved in biomass heterosis in this Arabidopsis cross.
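Mid-parent heterosis is conventionally expressed as the percentage by which a hybrid deviates from the mean of its two parents. The abstract does not state the exact formula the authors used, so the sketch below assumes this standard definition; the function name and the example trait values are hypothetical and chosen only to reproduce a 44% figure.

```python
def mid_parent_heterosis(hybrid: float, parent_a: float, parent_b: float) -> float:
    """Percent deviation of the hybrid trait value from the mid-parent value.

    Assumes the conventional definition MPH = 100 * (F1 - MP) / MP,
    where MP is the mean of the two parental values.
    """
    mid_parent = (parent_a + parent_b) / 2.0
    if mid_parent == 0:
        raise ValueError("mid-parent value must be non-zero")
    return 100.0 * (hybrid - mid_parent) / mid_parent


if __name__ == "__main__":
    # Hypothetical dry weights (arbitrary units): parents 90 and 110, hybrid 144.
    print(mid_parent_heterosis(144.0, 90.0, 110.0))
```

Negative values, such as the lower ends of the ranges reported for the RILs, indicate offspring that perform below the mid-parent value.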
Affiliation(s)
- Rhonda Christiane Meyer
- Department of Molecular Genetics, Leibniz Institute of Plant Genetics and Crop Plant Research, Corrensstrasse 3, 06466 Gatersleben, Germany.
|
312
|
Ossowski S, Schneeberger K, Lucas-Lledó JI, Warthmann N, Clark RM, Shaw RG, Weigel D, Lynch M. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 2010; 327:92-4. [PMID: 20044577 PMCID: PMC3878865 DOI: 10.1126/science.1180677] [Citation(s) in RCA: 774] [Impact Index Per Article: 51.6] [Indexed: 12/19/2022]
Abstract
To take complete advantage of information on within-species polymorphism and divergence from close relatives, one needs to know the rate and the molecular spectrum of spontaneous mutations. To this end, we have searched for de novo spontaneous mutations in the complete nuclear genomes of five Arabidopsis thaliana mutation accumulation lines that had been maintained by single-seed descent for 30 generations. We identified and validated 99 base substitutions and 17 small and large insertions and deletions. Our results imply a spontaneous mutation rate of 7 × 10^-9 base substitutions per site per generation, the majority of which are G:C→A:T transitions. We explain this very biased spectrum of base substitution mutations as a result of two main processes: deamination of methylated cytosines and ultraviolet light-induced mutagenesis.
Affiliation(s)
- Stephan Ossowski
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Korbinian Schneeberger
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Norman Warthmann
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Richard M. Clark
- Department of Biology, University of Utah, Salt Lake City, UT 84112, USA
- Ruth G. Shaw
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN 55108, USA
- Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Michael Lynch
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
|
313
|
Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, Ostrowski EA, Liu Y, Weinstock GM, Wheeler DA, Gibbs RA, Yu F. A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Res 2009; 20:273-80. [PMID: 20019143 DOI: 10.1101/gr.096388.109] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.7] [Indexed: 11/24/2022]
Abstract
Accurate identification of genetic variants from next-generation sequencing (NGS) data is essential for immediate large-scale genomic endeavors such as the 1000 Genomes Project, and is crucial for further genetic analysis based on the discoveries. The key challenge in single nucleotide polymorphism (SNP) discovery is to distinguish true individual variants (occurring at a low frequency) from sequencing errors (often occurring at frequencies orders of magnitude higher). Therefore, knowledge of the error probabilities of base calls is essential. We have developed Atlas-SNP2, a computational tool that detects and accounts for systematic sequencing errors caused by context-related variables in a logistic regression model learned from training data sets. Subsequently, it estimates the posterior error probability for each substitution through a Bayesian formula that integrates prior knowledge of the overall sequencing error probability and the estimated SNP rate with the results from the logistic regression model for the given substitutions. The estimated posterior SNP probability can be used to distinguish true SNPs from sequencing errors. Validation results show that Atlas-SNP2 achieves a false-positive rate of lower than 10%, with an approximately 5% or lower false-negative rate.
Affiliation(s)
- Yufeng Shen
- The Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030, USA.
|
314
|
|
315
|
Parks M, Cronn R, Liston A. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. BMC Biol 2009; 7:84. [PMID: 19954512 PMCID: PMC2793254 DOI: 10.1186/1741-7007-7-84] [Citation(s) in RCA: 376] [Impact Index Per Article: 23.5] [Received: 11/12/2009] [Accepted: 12/02/2009] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND: Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels?
RESULTS: We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2), highlighting their unusual evolutionary properties.
CONCLUSION: Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling, such as phylogeographic analyses and species-level DNA barcoding.
Affiliation(s)
- Matthew Parks
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA
- Richard Cronn
- Pacific Northwest Research Station, USDA Forest Service, Corvallis, OR, 97331, USA
- Aaron Liston
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, 97331, USA
|
316
|
Steuernagel B, Taudien S, Gundlach H, Seidel M, Ariyadasa R, Schulte D, Petzold A, Felder M, Graner A, Scholz U, Mayer KFX, Platzer M, Stein N. De novo 454 sequencing of barcoded BAC pools for comprehensive gene survey and genome analysis in the complex genome of barley. BMC Genomics 2009; 10:547. [PMID: 19930547 PMCID: PMC2784808 DOI: 10.1186/1471-2164-10-547] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Received: 07/15/2009] [Accepted: 11/20/2009] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND: De novo sequencing of a large, complex plant genome such as that of barley (Hordeum vulgare L.) is a major challenge both in terms of experimental feasibility and costs. The emergence and breathtaking progress of next generation sequencing technologies has put this goal into focus and a clone based strategy combined with the 454/Roche technology is conceivable.
RESULTS: To test the feasibility, we sequenced 91 barcoded, pooled, gene containing barley BACs using the GS FLX platform and assembled the sequences under iterative change of parameters. The BAC assemblies were characterized by N50 of ~50 kb (N80 ~31 kb, N90 ~21 kb) and a Q40 of 94%. For ~80% of the clones, the best assemblies consisted of less than 10 contigs at 24-fold mean sequence coverage. Moreover we show that gene containing regions seem to assemble completely and uninterrupted thus making the approach suitable for detecting complete and positionally anchored genes. By comparing the assemblies of four clones to their complete reference sequences generated by the Sanger method, we evaluated the distribution, quality and representativeness of the 454 sequences as well as the consistency and reliability of the assemblies.
CONCLUSION: The described multiplex 454 sequencing of barcoded BACs leads to sequence consensi highly representative for the clones. Assemblies are correct for the majority of contigs. Though the resolution of complex repetitive structures requires additional experimental efforts, our approach paves the way for a clone based strategy of sequencing the barley genome.
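The N50, N80 and N90 figures quoted above are contiguity statistics. Under the usual definition, Nx is the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches x% of the assembly size. The sketch below assumes that definition (the abstract does not say how the authors computed the values), and the contig lengths in the example are made up for illustration.

```python
def nx(contig_lengths: list[int], x: float = 50.0) -> int:
    """Length of the contig at which the cumulative length, summed from the
    longest contig downwards, first reaches x% of the total assembly length."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")

    threshold = sum(contig_lengths) * x / 100.0
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    # Not reached for valid input: the full sum always meets the threshold.
    raise RuntimeError("threshold not reached")


if __name__ == "__main__":
    # Hypothetical contig lengths in kb, not data from the study.
    contigs = [80, 70, 50, 40, 30, 20]
    print(nx(contigs, 50), nx(contigs, 80), nx(contigs, 90))
```

Because a higher x requires summing further down the sorted list, N50 is never smaller than N80, which in turn is never smaller than N90, matching the ordering of the values reported in the abstract.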
Affiliation(s)
- Burkhard Steuernagel
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), D-06466 Gatersleben, Germany.
|
317
|
Imelfort M, Edwards D. De novo sequencing of plant genomes using second-generation technologies. Brief Bioinform 2009; 10:609-18. [DOI: 10.1093/bib/bbp039] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Indexed: 12/16/2022] Open
|
318
|
Clement NL, Snell Q, Clement MJ, Hollenhorst PC, Purwar J, Graves BJ, Cairns BR, Johnson WE. The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Bioinformatics 2009; 26:38-45. [PMID: 19861355 DOI: 10.1093/bioinformatics/btp614] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The advent of next-generation sequencing technologies has increased the accuracy and quantity of sequence data, opening the door to greater opportunities in genomic research.
RESULTS: In this article, we present GNUMAP (Genomic Next-generation Universal MAPper), a program capable of overcoming two major obstacles in the mapping of reads from next-generation sequencing runs. First, we have created an algorithm that probabilistically maps reads to repeat regions in the genome on a quantitative basis. Second, we have developed a probabilistic Needleman-Wunsch algorithm which utilizes _prb.txt and _int.txt files produced in the Solexa/Illumina pipeline to improve the mapping accuracy for lower quality reads and increase the amount of usable data produced in a given experiment.
AVAILABILITY: The source code for the software can be downloaded from http://dna.cs.byu.edu/gnumap.
Affiliation(s)
- Nathan L Clement
- Department of Computer Science, Department of Statistics, Brigham Young University, Provo, UT 84602, USA.
|
319
|
Gilad Y, Pritchard JK, Thornton K. Characterizing natural variation using next-generation sequencing technologies. Trends Genet 2009; 25:463-71. [PMID: 19801172 DOI: 10.1016/j.tig.2009.09.003] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.4] [Received: 08/16/2009] [Revised: 09/08/2009] [Accepted: 09/09/2009] [Indexed: 01/22/2023]
Abstract
Progress in evolutionary genomics is tightly coupled with the development of new technologies to collect high-throughput data. The availability of next-generation sequencing technologies has the potential to revolutionize genomic research and enable us to focus on a large number of outstanding questions that previously could not be addressed effectively. Indeed, we are now able to study genetic variation on a genome-wide scale and characterize gene regulatory processes at unprecedented resolution, and we expect that individual laboratories might soon be able to rapidly sequence new genomes. However, at present, the analysis of next-generation sequencing data is challenging, in particular because most sequencing platforms provide short reads, which are difficult to align and assemble. In addition, little is known about sources of variation that are associated with next-generation sequencing study designs. A better understanding of the sources of error and bias in sequencing data is essential, especially in the context of studies of variation at dynamic quantitative traits.
Affiliation(s)
- Yoav Gilad
- Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
|
320
|
Schneeberger K, Hagmann J, Ossowski S, Warthmann N, Gesing S, Kohlbacher O, Weigel D. Simultaneous alignment of short reads against multiple genomes. Genome Biol 2009; 10:R98. [PMID: 19761611 PMCID: PMC2768987 DOI: 10.1186/gb-2009-10-9-r98] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.3] [Received: 07/20/2009] [Revised: 09/12/2009] [Accepted: 09/17/2009] [Indexed: 11/15/2022] Open
Abstract
New software for the alignment of short-read sequence data to multiple genomes allows identification of polymorphisms that cannot be identified by alignment to a single reference genome. Genome resequencing with short reads generally relies on alignments against a single reference. GenomeMapper supports simultaneous mapping of short reads against multiple genomes by integrating related genomes (e.g., individuals of the same species) into a single graph structure. It constitutes the first approach for handling multiple references and introduces representations for alignments against complex structures. Demonstrated benefits include access to polymorphisms that cannot be identified by alignments against the reference alone. Download GenomeMapper at .
Affiliation(s)
- Korbinian Schneeberger
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, Spemannstrasse 37-39, D-72076 Tübingen, Germany.
|
321
|
Selective epigenetic control of retrotransposition in Arabidopsis. Nature 2009; 461:427-30. [PMID: 19734882 DOI: 10.1038/nature08328] [Citation(s) in RCA: 245] [Impact Index Per Article: 15.3] [Received: 06/01/2009] [Accepted: 07/27/2009] [Indexed: 11/08/2022]
Abstract
Retrotransposons are mobile genetic elements that populate chromosomes, where the host largely controls their activities. In plants and mammals, retrotransposons are transcriptionally silenced by DNA methylation, which in Arabidopsis is propagated at CG dinucleotides by METHYLTRANSFERASE 1 (MET1). In met1 mutants, however, mobilization of retrotransposons is not observed, despite their transcriptional activation. A post-transcriptional mechanism therefore seems to be preventing retrotransposition. Here we show that a copia-type retrotransposon (Evadé, French for 'fugitive') evaded suppression of its movement during inbreeding of hybrid epigenomes consisting of met1- and wild-type-derived chromosomes. Evadé (EVD) reinsertions caused a series of developmental mutations that allowed its identification. Genetic testing of host control of the EVD life cycle showed that transcriptional suppression occurred by CG methylation supported by RNA-directed DNA methylation. On transcriptional reactivation, subsequent steps of the EVD cycle were inhibited by plant-specific RNA polymerase IV/V and the histone methyltransferase KRYPTONITE (KYP). Moreover, genome resequencing demonstrated retrotransposition of EVD but no other potentially active retroelements when this combination of epigenetic mechanisms was compromised. Our results demonstrate that epigenetic control of retrotransposons extends beyond transcriptional suppression and can be individualized for particular elements.
|
322
|
Schneeberger K, Ossowski S, Lanz C, Juul T, Petersen AH, Nielsen KL, Jørgensen JE, Weigel D, Andersen SU. SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat Methods 2009; 6:550-1. [PMID: 19644454 DOI: 10.1038/nmeth0809-550] [Citation(s) in RCA: 400] [Impact Index Per Article: 25.0] [Indexed: 01/20/2023]
|
323
|
Varshney RK, Nayak SN, May GD, Jackson SA. Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol 2009; 27:522-30. [PMID: 19679362 DOI: 10.1016/j.tibtech.2009.05.006] [Citation(s) in RCA: 419] [Impact Index Per Article: 26.2] [Received: 03/05/2009] [Revised: 05/21/2009] [Accepted: 05/27/2009] [Indexed: 10/20/2022]
Abstract
Using next-generation sequencing technologies it is possible to resequence entire plant genomes or sample entire transcriptomes more efficiently and economically and in greater depth than ever before. Rather than sequencing individual genomes, we envision the sequencing of hundreds or even thousands of related genomes to sample genetic diversity within and between germplasm pools. Identification and tracking of genetic variation are now so efficient and precise that thousands of variants can be tracked within large populations. In this review, we outline some important areas such as the large-scale development of molecular markers for linkage mapping, association mapping, wide crosses and alien introgression, epigenetic modifications, transcript profiling, population genetics and de novo genome/organellar genome assembly for which these technologies are expected to advance crop genetics and breeding, leading to crop improvement.
Affiliation(s)
- Rajeev K Varshney
- Centre of Excellence in Genomics (CEG), International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru 502324, A.P., India.
|
324
|
Neiman M, Olson MS, Tiffin P. Selective histories of poplar protease inhibitors: elevated polymorphism, purifying selection, and positive selection driving divergence of recent duplicates. THE NEW PHYTOLOGIST 2009; 183:740-750. [PMID: 19566812 DOI: 10.1111/j.1469-8137.2009.02936.x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 05/17/2023]
Abstract
To further our understanding of plant defense evolution and the consistency of selection at the nucleotide level we analysed polymorphism data from five protease inhibitor (PI) genes in Populus balsamifera. We compared diversity at the five PI genes to diversity at nondefense loci in both range-wide samples as well as in two subpopulations, one from the northern edge of the species range and one from the southern edge of the range. We also compared our data with previously reported diversity in Populus tremula, a European species with similar ecology to North American P. balsamifera. The PIs show diverse histories, including repeated bouts of positive selection and excess diversity. These genes also exhibit diverse histories in P. tremula but the signatures of selection acting at the specific loci differed between the species. One locus, KTI3, segregates several recent duplicates that show evidence of either positive selection or relaxed selective constraints. The patterns of diversity at the PIs varied within P. balsamifera and between two closely related species. The lack of consistent patterns suggests that evolution of host defense genes, including adaptations to enemy-imposed selection, may often be lineage- and gene-specific.
Affiliation(s)
- Maurine Neiman
- Department of Plant Biology, 250 Biosciences, University of Minnesota, Saint Paul, MN 55105, USA
- Matthew S Olson
- Institute of Arctic Biology, 311 Irving 1, University of Alaska Fairbanks, Fairbanks, AK 99775, USA
- Peter Tiffin
- Department of Plant Biology, 250 Biosciences, University of Minnesota, Saint Paul, MN 55105, USA
|
325
|
Papdi C, Joseph MP, Salamó IP, Vidal S, Szabados L. Genetic technologies for the identification of plant genes controlling environmental stress responses. FUNCTIONAL PLANT BIOLOGY : FPB 2009; 36:696-720. [PMID: 32688681 DOI: 10.1071/fp09047] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 02/26/2009] [Accepted: 06/11/2009] [Indexed: 06/11/2023]
Abstract
Abiotic conditions such as light, temperature, water availability and soil parameters determine plant growth and development. The adaptation of plants to extreme environments or to sudden changes in their growth conditions is controlled by a well-balanced, genetically determined signalling system, which is still far from being understood. The identification and characterisation of plant genes that control responses to environmental stresses is an essential step in elucidating the complex regulatory network that determines stress tolerance. Here, we review the genetic approaches that have been used successfully to identify plant genes that control responses to different abiotic stress factors. We describe strategies and concepts for forward and reverse genetic screens, conventional and insertion mutagenesis, TILLING, gene tagging, promoter trapping, activation mutagenesis and cDNA library transfer. The utility of the reviewed genetic approaches in plant stress research is illustrated by several published examples.
Affiliation(s)
- Csaba Papdi
- Institute of Plant Biology, Biological Research Centre, 6726-Szeged, Temesvári krt. 62, Hungary
- Mary Prathiba Joseph
- Institute of Plant Biology, Biological Research Centre, 6726-Szeged, Temesvári krt. 62, Hungary
- Imma Pérez Salamó
- Institute of Plant Biology, Biological Research Centre, 6726-Szeged, Temesvári krt. 62, Hungary
- Sabina Vidal
- Facultad de Ciencias, Universidad de la República, Iguá 4225, CP 11400, Montevideo, Uruguay
- László Szabados
- Institute of Plant Biology, Biological Research Centre, 6726-Szeged, Temesvári krt. 62, Hungary
|
326
|
Abstract
A major challenge in current biology is to understand the genetic basis of variation for quantitative traits. We review the principles of quantitative trait locus mapping and summarize insights about the genetic architecture of quantitative traits that have been obtained over the past decades. We are currently in the midst of a genomic revolution, which enables us to incorporate genetic variation in transcript abundance and other intermediate molecular phenotypes into a quantitative trait locus mapping framework. This systems genetics approach enables us to understand the biology inside the 'black box' that lies between genotype and phenotype in terms of causal networks of interacting genes.
|
327
|
|
328
|
A Multiparent Advanced Generation Inter-Cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet 2009; 5:e1000551. [PMID: 19593375 PMCID: PMC2700969 DOI: 10.1371/journal.pgen.1000551] [Citation(s) in RCA: 379] [Impact Index Per Article: 23.7] [Received: 03/25/2009] [Accepted: 06/08/2009] [Indexed: 12/29/2022] Open
Abstract
Identifying natural allelic variation that underlies quantitative trait variation remains a fundamental problem in genetics. Most studies have employed either simple synthetic populations with restricted allelic variation or performed association mapping on a sample of naturally occurring haplotypes. Both of these approaches have limitations; therefore, alternative resources for the genetic dissection of complex traits continue to be sought. Here we describe one such alternative, the Multiparent Advanced Generation Inter-Cross (MAGIC). This approach is expected to improve the precision with which QTL can be mapped, improving the outlook for QTL cloning. Here, we present the first panel of MAGIC lines developed: a set of 527 recombinant inbred lines (RILs) descended from a heterogeneous stock of 19 intermated accessions of the plant Arabidopsis thaliana. These lines and the 19 founders were genotyped with 1,260 single nucleotide polymorphisms and phenotyped for development-related traits. Analytical methods were developed to fine-map quantitative trait loci (QTL) in the MAGIC lines by reconstructing the genome of each line as a mosaic of the founders. We show by simulation that QTL explaining 10% of the phenotypic variance will be detected in most situations with an average mapping error of about 300 kb, and that if the number of lines were doubled the mapping error would be under 200 kb. We also show how the power to detect a QTL and the mapping accuracy vary, depending on QTL location. We demonstrate the utility of this new mapping population by mapping several known QTL with high precision and by finding novel QTL for germination data and bolting time. Our results provide strong support for similar ongoing efforts to produce MAGIC lines in other organisms.
Most traits of economic and evolutionary interest vary quantitatively and have multiple genes affecting their expression. Dissecting the genetic basis of such traits is crucial for the improvement of crops and management of diseases. Here, we develop a new resource to identify genes underlying such quantitative traits in Arabidopsis thaliana, a genetic model organism in plants. We show that using a large population of inbred lines derived from intercrossing 19 parents, we can localize the genes underlying quantitative traits better than with existing methods. Using these lines, we were able to replicate the identification of previously known genes that affect developmental traits in A. thaliana and identify some new ones. This paper also presents all the biological and computational material necessary for the scientific community to use these lines in their own research. Our results suggest that the use of lines derived from a multiparent advanced generation inter-cross (MAGIC lines) should be very useful in other organisms.
|
329
|
Seki M, Shinozaki K. Functional genomics using RIKEN Arabidopsis thaliana full-length cDNAs. JOURNAL OF PLANT RESEARCH 2009; 122:355-66. [PMID: 19412652 DOI: 10.1007/s10265-009-0239-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/03/2009] [Accepted: 04/08/2009] [Indexed: 05/24/2023]
Abstract
Full-length cDNAs are essential for the correct annotation of genomic sequences as well as for the functional analysis of genes and their products. We have isolated about 240,000 RIKEN Arabidopsis full-length (RAFL) cDNA clones. These clones were clustered into about 17,000 non-redundant cDNA groups, i.e., about 60% of all Arabidopsis predicted genes. The sequence information of the RAFL cDNAs is useful for promoter analysis, and for the correct annotation of predicted transcriptional units and gene products. We prepared cDNA microarrays containing independent full-length cDNA groups and studied the expression profiles of genes under various stress- and hormone-treatment conditions, and in various mutants and transgenic plants. These expression profiling studies have shown the expression levels of many genes as a detailed snapshot describing the state of a biological system in planta under various conditions. We have applied RAFL cDNAs to the functional analysis of proteins using the full-length cDNA over-expressing (FOX) gene hunting system and the wheat germ cell-free protein synthesis system. The RAFL cDNA collection was also used for determination of the domain structure of proteins by NMR. In this review, we summarize the present state and perspectives of functional genomics using RAFL cDNAs.
Affiliation(s)
- Motoaki Seki
- Plant Genomic Network Research Team, Plant Functional Genomics Research Group, RIKEN Plant Science Center, RIKEN Yokohama Institute, Yokohama 230-0045, Japan.
|
330
|
Koboldt DC, Chen K, Wylie T, Larson DE, McLellan MD, Mardis ER, Weinstock GM, Wilson RK, Ding L. VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 2009; 25:2283-5. [PMID: 19542151 DOI: 10.1093/bioinformatics/btp373] [Citation(s) in RCA: 980] [Impact Index Per Article: 61.3] [Indexed: 11/14/2022]
Abstract
SUMMARY: Massively parallel sequencing technologies hold incredible promise for the study of DNA sequence variation, particularly the identification of variants affecting human disease. The unprecedented throughput and relatively short read lengths of Roche/454, Illumina/Solexa, and other platforms have spurred development of a new generation of sequence alignment algorithms. Yet detection of sequence variants based on short read alignments remains challenging, and most currently available tools are limited to a single platform or aligner type. We present VarScan, an open source tool for variant detection that is compatible with several short read aligners. We demonstrate VarScan's ability to detect SNPs and indels with high sensitivity and specificity, in both Roche/454 sequencing of individuals and deep Illumina/Solexa sequencing of pooled samples.
Affiliation(s)
- Daniel C Koboldt
- The Genome Center at Washington University School of Medicine, St Louis, MO 63108, USA.
|
331
|
Abstract
We advocate here a 1001 Genomes project for Arabidopsis thaliana, the workhorse of plant genetics, which will provide an enormous boost for plant research with a modest financial investment.
Affiliation(s)
- Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany.
|
332
|
|
333
|
Method for improving sequence coverage uniformity of targeted genomic intervals amplified by LR-PCR using Illumina GA sequencing-by-synthesis technology. Biotechniques 2009; 46:229-31. [PMID: 19317667 DOI: 10.2144/000113082] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/23/2022] Open
Abstract
One approach for high-throughput population-based sequencing of targeted intervals in the human genome is to amplify the regions using long-range PCR (LR-PCR) followed by sequencing with next-generation sequencing (NGS) technologies. Utilizing this method, we have observed that the 50 bp located at the amplicon ends account for more than 50% of the sequenced bases and that the sequence coverage depth of base pairs within an amplicon is highly variable. Here we propose an explanation for the overrepresentation of the amplicon ends and show that the use of 5'-blocked primers for the LR-PCR reaction reduces their overrepresentation. Furthermore, we demonstrate that using a 600-bp library insert size rather than the standard 200-bp insert size results in more uniform sequence coverage depth. The capability to increase sequence coverage uniformity greatly improves the effective throughput of NGS platforms.
|
334
|
Abstract
The role of hybridization in evolution has been debated for over a century. Recent molecular genetic studies indicate that hybridization is surprisingly frequent in natural populations, and that it may allow populations to regain traits that have been lost and possibly to replace damaged alleles with functional copies from related species.
|
335
|
Yamamoto T, Yonemaru J, Yano M. Towards the understanding of complex traits in rice: substantially or superficially? DNA Res 2009; 16:141-54. [PMID: 19359285 PMCID: PMC2695773 DOI: 10.1093/dnares/dsp006] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Indexed: 11/30/2022] Open
Abstract
Completion of the genome analysis followed by extensive comprehensive studies on a variety of genes and gene families of rice (Oryza sativa) resulted in rapid accumulation of information concerning the presence of many complex traits that are governed by a number of genes of distinct functions in this most important crop cultivated worldwide. The genetic and molecular biological dissection of many important rice phenotypes has contributed to our understanding of the complex nature of the genetic control with respect to these phenotypes. However, in spite of the considerable advances made in the field, details of genetic control remain largely unsolved, thereby hampering our exploitation of this useful information in the breeding of new rice cultivars. To further strengthen the field application of the genome science data of rice obtained so far, we need to develop more powerful genomics-assisted methods for rice breeding based on information derived from various quantitative trait loci (QTL) and related analyses. In this review, we describe recent progress and outcomes in rice QTL analyses, problems associated with the application of the technology to rice breeding, and their implications for the genetic study of other crops, along with future perspectives of the relevant fields.
Affiliation(s)
- Toshio Yamamoto
- QTL Genomics Research Center, National Institute of Agrobiological Science, Kannondai 2-1-2, Tsukuba, Ibaraki 305-8602, Japan
|
336
|
Brady SM, Provart NJ. Web-queryable large-scale data sets for hypothesis generation in plant biology. THE PLANT CELL 2009; 21:1034-51. [PMID: 19401381 PMCID: PMC2685637 DOI: 10.1105/tpc.109.066050] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Received: 02/01/2009] [Revised: 04/03/2009] [Accepted: 04/12/2009] [Indexed: 05/17/2023]
Abstract
The approaching end of the 21st century's first decade marks an exciting time for plant biology. Several National Science Foundation Arabidopsis 2010 Projects will conclude, and whether or not the stated goal of the National Science Foundation 2010 Program (to determine the function of 25,000 Arabidopsis genes by 2010) is reached, these projects and others in a similar vein, such as those performed by the AtGenExpress Consortium and various plant genome sequencing initiatives, have generated important and unprecedented large-scale data sets. While providing significant biological insights for the individual laboratories that generated them, these data sets, in conjunction with the appropriate tools, are also permitting plant biologists worldwide to gain new insights into their own biological systems of interest, often at a mouse click through a Web browser. This review provides an overview of several such genomic, epigenomic, transcriptomic, proteomic, and metabolomic data sets and describes Web-based tools for querying them in the context of hypothesis generation for plant biology. We provide five biological examples of how such tools and data sets have been used to provide biological insight.
Affiliation(s)
- Siobhan M Brady
- Section of Plant Biology and Genome Center, University of California, Davis, California 95616, USA
|
337
|
Ganal MW, Altmann T, Röder MS. SNP identification in crop plants. CURRENT OPINION IN PLANT BIOLOGY 2009; 12:211-7. [PMID: 19186095 DOI: 10.1016/j.pbi.2008.12.009] [Citation(s) in RCA: 190] [Impact Index Per Article: 11.9] [Received: 10/20/2008] [Revised: 12/18/2008] [Accepted: 12/20/2008] [Indexed: 05/18/2023]
Abstract
In many plants, single nucleotide polymorphism (SNP) markers are increasingly becoming the marker system of choice. However, for many crop plants there are surprisingly low numbers of validated SNP markers available although they are needed in large numbers for studies regarding genetic variation, linkage mapping, population structure analysis, association genetics, map-based gene isolation, and plant breeding. This review summarizes the current status of SNP marker development technologies for major crop plants. It will also provide an outlook into the future regarding possible SNP identification approaches in crop plants on the basis of current development in model systems such as Arabidopsis which will become available with the full sequencing of more plant genomes, genome resequencing, and in conjunction with the next-generation sequencing technologies.
Affiliation(s)
- Martin W Ganal
- TraitGenetics GmbH, Am Schwabeplan 1b, D-06466 Gatersleben, Germany.
|
338
|
Lister R, Gregory BD, Ecker JR. Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. CURRENT OPINION IN PLANT BIOLOGY 2009; 12:107-18. [PMID: 19157957 PMCID: PMC2723731 DOI: 10.1016/j.pbi.2008.11.004] [Citation(s) in RCA: 138] [Impact Index Per Article: 8.6] [Received: 10/19/2008] [Revised: 11/17/2008] [Accepted: 11/20/2008] [Indexed: 05/18/2023]
Abstract
The sudden availability of DNA sequencing technologies that rapidly produce vast amounts of sequence information has triggered a paradigm shift in genomics, enabling massively parallel surveying of complex nucleic acid populations. The diversity of applications to which these technologies have already been applied demonstrates the immense range of cellular processes and properties that can now be studied at the single-base resolution. These include genome resequencing and polymorphism discovery, mutation mapping, DNA methylation, histone modifications, transcriptome sequencing, gene discovery, alternative splicing identification, small RNA profiling, DNA-protein, and possibly even protein-protein interactions. Thus, these deep sequencing technologies offer plant biologists unprecedented opportunities to increase the understanding of the functions and dynamics of plant cells and populations.
Affiliation(s)
- Ryan Lister
- Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Brian D. Gregory
- Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Joseph R. Ecker
- Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Corresponding author: Joseph R. Ecker, Plant Biology Laboratory and Genomic Analysis Laboratory, The Salk Institute for Biological Studies, 10010 N. Torrey Pines Rd., La Jolla, CA 92037, Telephone: (858) 453-4100 x1795, Fax: (858) 558-6379, E-mail:
|
339
|
Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol 2009; 7:287-96. [PMID: 19287448 DOI: 10.1038/nrmicro2122] [Citation(s) in RCA: 123] [Impact Index Per Article: 7.7] [Indexed: 12/14/2022]
Abstract
New sequencing methods generate data that can allow the assembly of microbial genome sequences in days. With such revolutionary advances in technology come new challenges in methodologies and informatics. In this article, we review the capabilities of high-throughput sequencing technologies and discuss the many options for getting useful information from the data.
|
340
|
Ziolkowski PA, Koczyk G, Galganski L, Sadowski J. Genome sequence comparison of Col and Ler lines reveals the dynamic nature of Arabidopsis chromosomes. Nucleic Acids Res 2009; 37:3189-201. [PMID: 19305000 PMCID: PMC2691826 DOI: 10.1093/nar/gkp183] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022] Open
Abstract
Large differences in plant genome sizes are mainly due to numerous events of insertions or deletions (indels). The balance between these events determines the evolutionary direction of genome changes. To address the question of what phenomena trigger these alterations, we compared the genomic sequences of two Arabidopsis thaliana lines, Columbia (Col) and Landsberg erecta (Ler). Based on the resulting alignments, large indels (>100 bp) within these two genomes were analysed. There are ∼8500 large indels accounting for the differences between the two genomes. The genetic basis of their origin was distinguished as three main categories: unequal recombination (Urec)-derived, illegitimate recombination (Illrec)-derived and transposable elements (TE)-derived. A detailed study of their distribution and size variation along chromosomes, together with correlation analyses, allowed us to demonstrate the impact of particular recombination-based mechanisms on plant genome evolution. The results show that unequal recombination is not efficient in the removal of TEs within the pericentromeric regions. Moreover, we discovered an unexpectedly high influence of large indels on gene evolution, pointing out significant differences between the various gene families. For the first time, we present convincing evidence that somatic events do play an important role in plant genome evolution.
Affiliation(s)
- Piotr A Ziolkowski
- Department of Biotechnology, Adam Mickiewicz University, Umultowska 89, 61-614 Poznań, Poland
|
341
|
MacLean D, Jones JDG, Studholme DJ. Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol 2009. [DOI: 10.1038/nrmicro2088] [Citation(s) in RCA: 243] [Impact Index Per Article: 15.2] [Indexed: 12/24/2022]
|
342
|
Reinhardt JA, Baltrus DA, Nishimura MT, Jeck WR, Jones CD, Dangl JL. De novo assembly using low-coverage short read sequence data from the rice pathogen Pseudomonas syringae pv. oryzae. Genome Res 2008; 19:294-305. [PMID: 19015323 DOI: 10.1101/gr.083311.108] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.1] [Indexed: 11/25/2022]
Abstract
We developed a novel approach for de novo genome assembly using only sequence data from high-throughput short read sequencing technologies. By combining data generated from 454 Life Sciences (Roche) and Illumina (formerly known as Solexa sequencing) sequencing platforms, we reliably assembled genomes into large scaffolds at a fraction of the traditional cost and without use of a reference sequence. We applied this method to two isolates of the phytopathogenic bacteria Pseudomonas syringae. Sequencing and reassembly of the well-studied tomato and Arabidopsis pathogen, Pto(DC3000), facilitated development and testing of our method. Sequencing of a distantly related rice pathogen, Por(1_6), demonstrated our method's efficacy for de novo assembly of novel genomes. Our assembly of Por(1_6) yielded an N50 scaffold size of 531,821 bp with >75% of the predicted genome covered by scaffolds over 100,000 bp. One of the critical phenotypic differences between strains of P. syringae is the range of plant hosts they infect. This is largely determined by their complement of type III effector proteins. The genome of Por(1_6) is the first sequenced for a P. syringae isolate that is a pathogen of monocots, and, as might be predicted, its complement of type III effectors differs substantially from the previously sequenced isolates of this species. The genome of Por(1_6) helps to define an expansion of the P. syringae pan-genome, a corresponding contraction of the core genome, and a further diversification of the type III effector complement for this important plant pathogen species.
Affiliation(s)
- Josephine A Reinhardt
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA
|