51
Nelson ADL, Devisetty UK, Palos K, Haug-Baltzell AK, Lyons E, Beilstein MA. Evolinc: A Tool for the Identification and Evolutionary Comparison of Long Intergenic Non-coding RNAs. Front Genet 2017; 8:52. [PMID: 28536600 PMCID: PMC5422434 DOI: 10.3389/fgene.2017.00052] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 02/22/2017] [Accepted: 04/12/2017] [Indexed: 11/25/2022]
Abstract
Long intergenic non-coding RNAs (lincRNAs) are an abundant and functionally diverse class of eukaryotic transcripts. Reported lincRNA repertoires in mammals vary, but are commonly in the thousands to tens of thousands of transcripts, covering ~90% of the genome. In addition to elucidating function, there is particular interest in understanding the origin and evolution of lincRNAs. Aside from mammals, lincRNA populations have been sparsely sampled, precluding evolutionary analyses focused on their emergence and persistence. Here we present Evolinc, a two-module pipeline designed to facilitate lincRNA discovery and characterize aspects of lincRNA evolution. The first module (Evolinc-I) is a lincRNA identification workflow that also facilitates downstream differential expression analysis and genome browser visualization of identified lincRNAs. The second module (Evolinc-II) is a genomic and transcriptomic comparative analysis workflow that determines the phylogenetic depth to which a lincRNA locus is conserved within a user-defined group of related species. Here we validate lincRNA catalogs generated with Evolinc-I against previously annotated Arabidopsis and human lincRNA data. Evolinc-I recapitulated earlier findings and uncovered an additional 70 Arabidopsis and 43 human lincRNAs. We demonstrate the usefulness of Evolinc-II by examining the evolutionary histories of a public dataset of 5,361 Arabidopsis lincRNAs. We used Evolinc-II to winnow this dataset to 40 lincRNAs conserved across species in Brassicaceae. Finally, we show how Evolinc-II can be used to recover the evolutionary history of a known lincRNA, the human telomerase RNA (TERC). These latter analyses revealed unexpected duplication events as well as the loss and subsequent acquisition of a novel TERC locus in the lineage leading to mice and rats. The Evolinc pipeline is currently integrated in CyVerse's Discovery Environment and is free for use by researchers.
Affiliation(s)
- Andrew D L Nelson
  - Beilstein Lab, School of Plant Sciences, University of Arizona, Tucson, AZ, USA
- Kyle Palos
  - Beilstein Lab, School of Plant Sciences, University of Arizona, Tucson, AZ, USA
- Asher K Haug-Baltzell
  - Lyons Lab, Genetics Graduate Interdisciplinary Group, University of Arizona, Tucson, AZ, USA
- Eric Lyons
  - CyVerse, Bio5, University of Arizona, Tucson, AZ, USA
  - Lyons Lab, Genetics Graduate Interdisciplinary Group, University of Arizona, Tucson, AZ, USA
- Mark A Beilstein
  - Beilstein Lab, School of Plant Sciences, University of Arizona, Tucson, AZ, USA
52
Mascher M, Gundlach H, Himmelbach A, Beier S, Twardziok SO, Wicker T, Radchuk V, Dockter C, Hedley PE, Russell J, Bayer M, Ramsay L, Liu H, Haberer G, Zhang XQ, Zhang Q, Barrero RA, Li L, Taudien S, Groth M, Felder M, Hastie A, Šimková H, Staňková H, Vrána J, Chan S, Muñoz-Amatriaín M, Ounit R, Wanamaker S, Bolser D, Colmsee C, Schmutzer T, Aliyeva-Schnorr L, Grasso S, Tanskanen J, Chailyan A, Sampath D, Heavens D, Clissold L, Cao S, Chapman B, Dai F, Han Y, Li H, Li X, Lin C, McCooke JK, Tan C, Wang P, Wang S, Yin S, Zhou G, Poland JA, Bellgard MI, Borisjuk L, Houben A, Doležel J, Ayling S, Lonardi S, Kersey P, Langridge P, Muehlbauer GJ, Clark MD, Caccamo M, Schulman AH, Mayer KFX, Platzer M, Close TJ, Scholz U, Hansson M, Zhang G, Braumann I, Spannagl M, Li C, Waugh R, Stein N. A chromosome conformation capture ordered sequence of the barley genome. Nature 2017; 544:427-433. [DOI: 10.1038/nature22043] [Citation(s) in RCA: 966] [Impact Index Per Article: 120.8] [Received: 08/26/2016] [Accepted: 03/03/2017] [Indexed: 02/06/2023]
53
Lutz U, Nussbaumer T, Spannagl M, Diener J, Mayer KF, Schwechheimer C. Natural haplotypes of FLM non-coding sequences fine-tune flowering time in ambient spring temperatures in Arabidopsis. eLife 2017; 6. [PMID: 28294941 PMCID: PMC5388537 DOI: 10.7554/elife.22114] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 10/05/2016] [Accepted: 03/09/2017] [Indexed: 11/18/2022]
Abstract
Cool ambient temperatures are major cues determining flowering time in spring. The mechanisms promoting or delaying flowering in response to ambient temperature changes are only beginning to be understood. In Arabidopsis thaliana, FLOWERING LOCUS M (FLM) regulates flowering in the ambient temperature range and FLM is transcribed and alternatively spliced in a temperature-dependent manner. We identify polymorphic promoter and intronic sequences required for FLM expression and splicing. In transgenic experiments covering 69% of the available sequence variation in two distinct sites, we show that variation in the abundance of the FLM-β splice form strictly correlates (R² = 0.94) with flowering time over an extended vegetative period. The FLM polymorphisms lead to changes in FLM expression (PRO2+) but may also affect FLM intron 1 splicing (INT6+). This information could serve to buffer the anticipated negative effects on agricultural systems and flowering that may occur during climate change. DOI: http://dx.doi.org/10.7554/eLife.22114.001
Affiliation(s)
- Ulrich Lutz
  - Plant Systems Biology, Technische Universität München, Freising, Germany
- Thomas Nussbaumer
  - Plant Genome and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Manuel Spannagl
  - Plant Genome and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Julia Diener
  - Plant Systems Biology, Technische Universität München, Freising, Germany
- Klaus F X Mayer
  - Plant Genome and Systems Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
54
Bauer E, Schmutzer T, Barilar I, Mascher M, Gundlach H, Martis MM, Twardziok SO, Hackauf B, Gordillo A, Wilde P, Schmidt M, Korzun V, Mayer KFX, Schmid K, Schön CC, Scholz U. Towards a whole-genome sequence for rye (Secale cereale L.). Plant J 2017; 89:853-869. [PMID: 27888547 DOI: 10.1111/tpj.13436] [Citation(s) in RCA: 127] [Impact Index Per Article: 15.9] [Received: 08/03/2016] [Revised: 11/08/2016] [Accepted: 11/21/2016] [Indexed: 05/18/2023]
Abstract
We report on a whole-genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe. Through whole-genome shotgun sequencing of the 7.9-Gbp genome of the winter rye inbred line Lo7 we obtained a de novo assembly represented by 1.29 million scaffolds covering a total length of 2.8 Gbp. Our reference sequence represents nearly the entire low-copy portion of the rye genome. This genome assembly was used to predict 27 784 rye gene models based on homology to sequenced grass genomes. Through resequencing of 10 rye inbred lines and one accession of the wild relative S. vavilovii, we discovered more than 90 million single nucleotide variants and short insertions/deletions in the rye genome. From these variants, we developed the high-density Rye600k genotyping array with 600 843 markers, which enabled anchoring the sequence contigs along a high-density genetic map and establishing a synteny-based virtual gene order. Genotyping data were used to characterize the diversity of rye breeding pools and genetic resources, and to obtain a genome-wide map of selection signals differentiating the divergent gene pools. This rye whole-genome sequence closes a gap in Triticeae genome research, and will be highly valuable for comparative genomics, functional studies and genome-based breeding in rye.
Affiliation(s)
- Eva Bauer
  - Technical University of Munich, Plant Breeding, Liesel-Beckmann-Str. 2, 85354, Freising, Germany
- Thomas Schmutzer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466, Stadt Seeland, Germany
- Ivan Barilar
  - Universität Hohenheim, Crop Biodiversity and Breeding Informatics, Fruwirthstr. 21, 70599, Stuttgart, Germany
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466, Stadt Seeland, Germany
- Heidrun Gundlach
  - Helmholtz Zentrum München, Plant Genome and Systems Biology, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Mihaela M Martis
  - Helmholtz Zentrum München, Plant Genome and Systems Biology, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Sven O Twardziok
  - Helmholtz Zentrum München, Plant Genome and Systems Biology, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Bernd Hackauf
  - Julius Kühn-Institute, Institute for Breeding Research on Agricultural Crops, Rudolf-Schick-Platz 3a, 18190, Sanitz, Germany
- Andres Gordillo
  - KWS LOCHOW GMBH, Ferdinand-von-Lochow-Str. 5, 29303, Bergen, Germany
- Peer Wilde
  - KWS LOCHOW GMBH, Ferdinand-von-Lochow-Str. 5, 29303, Bergen, Germany
- Malthe Schmidt
  - KWS LOCHOW GMBH, Ferdinand-von-Lochow-Str. 5, 29303, Bergen, Germany
- Viktor Korzun
  - KWS LOCHOW GMBH, Ferdinand-von-Lochow-Str. 5, 29303, Bergen, Germany
- Klaus F X Mayer
  - Helmholtz Zentrum München, Plant Genome and Systems Biology, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Karl Schmid
  - Universität Hohenheim, Crop Biodiversity and Breeding Informatics, Fruwirthstr. 21, 70599, Stuttgart, Germany
- Chris-Carolin Schön
  - Technical University of Munich, Plant Breeding, Liesel-Beckmann-Str. 2, 85354, Freising, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466, Stadt Seeland, Germany
55
Zhang Y, Fan C, Li S, Chen Y, Wang RRC, Zhang X, Han F, Hu Z. The Diversity of Sequence and Chromosomal Distribution of New Transposable Element-Related Segments in the Rye Genome Revealed by FISH and Lineage Annotation. Front Plant Sci 2017; 8:1706. [PMID: 29046683 PMCID: PMC5632726 DOI: 10.3389/fpls.2017.01706] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/21/2017] [Accepted: 09/19/2017] [Indexed: 05/18/2023]
Abstract
Transposable elements (TEs) in plant genomes exhibit a great variety of structure, sequence content and copy number, making them important drivers of species diversity and genome evolution. Even though a genome-wide statistical summary of TEs in rye has been obtained using high-throughput DNA sequencing technology, an accurate picture of TE diversity in rye, as well as of their chromosomal distribution and evolution, remains elusive because of the difficulty of assembling repetitive sequences and the highly dynamic and nested nature of TEs. In this study, using genomic plasmid library construction combined with dot-blot hybridization and fluorescence in situ hybridization (FISH) analysis, we successfully isolated 70 unique FISH-positive TE-related sequences, including 47 rye-genome-specific ones: 30 showed homology or partial homology with previously FISH-characterized sequences and 40 have not been characterized. Among the 70 sequences, 48 carried Ty3/gypsy-derived segments, 7 carried Ty1/copia-derived segments and 15 carried segments homologous with multiple TE families. Twenty-six TE lineages were found in the 70 sequences. Among these lineages, Wilma was found in sequences dispersed in all chromosome regions except telomeric positions; Abiba was found in sequences predominantly located at pericentromeric and centromeric positions; Wis, Carmilla, and Inga were found in sequences displaying signals dispersed from distal regions toward pericentromeric positions; and all other lineages, except the DNA transposon lineages, were found in sequences displaying signals dispersed from proximal regions toward distal regions. A high percentage (21.4%) of chimeric sequences was identified in this study, and their high abundance in the rye genome suggested that new TEs might form through recombination and nested transposition. Our results also provided evidence that diverse TE lineages were arranged at centromeric and pericentromeric positions in rye, and that lineages like Abiba might play a role in their structural organization and function. All these results might help in understanding the diversity and evolution of TEs in rye, as well as their driving forces in rye genome organization and evolution.
Affiliation(s)
- Yingxin Zhang
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - Center for Life Science, University of Chinese Academy of Sciences, Beijing, China
- Chengming Fan
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - *Correspondence: Chengming Fan, Zanmin Hu,
- Shuangshuang Li
  - Department of Life Science, Henan Normal University, Xinxiang, China
- Yuhong Chen
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Richard R.-C. Wang
  - Forage and Range Research Laboratory, United States Department of Agriculture, Agricultural Research Service, Utah State University, Logan, UT, United States
- Xiangqi Zhang
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Fangpu Han
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Zanmin Hu
  - Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
  - Center for Life Science, University of Chinese Academy of Sciences, Beijing, China
  - *Correspondence: Chengming Fan, Zanmin Hu,
56
Hooper CM, Castleden IR, Tanz SK, Aryamanesh N, Millar AH. SUBA4: the interactive data analysis centre for Arabidopsis subcellular protein locations. Nucleic Acids Res 2016; 45:D1064-D1074. [PMID: 27899614 PMCID: PMC5210537 DOI: 10.1093/nar/gkw1041] [Citation(s) in RCA: 307] [Impact Index Per Article: 34.1] [Received: 09/27/2016] [Accepted: 10/20/2016] [Indexed: 12/15/2022]
Abstract
The SUBcellular location database for Arabidopsis proteins (SUBA4, http://suba.live) is a comprehensive collection of manually curated published data sets of large-scale subcellular proteomics, fluorescent protein visualization, protein-protein interaction (PPI) as well as subcellular targeting calls from 22 prediction programs. SUBA4 contains an additional 35 568 localizations totalling more than 60 000 experimental protein location claims as well as 37 new suborganellar localization categories. The experimental PPI data has been expanded to 26 327 PPI pairs including 856 PPI localizations from experimental fluorescent visualizations. The new SUBA4 user interface enables users to choose quickly from the filter categories: ‘subcellular location’, ‘protein properties’, ‘protein–protein interaction’ and ‘affiliations’ to build complex queries. This allows substantial expansion of search parameters into 80 annotation types comprising 1 150 204 new annotations to study metadata associated with subcellular localization. The ‘BLAST’ tab contains a sequence alignment tool to enable a sequence fragment from any species to find the closest match in Arabidopsis and retrieve data on subcellular location. Using the location consensus SUBAcon, the SUBA4 toolbox delivers three novel data services allowing interactive analysis of user data to provide relative compartmental protein abundances and proximity relationship analysis of PPI and coexpression partners from a submitted list of Arabidopsis gene identifiers.
Affiliation(s)
- Cornelia M Hooper
  - ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Ian R Castleden
  - ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Sandra K Tanz
  - ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
- Nader Aryamanesh
  - Department of Genetics and Physiology, Biocenter Oulu, FIN-90014 University of Oulu, Finland
- A Harvey Millar
  - ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, WA 6009, Australia
57
Martinez M. Computational Tools for Genomic Studies in Plants. Curr Genomics 2016; 17:509-514. [PMID: 28217007 PMCID: PMC5282602 DOI: 10.2174/1389202917666160520103447] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/17/2015] [Revised: 12/09/2015] [Accepted: 12/21/2015] [Indexed: 12/03/2022]
Abstract
In recent years, the genomic sequences of numerous plant species, including the main crop species, have been determined. Computational tools have been developed to address the questions of which plants have been sequenced and where their sequences are hosted. This mini-review surveys the databases for genome projects, the databases created to host species/clade projects and the databases developed to perform plant comparative genomics. Because of their importance in modern research, the plant comparative genomics databases are analyzed in depth. This comparative analysis focuses on the common and specific computational tools developed to achieve the particular objectives of each database. In addition, emerging high-performance bioinformatics tools specific to plant research are discussed, along with the kinds of computational approaches that should be implemented in the coming years to analyze plant genomes efficiently.
Affiliation(s)
- Manuel Martinez
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid, Campus Montegancedo, 28223-Pozuelo de Alarcón, Madrid, Spain