1
Triesch S, Denton AK, Bouvier JW, Buchmann JP, Reichel-Deland V, Guerreiro RNFM, Busch N, Schlüter U, Stich B, Kelly S, Weber APM. Transposable elements contribute to the establishment of the glycine shuttle in Brassicaceae species. Plant Biol (Stuttg) 2024; 26:270-281. [PMID: 38168881 DOI: 10.1111/plb.13601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Accepted: 11/15/2023] [Indexed: 01/05/2024]
Abstract
C3-C4 intermediate photosynthesis has evolved at least five times convergently in the Brassicaceae, despite this family lacking bona fide C4 species. The establishment of this carbon concentrating mechanism is known to require a complex suite of ultrastructural modifications, as well as changes in spatial expression patterns, which are both thought to be underpinned by a reconfiguration of existing gene-regulatory networks. However, to date, the mechanisms which underpin the reconfiguration of these gene networks are largely unknown. In this study, we used a pan-genomic association approach to identify genomic features that could confer differential gene expression towards the C3-C4 intermediate state by analysing eight C3 species and seven C3-C4 species from five independent origins in the Brassicaceae. We found a strong correlation between transposable element (TE) insertions in cis-regulatory regions and C3-C4 intermediacy. Specifically, our study revealed 113 gene models in which the presence of a TE within a gene correlates with C3-C4 intermediate photosynthesis. In this set, genes involved in the photorespiratory glycine shuttle are enriched, including the glycine decarboxylase P-protein whose expression domain undergoes a spatial shift during the transition to C3-C4 photosynthesis. When further interrogating this gene, we discovered independent TE insertions in its upstream region which we conclude to be responsible for causing the spatial shift in GLDP1 gene expression. Our findings hint at a pivotal role of TEs in the evolution of C3-C4 intermediacy, especially in mediating differential spatial gene expression.
Affiliation(s)
- S Triesch
- Institute for Plant Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- A K Denton
- Institute for Plant Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- J W Bouvier
- Department of Biology, University of Oxford, Oxford, UK
- J P Buchmann
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- Institute for Biological Data Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- V Reichel-Deland
- Institute for Plant Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- R N F M Guerreiro
- Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- N Busch
- Institute for Plant Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- U Schlüter
- Institute for Plant Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- B Stich
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
- Institute for Quantitative Genetics and Genomics of Plants, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- S Kelly
- Department of Biology, University of Oxford, Oxford, UK
- A P M Weber
- Institute for Plant Biochemistry, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Düsseldorf, Germany
2
Charon J, Buchmann JP, Sadiq S, Holmes EC. RdRp-Scan: A Bioinformatic Resource to Identify and Annotate Divergent RNA Viruses in Metagenomic Sequence Data. Virus Evol 2022; 8:veac082. [PMID: 36533143 PMCID: PMC9752661 DOI: 10.1093/ve/veac082] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/15/2022] [Revised: 08/22/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022]
Abstract
Despite a rapid expansion in the number of documented viruses following the advent of metagenomic sequencing, the identification and annotation of highly divergent RNA viruses remains challenging, particularly from poorly characterized hosts and environmental samples. Protein structures are more conserved than primary sequence data, such that structure-based comparisons provide an opportunity to reveal the viral “dusk matter”: viral sequences with low, but detectable, levels of sequence identity to known viruses with available protein structures. Here, we present a new open computational resource, RdRp-scan, that contains a standardized bioinformatic toolkit to identify and annotate divergent RNA viruses in metagenomic sequence data based on the detection of RNA-dependent RNA polymerase (RdRp) sequences. By combining RdRp-specific hidden Markov models (HMMs) and structural comparisons, we show that RdRp-scan can efficiently detect RdRp sequences that have identity levels as low as 10% to those from known viruses and that are not identifiable using standard sequence-to-sequence comparisons. In addition, to facilitate the annotation and placement of newly detected and divergent virus-like sequences into the diversity of RNA viruses, RdRp-scan provides new custom and curated databases of viral RdRp sequences and core motifs, as well as pre-built RdRp multiple sequence alignments. In parallel, our analysis of the sequence diversity detected by RdRp-scan revealed that most of the taxonomically unassigned RdRps fell into pre-established clusters, with some falling into potentially new orders of RNA viruses related to the Wolframvirales and Tolivirales. Finally, a survey of the conserved A, B and C RdRp motifs within the RdRp-scan sequence database revealed additional variations of both sequence and position that might provide new insights into the structure, function and evolution of viral polymerases.
Affiliation(s)
- Justine Charon
- Sydney Institute for Infectious Diseases, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia
- Jan P Buchmann
- Institute for Biological Data Science, Heinrich-Heine-University, Universitätsstrasse 1, D-40225 Düsseldorf, Germany
- Sabrina Sadiq
- Sydney Institute for Infectious Diseases, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia
- Edward C Holmes
- Sydney Institute for Infectious Diseases, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia
3
Buchmann JP, Holmes EC. Collecting and managing taxonomic data with NCBI-taxonomist. Bioinformatics 2020; 36:5548-5550. [PMID: 33326008 PMCID: PMC8016462 DOI: 10.1093/bioinformatics/btaa1027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2020] [Revised: 11/16/2020] [Accepted: 11/30/2020] [Indexed: 12/04/2022]
Abstract
Summary: We present NCBI-taxonomist, a command-line tool written in Python that collects and manages taxonomic data from the National Center for Biotechnology Information (NCBI). NCBI-taxonomist does not depend on a pre-downloaded taxonomic database but can store data locally. NCBI-taxonomist has six commands to map, collect, extract, resolve, import and group taxonomic data that can be linked together to create powerful analytical pipelines. Because many life science databases use the same taxonomic information, the data managed by NCBI-taxonomist is not limited to NCBI and can be used to find data linked to taxonomic information present in other scientific databases.
Availability and implementation: NCBI-taxonomist is implemented in Python 3 (≥3.8) and available at https://gitlab.com/janpb/ncbi-taxonomist and via PyPi (https://pypi.org/project/ncbi-taxonomist/), as a Docker container (https://gitlab.com/janpb/ncbi-taxonomist/container_registry/) and Singularity (v3.5.3) image (https://cloud.sylabs.io/library/jpb/ncbi-taxonomist). NCBI-taxonomist is licensed under the GPLv3.
Affiliation(s)
- Jan P Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, School of Life and Environmental Sciences and School of Medical Sciences, The University of Sydney, Sydney, Australia
4
Martí-Carreras J, Gener AR, Miller SD, Brito AF, Camacho CE, Connor R, Deboutte W, Glickman C, Kristensen DM, Meyer WK, Modha S, Norris AL, Saha S, Belford AK, Biederstedt E, Brister JR, Buchmann JP, Cooley NP, Edwards RA, Javkar K, Muchow M, Muralidharan HS, Pepe-Ranney C, Shah N, Shakya M, Tisza MJ, Tully BJ, Vanmechelen B, Virta VC, Weissman JL, Zalunin V, Efremov A, Busby B. NCBI's Virus Discovery Codeathon: Building "FIVE" -The Federated Index of Viral Experiments API Index. Viruses 2020; 12:v12121424. [PMID: 33322070 PMCID: PMC7764237 DOI: 10.3390/v12121424] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/16/2020] [Accepted: 12/02/2020] [Indexed: 02/05/2023]
Abstract
Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are lack of consensus or comparisons between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was elaborated. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus–host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof-of-concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As per the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access of large quantities of data and is publicly accessible. Many projects of database or index federation fail to provide easier alternatives to access or query information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.
Affiliation(s)
- Joan Martí-Carreras
- Laboratory of Clinical and Epidemiological Virology, KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, BE3000 Leuven, Belgium; (W.D.); (C.G.); (B.V.)
- Correspondence: (J.M.-C); (A.R.G.); (R.C.); (B.B.)
- Alejandro Rafael Gener
- Integrative Molecular and Biomedical Sciences Program, Baylor College of Medicine, Houston, TX 77030, USA
- Margaret M. and Albert B. Alkek Department of Medicine, Nephrology, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Genetics, MD Anderson Cancer Center, Houston, TX 77030, USA
- School of Medicine, Universidad Central del Caribe, Bayamón, PR 00960, USA
- Correspondence: (J.M.-C); (A.R.G.); (R.C.); (B.B.)
- Sierra D. Miller
- Genetics & Molecular Biology, Millersville University, 40 Dilworth Rd, Millersville, PA 17551, USA;
- Anderson F. Brito
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health (YSPH), 60 College Street, New Haven, CT 06510, USA;
- Christiam E. Camacho
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA; (C.E.C.); (J.R.B.); (V.Z.); (A.E.)
- Ryan Connor
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA; (C.E.C.); (J.R.B.); (V.Z.); (A.E.)
- Correspondence: (J.M.-C); (A.R.G.); (R.C.); (B.B.)
- Ward Deboutte
- Laboratory of Clinical and Epidemiological Virology, KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, BE3000 Leuven, Belgium; (W.D.); (C.G.); (B.V.)
- Cody Glickman
- Laboratory of Clinical and Epidemiological Virology, KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, BE3000 Leuven, Belgium; (W.D.); (C.G.); (B.V.)
- David M. Kristensen
- Computational Bioscience Program, University of Colorado Anschutz, Aurora, CO 80045, USA;
- Wynn K. Meyer
- AAAS Science and Technology Policy Fellow, Office of Data Science Strategy, Division of Program Coordination, Planning, and Strategic Initiatives, Office of the Director, National Institutes of Health, 31 Center Dr., Bethesda, MD 20894, USA;
- Sejal Modha
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK;
- Alexis L. Norris
- Biotechnology Graduate Program, University of Maryland Global Campus, 1616 McCormick Drive, Largo, MD 20774, USA;
- Surya Saha
- Boyce Thompson Institute, Ithaca, NY 14850, USA;
- School of Animal and Comparative Biomedical Sciences, The University of Arizona, Tucson, AZ 85721, USA
- Anna K. Belford
- Laboratory of Cellular Oncology, National Cancer Institute, 37 Convent Dr., Bethesda, MD 20894, USA; (A.K.B.); (M.J.T.)
- Evan Biederstedt
- Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA;
- James Rodney Brister
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA; (C.E.C.); (J.R.B.); (V.Z.); (A.E.)
- Jan P. Buchmann
- School of Life and Environmental Sciences and School of Medical Sciences, Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Sydney, Australia;
- Nicholas P. Cooley
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA;
- Robert A. Edwards
- College of Science and Engineering, Flinders University, Bedford Park, SA 5042, Australia;
- Kiran Javkar
- Department of Computer Science, University of Maryland, College Park, MD 20740, USA; (K.J.); (H.S.M.); (N.S.)
- Joint Institute for Food Safety and Applied Nutrition, University of Maryland, College Park, MD 20740, USA
- Michael Muchow
- Novel Microdevices, Nucleic Acids, Baltimore, MD 21202, USA;
- Harihara Subrahmaniam Muralidharan
- Department of Computer Science, University of Maryland, College Park, MD 20740, USA; (K.J.); (H.S.M.); (N.S.)
- Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20740, USA
- Nidhi Shah
- Department of Computer Science, University of Maryland, College Park, MD 20740, USA; (K.J.); (H.S.M.); (N.S.)
- Migun Shakya
- Bioscience Division, Bikini Atoll Road, Los Alamos National Laboratory, Los Alamos, NM 87545, USA;
- Michael J. Tisza
- Laboratory of Cellular Oncology, National Cancer Institute, 37 Convent Dr., Bethesda, MD 20894, USA; (A.K.B.); (M.J.T.)
- Benjamin J. Tully
- Center for Dark Energy Biosphere Investigations, University of Southern California, Los Angeles, CA 90089, USA;
- Bert Vanmechelen
- Laboratory of Clinical and Epidemiological Virology, KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, BE3000 Leuven, Belgium; (W.D.); (C.G.); (B.V.)
- Valerie C. Virta
- AAAS Science & Technology Policy Fellow, National Institutes of Health, Center for Information Technology, 6555 Rock Spring Drive, Bethesda, MD 20817, USA;
- JL Weissman
- Department of Marine and Environmental Biology, University of Southern California, Los Angeles, CA 90089, USA;
- Vadim Zalunin
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA; (C.E.C.); (J.R.B.); (V.Z.); (A.E.)
- Alexandre Efremov
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA; (C.E.C.); (J.R.B.); (V.Z.); (A.E.)
- Ben Busby
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA; (C.E.C.); (J.R.B.); (V.Z.); (A.E.)
- DNANexus, 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- Correspondence: (J.M.-C); (A.R.G.); (R.C.); (B.B.)
5
Buchmann JP, Holmes EC. Entrezpy: a Python library to dynamically interact with the NCBI Entrez databases. Bioinformatics 2020; 35:4511-4514. [PMID: 31077305 PMCID: PMC6821292 DOI: 10.1093/bioinformatics/btz385] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/04/2018] [Revised: 04/29/2019] [Accepted: 05/04/2019] [Indexed: 11/16/2022]
Abstract
Summary: Entrezpy is a Python library that automates the querying and downloading of data from the Entrez databases at the National Center for Biotechnology Information by interacting with E-Utilities. Entrezpy implements complex queries by automatically creating E-Utility parameters from the results obtained, which can then be used directly in subsequent queries. Entrezpy also allows the user to cache and retrieve results locally, implements interactions with all Entrez databases as part of an analysis pipeline, and adjusts parameters within an ongoing query or using prior results. Entrezpy’s modular design makes it easy to extend and adjust existing E-Utility functions.
Availability and implementation: Entrezpy is implemented in Python 3 (≥3.6) and depends only on the Python Standard Library. It is available via PyPi (https://pypi.org/project/entrezpy/), at https://gitlab.com/ncbipy/entrezpy.git and at http://entrezpy.readthedocs.io/. Entrezpy is licensed under the LGPLv3.
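The idea of creating the parameters of a follow-up E-Utility request from an earlier result can be illustrated with the standard library alone. The sketch below is not Entrezpy's API: the function names are hypothetical, the sample ESearch response is a made-up placeholder, and no network request is made. Only the endpoint names and query parameters (`db`, `term`, `usehistory`, `WebEnv`, `query_key`, `retmax`, `rettype`, `retmode`) come from NCBI's public E-Utilities.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """Build an ESearch URL that keeps its result on the Entrez history server."""
    query = urlencode({"db": db, "term": term, "usehistory": "y"})
    return f"{EUTILS}/esearch.fcgi?{query}"


def followup_params(esearch_xml, db, rettype="fasta", retmax=100):
    """Derive EFetch parameters from an ESearch XML result (hypothetical helper)."""
    root = ET.fromstring(esearch_xml)
    count = int(root.findtext("Count"))
    return {
        "db": db,
        "WebEnv": root.findtext("WebEnv"),
        "query_key": root.findtext("QueryKey"),
        # Never ask for more records than the search actually found.
        "retmax": min(retmax, count),
        "rettype": rettype,
        "retmode": "text",
    }


def efetch_url(params):
    """Build the EFetch URL for previously derived parameters."""
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


# Placeholder ESearch response; real WebEnv strings are long opaque tokens.
SAMPLE_RESULT = (
    "<eSearchResult><Count>42</Count><QueryKey>1</QueryKey>"
    "<WebEnv>EXAMPLE_WEBENV</WebEnv></eSearchResult>"
)

if __name__ == "__main__":
    print(esearch_url("nucleotide", "viruses[orgn]"))
    print(efetch_url(followup_params(SAMPLE_RESULT, "nucleotide")))
```

Keeping the derivation step separate from URL construction means the same parameters could also be cached or adjusted before the next request, which is the kind of chaining the summary describes.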
Affiliation(s)
- Jan P Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia
6
Marcelino VR, Clausen PTLC, Buchmann JP, Wille M, Iredell JR, Meyer W, Lund O, Sorrell TC, Holmes EC. CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data. Genome Biol 2020; 21:103. [PMID: 32345331 PMCID: PMC7189439 DOI: 10.1186/s13059-020-02014-2] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 05/31/2019] [Accepted: 04/13/2020] [Indexed: 01/19/2023]
Abstract
There is an increasing demand for accurate and fast metagenome classifiers that can not only identify bacteria, but all members of a microbial community. We used a recently developed concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. The pipeline substantially outperforms other commonly used software in identifying bacteria and fungi and can efficiently use the entire NCBI nucleotide collection as a reference to detect species with incomplete genome data from all biological kingdoms. CCMetagen is user-friendly, and the results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.
Affiliation(s)
- Vanessa R Marcelino
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia.
- Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia.
- School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia.
- Philip T L C Clausen
- National Food Institute, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
- Jan P Buchmann
- School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
- Michelle Wille
- WHO Collaborating Centre for Reference and Research on Influenza, The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, 3000, Australia
- Jonathan R Iredell
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia
- Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
- Westmead Hospital (Research and Education Network), Westmead, NSW, 2145, Australia
- Wieland Meyer
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia
- Westmead Hospital (Research and Education Network), Westmead, NSW, 2145, Australia
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
- Ole Lund
- National Food Institute, Technical University of Denmark, 2800, Kgs Lyngby, Denmark
- Tania C Sorrell
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia
- Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, Westmead, NSW, 2145, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity and Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, 2006, Australia
- School of Life & Environmental Sciences, Charles Perkins Centre, The University of Sydney, Sydney, NSW, 2006, Australia
7
Connor R, Brister R, Buchmann JP, Deboutte W, Edwards R, Martí-Carreras J, Tisza M, Zalunin V, Andrade-Martínez J, Cantu A, D'Amour M, Efremov A, Fleischmann L, Forero-Junco L, Garmaeva S, Giluso M, Glickman C, Henderson M, Kellman B, Kristensen D, Leubsdorf C, Levi K, Levi S, Pakala S, Peddu V, Ponsero A, Ribeiro E, Roy F, Rutter L, Saha S, Shakya M, Shean R, Miller M, Tully B, Turkington C, Youens-Clark K, Vanmechelen B, Busby B. NCBI's Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements. Genes (Basel) 2019; 10:E714. [PMID: 31527408 PMCID: PMC6771016 DOI: 10.3390/genes10090714] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/26/2019] [Revised: 09/05/2019] [Accepted: 09/05/2019] [Indexed: 01/26/2023]
Abstract
A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprising over 40 participants from six countries assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, which were further filtered for a minimal length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds. Mainly: (i) conservative assemblies of SRA data improve initial analysis steps; (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node; (iii) existing bioinformatic algorithms should be redesigned for a cloud infrastructure to facilitate their use by a wider audience; and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event. Here, we present the applied workflows, initial results, and lessons learned from the hackathon.
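The length filter mentioned in the abstract (keeping contigs of at least 1 kb) can be sketched in a few lines. This is a minimal illustration, not the hackathon's actual pipeline: the FASTA reader and the function names are hypothetical, and the 1000-nucleotide default simply mirrors the threshold stated above.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_min_length(records, min_len=1000):
    """Keep only records whose sequence has at least min_len nucleotides."""
    return [(name, seq) for name, seq in records if len(seq) >= min_len]


if __name__ == "__main__":
    demo = [">contig_long", "ACGT" * 300, ">contig_short", "ACGT" * 100]
    for name, seq in filter_min_length(read_fasta(demo)):
        print(name, len(seq))
```

Because `read_fasta` is a generator, a real file handle could be passed in place of the list so that large contig files are streamed rather than loaded at once.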
Affiliation(s)
- Ryan Connor
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
- Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
- Jan P Buchmann
- Charles Perkins Centre, School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW 2006, Australia.
- Ward Deboutte
- KU Leuven, Department of Microbiology & Immunology, Rega Institute, Leuven BE3000, Belgium.
- Rob Edwards
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
- Joan Martí-Carreras
- KU Leuven, Department of Microbiology & Immunology, Rega Institute, Leuven BE3000, Belgium.
- Mike Tisza
- Lab of Cellular Oncology, NCI, NIH, Bethesda, MD 20892-4263, USA.
- Vadim Zalunin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
- Juan Andrade-Martínez
- Research Group on Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia. Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá 111711, Colombia.
- Adrian Cantu
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
- Michael D'Amour
- D'Amour & Associates, 11839 Hilltop Drive, Los Altos Hills, CA 94024, USA.
- Alexandre Efremov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
- Lydia Fleischmann
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
- Laura Forero-Junco
- Research Group on Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia. Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá 111711, Colombia.
- Sanzhima Garmaeva
- Department of Genetics, University Medical Center Groningen, Groningen 9713AV, The Netherlands.
- Melissa Giluso
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
- Cody Glickman
- Computational Bioscience Program, University of Colorado Anschutz, Aurora, CO 80045, USA.
- Margaret Henderson
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
- Benjamin Kellman
- Bioinformatics and Systems Biology Program, University of California at San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA.
- David Kristensen
- Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA.
- Carl Leubsdorf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
- Kyle Levi
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
- Shane Levi
- Department of Biology, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182, USA.
- Suman Pakala
- Division of Infectious Diseases, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, USA.
- Vikas Peddu
- Department of Laboratory Medicine, University of Washington Virology, 1616 Eastlake Ave E, Seattle, WA 98102, USA.
- Alise Ponsero
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85716, USA.
- Eldred Ribeiro
- MITRE Corporation, 7515 Colshire Drive, McLean, VA 22102-7539, USA.
- Farrah Roy
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
- Surya Saha
- Boyce Thompson Institute, Ithaca, NY 14853, USA.
- Migun Shakya
- Bioscience Division, Los Alamos National Lab, Los Alamos, NM 87545, USA.
- Ryan Shean
- Department of Laboratory Medicine, University of Washington Virology, 1616 Eastlake Ave E, Seattle, WA 98102, USA.
- Matthew Miller
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85716, USA.
- Benjamin Tully
- Center for Dark Energy Biosphere Investigations, University of Southern California, Los Angeles, CA 90089, USA.
- Ken Youens-Clark
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ 85716, USA.
- Bert Vanmechelen
- KU Leuven, Department of Microbiology & Immunology, Rega Institute, Leuven BE3000, Belgium.
- Ben Busby
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda MD 20894, USA.
8
Schlub TE, Buchmann JP, Holmes EC. A Simple Method to Detect Candidate Overlapping Genes in Viruses Using Single Genome Sequences. Mol Biol Evol 2019; 35:2572-2581. [PMID: 30099499 PMCID: PMC6188560 DOI: 10.1093/molbev/msy155] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/13/2022]
Abstract
Overlapping genes in viruses maximize the coding capacity of their genomes and allow the generation of new genes without major increases in genome size. Despite their importance, the evolution and function of overlapping genes are often not well understood, in part due to difficulties in their detection. In addition, most bioinformatic approaches for the detection of overlapping genes require the comparison of multiple genome sequences that may not be available in metagenomic surveys of virus biodiversity. We introduce a simple new method for identifying candidate functional overlapping genes using single virus genome sequences. Our method uses randomization tests to estimate the expected length of open reading frames and then identifies overlapping open reading frames that significantly exceed this length and are thus predicted to be functional. We applied this method to 2548 reference RNA virus genomes and find that it has both high sensitivity and low false discovery for genes that overlap by at least 50 nucleotides. Notably, this analysis provided evidence for 29 previously undiscovered functional overlapping genes, some of which are coded in the antisense direction suggesting there are limitations in our current understanding of RNA virus replication.
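The randomization idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the randomization scheme, so plain nucleotide shuffling is assumed here, and the function names `longest_orf_length` and `orf_length_p_value` are hypothetical. The sketch measures the longest stop-free run of codons in one reading frame and compares it with the same statistic in shuffled copies of the sequence.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq, frame):
    """Length (nt) of the longest stop-free codon run in frame 0, 1, or 2.

    Assumes an uppercase DNA string; start codons are ignored in this sketch.
    """
    longest = current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0  # a stop codon ends the current open run
        else:
            current += 3
            longest = max(longest, current)
    return longest


def orf_length_p_value(seq, frame, n_shuffles=1000, seed=0):
    """Empirical p-value for the observed longest ORF under nucleotide shuffling.

    Assumption: the null model is a random permutation of the sequence's own
    nucleotides, which preserves base composition but nothing else.
    """
    rng = random.Random(seed)
    observed = longest_orf_length(seq, frame)
    bases = list(seq)
    at_least_as_long = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if longest_orf_length("".join(bases), frame) >= observed:
            at_least_as_long += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_long + 1) / (n_shuffles + 1)
```

A small p-value for a frame that overlaps an annotated gene would mark that open reading frame as a candidate worth closer inspection; any significance threshold and multiple-testing correction are left to the reader.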
Affiliation(s)
- Timothy E Schlub
- Sydney School of Public Health, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
- Jan P Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW, Australia

9. Geoghegan JL, Pirotta V, Harvey E, Smith A, Buchmann JP, Ostrowski M, Eden JS, Harcourt R, Holmes EC. Virological Sampling of Inaccessible Wildlife with Drones. Viruses 2018; 10:300. [PMID: 29865228 PMCID: PMC6024715 DOI: 10.3390/v10060300] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 05/12/2018] [Revised: 05/31/2018] [Accepted: 05/31/2018] [Indexed: 11/16/2022]
Abstract
There is growing interest in characterizing the viromes of diverse mammalian species, particularly in the context of disease emergence. However, little is known about virome diversity in aquatic mammals, in part due to difficulties in sampling. We characterized the virome of the exhaled breath (or blow) of the Eastern Australian humpback whale (Megaptera novaeangliae). To achieve an unbiased survey of virome diversity, a meta-transcriptomic analysis was performed on 19 pooled whale blow samples collected via a purpose-built Unmanned Aerial Vehicle (UAV, or drone) approximately 3 km off the coast of Sydney, Australia during the 2017 winter annual northward migration from Antarctica to northern Australia. To our knowledge, this is the first time that UAVs have been used to sample viruses. Despite the relatively small number of animals surveyed in this initial study, we identified six novel virus species from five viral families. This work demonstrates the potential of UAVs in studies of virus disease, diversity, and evolution.
Affiliation(s)
- Jemma L Geoghegan
- Department of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Vanessa Pirotta
- Department of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Erin Harvey
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia.
- Alastair Smith
- Heliguy Scientific Pty Ltd., Sydney, NSW 2204, Australia.
- Jan P Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia.
- Martin Ostrowski
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- John-Sebastian Eden
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia.
- Westmead Institute for Medical Research, Centre for Virus Research, Westmead, NSW 2145, Australia.
- Robert Harcourt
- Department of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia.
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, NSW 2006, Australia.

10. Simon-Loriere E, Faye O, Faye O, Koivogui L, Magassouba N, Keita S, Thiberge JM, Diancourt L, Bouchier C, Vandenbogaert M, Caro V, Fall G, Buchmann JP, Matranga CB, Sabeti PC, Manuguerra JC, Holmes EC, Sall AA. Distinct lineages of Ebola virus in Guinea during the 2014 West African epidemic. Nature 2015; 524:102-4. [PMID: 26106863 PMCID: PMC10601606 DOI: 10.1038/nature14612] [Citation(s) in RCA: 83] [Impact Index Per Article: 9.2] [Received: 04/15/2015] [Accepted: 06/05/2015] [Indexed: 11/09/2022]
Abstract
An epidemic of Ebola virus disease of unprecedented scale has been ongoing for more than a year in West Africa. As of 29 April 2015, there have been 26,277 reported total cases (of which 14,895 have been laboratory confirmed) resulting in 10,899 deaths. The source of the outbreak was traced to the prefecture of Guéckédou in the forested region of southeastern Guinea. The virus later spread to the capital, Conakry, and to the neighbouring countries of Sierra Leone, Liberia, Nigeria, Senegal and Mali. In March 2014, when the first cases were detected in Conakry, the Institut Pasteur of Dakar, Senegal, deployed a mobile laboratory in Donka hospital to provide diagnostic services to the greater Conakry urban area and other regions of Guinea. Through this process we sampled 85 Ebola viruses (EBOV) from patients infected from July to November 2014, and report their full genome sequences here. Phylogenetic analysis reveals the sustained transmission of three distinct viral lineages co-circulating in Guinea, including the urban setting of Conakry and its surroundings. One lineage is unique to Guinea and closely related to the earliest sampled viruses of the epidemic. A second lineage contains viruses probably reintroduced from neighbouring Sierra Leone on multiple occasions, while a third lineage later spread from Guinea to Mali. Each lineage is defined by multiple mutations, including non-synonymous changes in the virion protein 35 (VP35), glycoprotein (GP) and RNA-dependent RNA polymerase (L) proteins. The viral GP is characterized by a glycosylation site modification and mutations in the mucin-like domain that could modify the outer shape of the virion. These data illustrate the ongoing ability of EBOV to develop lineage-specific and potentially phenotypically important variation.
Affiliation(s)
- Etienne Simon-Loriere
- Institut Pasteur, Functional Genetics of Infectious Diseases Unit, Paris Cedex 15, 75724 France
- CNRS URA3012, Paris, 75015 France
- Ousmane Faye
- Institut Pasteur de Dakar, Arbovirus and Viral Hemorrhagic Fever Unit, Dakar, BP 220 Senegal
- Oumar Faye
- Institut Pasteur de Dakar, Arbovirus and Viral Hemorrhagic Fever Unit, Dakar, BP 220 Senegal
- Lamine Koivogui
- Institut National de Santé Publique de Guinée, Conakry, Guinea
- Nfaly Magassouba
- Projet de fièvres hémorragiques de Guinée, Université Gamal Abdel Nasser, Conakry, BP 1147 Guinea
- Jean-Michel Thiberge
- Institut Pasteur, Unité Environnement et Risques Infectieux, Cellule d’Intervention Biologique d’Urgence, Paris Cedex 15, 75724 France
- Laure Diancourt
- Institut Pasteur, Unité Environnement et Risques Infectieux, Cellule d’Intervention Biologique d’Urgence, Paris Cedex 15, 75724 France
- Matthias Vandenbogaert
- Institut Pasteur, Unité Environnement et Risques Infectieux, Cellule d’Intervention Biologique d’Urgence, Paris Cedex 15, 75724 France
- Valérie Caro
- Institut Pasteur, Unité Environnement et Risques Infectieux, Cellule d’Intervention Biologique d’Urgence, Paris Cedex 15, 75724 France
- Gamou Fall
- Institut Pasteur de Dakar, Arbovirus and Viral Hemorrhagic Fever Unit, Dakar, BP 220 Senegal
- Jan P. Buchmann
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, 2006 New South Wales Australia
- Pardis C. Sabeti
- Broad Institute, 75 Ames Street, Cambridge, 02142 Massachusetts USA
- Department of Organismic and Evolutionary Biology, FAS Center for Systems Biology, Harvard University, 52 Oxford Street, Cambridge, 02138 Massachusetts USA
- Jean-Claude Manuguerra
- Institut Pasteur, Unité Environnement et Risques Infectieux, Cellule d’Intervention Biologique d’Urgence, Paris Cedex 15, 75724 France
- Edward C. Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Biological Sciences and Sydney Medical School, The University of Sydney, Sydney, 2006 New South Wales Australia
- Amadou A. Sall
- Institut Pasteur de Dakar, Arbovirus and Viral Hemorrhagic Fever Unit, Dakar, BP 220 Senegal

11. Buchmann JP, Löytynoja A, Wicker T, Schulman AH. Analysis of CACTA transposases reveals intron loss as major factor influencing their exon/intron structure in monocotyledonous and eudicotyledonous hosts. Mob DNA 2014; 5:24. [PMID: 25206928 PMCID: PMC4158355 DOI: 10.1186/1759-8753-5-24] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/19/2014] [Accepted: 08/18/2014] [Indexed: 01/20/2023]
Abstract
Background: CACTA elements are DNA transposons and are found in numerous organisms. Despite their low activity, several thousand copies can be identified in many genomes. CACTA elements transpose using a ‘cut-and-paste’ mechanism, which is facilitated by a DDE transposase. DDE transposases from CACTA elements contain, despite their conserved function, different exon numbers among various CACTA families. While earlier studies analyzed the ancestral history of the DDE transposases, no studies have examined exon loss and gain with a view of mechanisms that could drive the changes.
Results: We analyzed 64 transposases from different CACTA families among monocotyledonous and eudicotyledonous host species. The annotation of the exon/intron boundaries showed a range from one to six exons. A robust multiple sequence alignment of the 64 transposases based on their protein sequences was created and used for phylogenetic analysis, which revealed eight different clades. We observed that the exon numbers in CACTA transposases are not specific for a host genome. We found that ancient CACTA lineages diverged before the divergence of monocotyledons and eudicotyledons. Most exon/intron boundaries were found in three distinct regions among all the transposases, grouping 63 conserved intron/exon boundaries.
Conclusions: We propose a model for the ancestral CACTA transposase gene, which consists of four exons, that predates the divergence of the monocotyledons and eudicotyledons. Based on this model, we propose pathways of intron loss or gain to explain the observed variation in exon numbers. While intron loss appears to have prevailed, a putative case of intron gain was nevertheless observed.
Affiliation(s)
- Jan P Buchmann
- Institute of Biotechnology, Viikki Biocenter, University of Helsinki, PO Box 65, FIN-00014 Helsinki, Finland; Present address: Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Center, University of Sydney, Sydney NSW 2006, Australia
- Ari Löytynoja
- Institute of Biotechnology, Viikki Biocenter, University of Helsinki, PO Box 65, FIN-00014 Helsinki, Finland
- Thomas Wicker
- Institute of Plant Biology, University of Zurich, Zollikerstrasse 107, Zurich, Switzerland
- Alan H Schulman
- Institute of Biotechnology, Viikki Biocenter, University of Helsinki, PO Box 65, FIN-00014 Helsinki, Finland; Biotechnology and Food Research, MTT Agrifood Research Finland, Myllytie 1, FIN-31600 Jokioinen, Finland

12. Moisy C, Schulman AH, Kalendar R, Buchmann JP, Pelsy F. The Tvv1 retrotransposon family is conserved between plant genomes separated by over 100 million years. Theor Appl Genet 2014; 127:1223-35. [PMID: 24590356 DOI: 10.1007/s00122-014-2293-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 09/02/2013] [Accepted: 02/21/2014] [Indexed: 05/18/2023]
Abstract
Combining several different approaches, we have examined the structure, variability, and distribution of Tvv1 retrotransposons. Tvv1 is an unusual example of a low-copy retrotransposon metapopulation dispersed unevenly among very distant species and is promising for the development of molecular markers. Retrotransposons are ubiquitous throughout the genomes of the vascular plants, but individual retrotransposon families tend to be confined to the level of plant genus or at most family. This restricts the general applicability of a family as molecular markers. Here, we characterize a new plant retrotransposon named Tvv1_Sdem, a member of the Copia superfamily of LTR retrotransposons, from the genome of the wild potato Solanum demissum. Comparative analyses based on structure and sequence showed a high level of similarity of Tvv1_Sdem with Tvv1-VB, a retrotransposon previously described in the grapevine genome Vitis vinifera. Extending the analysis to other species by in silico and in vitro approaches revealed the presence of Tvv1 family members in potato, tomato, and poplar genomes, and led to the identification of full-length copies of Tvv1 in these species. We were also able to identify polymorphism in UTL sequences between Tvv1_Sdem copies from wild and cultivated potatoes that are useful as molecular markers. Combining different approaches, our results suggest that the Tvv1 family of retrotransposons has a monophyletic origin and has been maintained in both the rosids and the asterids, the major clades of dicotyledonous plants, since their divergence about 100 MYA. To our knowledge, Tvv1 represents an unusual plant retrotransposon metapopulation comprising highly similar members disjointedly dispersed among very distant species. The twin features of Tvv1 presence in evolutionarily distant genomes and the diversity of its UTL region in each species make it useful as a source of robust molecular markers for diversity studies and breeding.
Affiliation(s)
- Cédric Moisy
- MTT/BI Plant Genomics Lab, Institute of Biotechnology, University of Helsinki, P.O. Box 65, Biocenter 3, Viikinkaari 1, 00014, Helsinki, Finland.

13. Shatalina M, Wicker T, Buchmann JP, Oberhaensli S, Simková H, Doležel J, Keller B. Genotype-specific SNP map based on whole chromosome 3B sequence information from wheat cultivars Arina and Forno. Plant Biotechnol J 2013; 11:23-32. [PMID: 23046423 DOI: 10.1111/pbi.12003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 06/01/2012] [Revised: 08/27/2012] [Accepted: 08/30/2012] [Indexed: 05/10/2023]
Abstract
Agronomically important traits are frequently controlled by rare, genotype-specific alleles. Such genes can only be mapped in a population derived from the donor genotype. This requires the development of a specific genetic map, which is difficult in wheat because of the low level of polymorphism among elite cultivars. The absence of sufficient polymorphism, the complexity of the hexaploid wheat genome, and the lack of complete sequence information make the construction of genetic maps with a high density of reproducible and polymorphic markers challenging. We developed a genotype-specific genetic map of chromosome 3B from winter wheat cultivars Arina and Forno. Chromosome 3B was isolated from the two cultivars and then sequenced to 10-fold coverage. This resulted in a single-nucleotide polymorphism (SNP) database of the complete chromosome. Based on proposed synteny with the Brachypodium model genome and gene annotation, sequences close to coding regions were used for the development of 70 SNP-based markers. They were mapped on an Arina × Forno Recombinant Inbred Lines population and found to be spread over the complete chromosome 3B. While overall synteny was well maintained, numerous exceptions and inversions of syntenic gene order were identified. Additionally, we found that the majority of recombination events occurred in distal parts of chromosome 3B, particularly in hot-spot regions. Compared with the earlier map based on SSR and RFLP markers, the number of markers increased fourfold. The approach presented here allows fast development of genotype-specific polymorphic markers that can be used for mapping and marker-assisted selection.

14. Oberhaensli S, Parlange F, Buchmann JP, Jenny FH, Abbott JC, Burgis TA, Spanu PD, Keller B, Wicker T. Comparative sequence analysis of wheat and barley powdery mildew fungi reveals gene colinearity, dates divergence and indicates host-pathogen co-evolution. Fungal Genet Biol 2011; 48:327-34. [DOI: 10.1016/j.fgb.2010.10.003] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 08/12/2010] [Revised: 09/29/2010] [Accepted: 10/06/2010] [Indexed: 12/24/2022]

15.
Abstract
Colinearity of genes in plant genomes generally decreases with increasing evolutionary distance while the actual number of genes remains more or less constant. To characterize the molecular mechanisms of this "gene movement," we identified non-colinear genes by three-way comparison of the genomes of Brachypodium, rice, and sorghum. We found that genomic fragments of up to 50 kb containing the non-colinear genes are duplicated to acceptor sites elsewhere in the genome. Apparent movement of genes may usually be the result of subsequent deletions of genes in the donor region. Often, the duplicated fragments are precisely bordered by transposable elements (TEs) at the acceptor site. Highly diagnostic sequence motifs at these borders strongly suggest that these gene movements were the result of double-strand break (DSB) repair through synthesis-dependent strand annealing. In these cases, a copy of the foreign DNA fragment is used as filler DNA to repair the DSB linked with the transposition of TEs. Interestingly, most TEs we found associated with gene movement have a very low copy number in the genome and for several we did not find autonomous copies. This suggests that some of these elements spontaneously arose from unspecific interaction with TE proteins that are encoded by autonomous elements. Additionally, we found evidence that gene movements can also be caused when DSBs are repaired after template slippage or unequal crossing-over events. The observed frequency of gene movements can explain the erosion of gene colinearity between plant genomes during evolution.
Affiliation(s)
- Thomas Wicker
- Institute of Plant Biology, University Zurich, CH-8008 Zurich, Switzerland.