1
Reed CJ, Denise R, Hourihan J, Babor J, Jaroch M, Martinelli M, Hutinet G, de Crécy-Lagard V. Beyond blast: enabling microbiologists to better extract literature, taxonomic distributions and gene neighbourhood information for protein families. Microb Genom 2024; 10:001183. [PMID: 38323604 PMCID: PMC10926702 DOI: 10.1099/mgen.0.001183] [Received: 10/03/2023] [Accepted: 01/08/2024] [Indexed: 02/08/2024]
Abstract
Capturing the published corpus of information on all members of a given protein family should be an essential step in any study focusing on specific members of that family. Using a previously gathered dataset of more than 280 references mentioning a member of the DUF34 (NIF3/Ngg1-interacting Factor 3) family, we evaluated the efficiency of different databases and search tools, and devised a workflow that experimentalists can use to capture the most information published on members of a protein family in the least amount of time. To complement this workflow, web-based platforms allowing for the exploration of protein family members across sequenced genomes or for the analysis of gene neighbourhood information were reviewed for their versatility and ease of use. Recommendations that can be used for experimentalist users, as well as educators, are provided and integrated within a customized, publicly accessible Wiki.
Affiliation(s)
- Colbie J. Reed
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Rémi Denise
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
  - APC Microbiome Ireland, University College Cork, Cork, Ireland
- Jacob Hourihan
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Jill Babor
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Marshall Jaroch
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
- Maria Martinelli
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
  - Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, USA
- Valérie de Crécy-Lagard
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, USA
  - Department of Biology, Haverford College, Haverford, PA, USA
  - UF Genetics Institute, University of Florida, Gainesville, FL, USA
2
Pinheiro A, Borges JR, Côrte-Real JV, Esteves PJ. Evolution of guanylate binding protein genes shows a remarkable variability within bats (Chiroptera). Front Immunol 2024; 15:1329098. [PMID: 38357541 PMCID: PMC10864436 DOI: 10.3389/fimmu.2024.1329098] [Received: 10/27/2023] [Accepted: 01/16/2024] [Indexed: 02/16/2024]
Abstract
Background: GBPs (guanylate binding proteins), an evolutionarily ancient protein family, play a key role in the host's innate immune response against bacterial, parasitic and viral infections. In humans, seven GBP genes have been described (GBP1-7). Despite the interest these proteins have received in recent years, evolutionary studies have only been performed in primates, Tupaia and rodents. These have shown a pattern of gene gain and loss in each family, indicative of the birth-and-death evolution process. Results: In this study, we analysed the evolution of this gene cluster in several bat species belonging to the Yangochiroptera and Yinpterochiroptera sub-orders. Detailed analysis shows a conserved synteny and a history of gene expansion and loss. Phylogenetic analysis showed that bats have GBPs 1, 2 and 4-6. GBP2 has been lost in several bat families, being present only in Hipposideidae and Pteropodidae. GBPs 1, 4 and 5 are present mostly as single-copy genes in all families but have undergone duplication events, particularly in Myotis myotis and Eptesicus fuscus. Most interestingly, we demonstrate that GBP6 duplicated in a Chiroptera ancestor species, giving rise to two genes, which we named GBP6a and GBP6b, with different subsequent evolutionary histories. GBP6a underwent several duplication events in all families, while GBP6b is present as a single-copy gene and has been lost in Pteropodidae, Miniopteridae and Desmodus rotundus, a Phyllostomidae. With 14 and 15 GBP genes, Myotis myotis and Eptesicus fuscus stand out as having far more copies than all other studied bat species. Conversely, Pteropodidae have the lowest number of GBP genes in bats. Conclusion: Bats are important reservoirs of viruses, many of which have caused zoonotic diseases in recent decades. Further functional studies on bat GBPs will help elucidate their function, evolutionary history, and the role of bats as virus reservoirs.
Affiliation(s)
- Ana Pinheiro
  - CIBIO-UP, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, InBIO, Laboratório Associado, Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- J. Ricardo Borges
  - CIBIO-UP, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, InBIO, Laboratório Associado, Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- João Vasco Côrte-Real
  - CIBIO-UP, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, InBIO, Laboratório Associado, Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
  - Max von Pettenkofer Institute and Gene Center, Virology, National Reference Center for Retroviruses, Faculty of Medicine, Ludwig Maximilian University of Munich (LMU) München, Munich, Germany
- Pedro J. Esteves
  - CIBIO-UP, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, InBIO, Laboratório Associado, Vairão, Portugal
  - BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
  - CITS - Centro de Investigação em Tecnologias de Saúde, CESPU, Gandra, Portugal
3
Reed CJ, Denise R, Hourihan J, Babor J, Jaroch M, Martinelli M, Hutinet G, de Crécy-Lagard V. Beyond Blast: Enabling Microbiologists to Better Extract Literature, Taxonomic Distributions and Gene Neighborhood Information for Protein Families. bioRxiv 2024:2023.05.03.539116. [PMID: 37205517 PMCID: PMC10187207 DOI: 10.1101/2023.05.03.539116] [Indexed: 05/21/2023]
Abstract
Capturing the published corpus of information on all members of a given protein family should be an essential step in any study focusing on specific members of that family. Using a previously gathered dataset of more than 280 references mentioning a member of the DUF34 (NIF3/Ngg1-interacting Factor 3) family, we evaluated the efficiency of different databases and search tools, and devised a workflow that experimentalists can use to capture the most published information on members of a protein family in the least amount of time. To complement this workflow, web-based platforms allowing for the exploration of protein family members across sequenced genomes or for the analysis of gene neighborhood information were reviewed for their versatility and ease of use. Recommendations that can be used for experimentalist users, as well as educators, are provided and integrated within a customized, publicly accessible Wiki.
Affiliation(s)
- Colbie J. Reed
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Rémi Denise
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Jacob Hourihan
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Jill Babor
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Marshall Jaroch
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Maria Martinelli
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
- Geoffrey Hutinet
  - Department of Biology, Haverford College, 370 Lancaster Avenue, Haverford, PA 19041, USA
- Valérie de Crécy-Lagard
  - Department of Microbiology and Cell Science, University of Florida, Gainesville, FL 32611, USA
  - Department of Biology, Haverford College, 370 Lancaster Avenue, Haverford, PA 19041, USA
  - University of Florida Genetics Institute, Gainesville, FL 32610, USA
4
Vertacnik KL, Herrig DK, Godfrey RK, Hill T, Geib SM, Unckless RL, Nelson DR, Linnen CR. Evolution of five environmentally responsive gene families in a pine-feeding sawfly, Neodiprion lecontei (Hymenoptera: Diprionidae). Ecol Evol 2023; 13:e10506. [PMID: 37791292 PMCID: PMC10542623 DOI: 10.1002/ece3.10506] [Received: 06/24/2023] [Revised: 07/17/2023] [Accepted: 07/21/2023] [Indexed: 10/05/2023]
Abstract
A central goal in evolutionary biology is to determine the predictability of adaptive genetic changes. Despite many documented cases of convergent evolution at individual loci, little is known about the repeatability of gene family expansions and contractions. To address this void, we examined gene family evolution in the redheaded pine sawfly Neodiprion lecontei, a noneusocial hymenopteran and exemplar of a pine-specialized lineage evolved from angiosperm-feeding ancestors. After assembling and annotating a draft genome, we manually annotated multiple gene families with chemosensory, detoxification, or immunity functions before characterizing their genomic distributions and molecular evolution. We find evidence of recent expansions of bitter gustatory receptor, clan 3 cytochrome P450, olfactory receptor, and antimicrobial peptide subfamilies, with strong evidence of positive selection among paralogs in a clade of gustatory receptors possibly involved in the detection of bitter compounds. In contrast, these gene families had little evidence of recent contraction via pseudogenization. Overall, our results are consistent with the hypothesis that in response to novel selection pressures, gene families that mediate ecological interactions may expand and contract predictably. Testing this hypothesis will require the comparative analysis of high-quality annotation data from phylogenetically and ecologically diverse insect species and functionally diverse gene families. To this end, increasing sampling in under-sampled hymenopteran lineages and environmentally responsive gene families and standardizing manual annotation methods should be prioritized.
Affiliation(s)
- Kim L. Vertacnik
  - Department of Entomology, University of Kentucky, Lexington, Kentucky, USA
- R. Keating Godfrey
  - McGuire Center for Lepidoptera and Biodiversity, University of Florida, Gainesville, Florida, USA
- Tom Hill
  - National Institute of Allergy and Infectious Diseases, Bethesda, Maryland, USA
- Scott M. Geib
  - Tropical Crop and Commodity Protection Research Unit, United States Department of Agriculture: Agriculture Research Service Pacific Basin Agricultural Research Center, Hilo, Hawaii, USA
- Robert L. Unckless
  - Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
- David R. Nelson
  - Department of Microbiology, Immunology and Biochemistry, University of Tennessee Health Science Center, Memphis, Tennessee, USA
5
Liu Z, Du Y, Sun Z, Cheng B, Bi Z, Yao Z, Liang Y, Zhang H, Yao R, Kang S, Shi Y, Wan H, Qin D, Xiang L, Leng L, Chen S. Manual correction of genome annotation improved alternative splicing identification of Artemisia annua. Planta 2023; 258:83. [PMID: 37721598 DOI: 10.1007/s00425-023-04237-6] [Received: 05/24/2023] [Accepted: 09/04/2023] [Indexed: 09/19/2023]
Abstract
Gene annotation is essential for genome-based studies. However, algorithm-based genome annotation struggles to reveal genomic information fully and correctly, especially for species with complex genomes. Artemisia annua L. is the only commercial source of artemisinin, though its artemisinin content still needs to be improved. Genome-based genetic modification and breeding are useful strategies to boost artemisinin content and thereby ensure the supply of artemisinin and reduce costs, but better gene annotation is urgently needed. In this study, we manually corrected the newly released genome annotation of A. annua using second- and third-generation transcriptome data. We found that incorrect gene information may lead to differences in structural, functional, and expression levels compared to the original expectations. We also identified alternative splicing events and found that genome annotation information affected the identification of alternatively spliced genes. We further demonstrated that genome annotation information and alternative splicing could affect gene expression estimation and gene function prediction. Finally, we provided a valuable version of A. annua genome annotation and demonstrated the importance of gene annotation in future research.
Affiliation(s)
- Zhaoyu Liu
  - School of Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin, 300193, China
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
- Yupeng Du
  - College of Life Science, Northeast Forestry University, Harbin, 150040, China
- Zhihao Sun
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
  - School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
- Bohan Cheng
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Zenghao Bi
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
- Zhicheng Yao
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Yuting Liang
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
- Huiling Zhang
  - College of Horticulture, Sichuan Agricultural University, Chengdu, 611130, China
- Run Yao
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
- Shen Kang
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
- Yuhua Shi
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Huihua Wan
  - Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Dou Qin
  - Prescription Laboratory of Xinjiang Traditional Uyghur Medicine, Xinjiang Institute of Traditional Uyghur Medicine, Urumqi, 830000, China
- Li Xiang
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
  - Prescription Laboratory of Xinjiang Traditional Uyghur Medicine, Xinjiang Institute of Traditional Uyghur Medicine, Urumqi, 830000, China
- Liang Leng
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
- Shilin Chen
  - School of Chinese Materia Medica, Tianjin University of Traditional Chinese Medicine, Tianjin, 300193, China
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
6
Chafra F, Borim Correa F, Oni F, Konu Karakayalı Ö, Stadler PF, Nunes da Rocha U. StandEnA: a customizable workflow for standardized annotation and generating a presence-absence matrix of proteins. Bioinformatics Advances 2023; 3:vbad069. [PMID: 37448812 PMCID: PMC10336186 DOI: 10.1093/bioadv/vbad069] [Received: 01/09/2023] [Revised: 04/20/2023] [Accepted: 06/08/2023] [Indexed: 07/15/2023]
Abstract
Motivation: Several genome annotation tools standardize annotation outputs for comparability. During standardization, these tools do not allow user-friendly customization of annotation databases, limiting their flexibility and applicability in downstream analysis. Results: StandEnA is a user-friendly command-line tool for Linux that facilitates the generation of custom databases by retrieving protein sequences from multiple databases. Directed by a user-defined list of standard names, StandEnA retrieves synonyms to search for corresponding sequences in a set of public databases. Custom databases are used in prokaryotic genome annotation to generate standardized presence-absence matrices and reference files containing standard database identifiers. To showcase StandEnA, we applied it to six metagenome-assembled genomes to analyze three different pathways. Availability and implementation: StandEnA is open-source software available at https://github.com/mdsufz/StandEnA. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Fatma Chafra
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig 04318, Germany
  - Department of Molecular Biology and Genetics, Bilkent University, Ankara 06800, Turkey
- Felipe Borim Correa
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig 04318, Germany
  - Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig 04107, Germany
- Faith Oni
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig 04318, Germany
  - Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig 04107, Germany
- Özlen Konu Karakayalı
  - Department of Molecular Biology and Genetics, Bilkent University, Ankara 06800, Turkey
  - Interdisciplinary Program in Neuroscience, Bilkent University, Ankara 06800, Turkey
  - UNAM-Institute of Materials Science and Nanotechnology, Bilkent University, Ankara 06800, Turkey
- Peter F Stadler
  - Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig 04107, Germany
  - Interdisciplinary Center for Bioinformatics, German Center for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, Leipzig Research Center for Civilization Diseases, Leipzig Research Center for Civilization Diseases (LIFE), University of Leipzig, Leipzig 04109, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig 04103, Germany
  - Institute for Theoretical Chemistry, University of Vienna, Vienna 1090, Austria
  - Facultad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá 111711, Colombia
  - Santa Fe Institute, Santa Fe, NM 87501, USA
- Ulisses Nunes da Rocha
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research - UFZ, Leipzig 04318, Germany
  - Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig 04107, Germany
7
Leistikow KR, Beattie RE, Hristova KR. Probiotics beyond the farm: Benefits, costs, and considerations of using antibiotic alternatives in livestock. Frontiers in Antibiotics 2022; 1:1003912. [PMID: 39816405 PMCID: PMC11732145 DOI: 10.3389/frabi.2022.1003912] [Received: 07/26/2022] [Accepted: 09/22/2022] [Indexed: 01/18/2025]
Abstract
The increasing global expansion of antimicrobial resistant infections warrants the development of effective antibiotic alternative therapies, particularly for use in livestock production, an agricultural sector that is perceived to disproportionately contribute to the antimicrobial resistance (AMR) crisis by consuming nearly two-thirds of the global antibiotic supply. Probiotics and probiotic derived compounds are promising alternative therapies, and their successful use in disease prevention, treatment, and animal performance commands attention. However, insufficient or outdated probiotic screening techniques may unintentionally contribute to this crisis, and few longitudinal studies have been conducted to determine what role probiotics play in AMR dissemination in animal hosts and the surrounding environment. In this review, we briefly summarize the current literature regarding the efficacy, feasibility, and limitations of probiotics, including an evaluation of their impact on the animal microbiome and resistome and their potential to influence AMR in the environment. Probiotic application for livestock is often touted as an ideal alternative therapy that might reduce the need for antibiotic use in agriculture and the negative downstream impacts. However, as detailed in this review, limited research has been conducted linking probiotic usage with reductions in AMR in agricultural or natural environments. Additionally, we discuss the methods, including limitations, of current probiotic screening techniques across the globe, highlighting approaches aimed at reducing antibiotic usage and ensuring safe and effective probiotic mediated health outcomes. 
Based on this information, we propose economic and logistical considerations for bringing probiotic therapies to market including regulatory roadblocks, future innovations, and the significant gaps in knowledge requiring additional research to ensure probiotics are suitable long-term options for livestock producers as an antibiotic alternative therapy.
Affiliation(s)
- Kyle R. Leistikow
  - Department of Biological Sciences, Marquette University, Milwaukee, WI, United States
- Rachelle E. Beattie
  - U.S. Geological Survey, Columbia Environmental Research Center, Columbia, MO, United States
8
Cervantes-Gracia K, Chahwan R, Husi H. Integrative OMICS Data-Driven Procedure Using a Derivatized Meta-Analysis Approach. Front Genet 2022; 13:828786. [PMID: 35186042 PMCID: PMC8855827 DOI: 10.3389/fgene.2022.828786] [Received: 12/03/2021] [Accepted: 01/12/2022] [Indexed: 12/24/2022]
Abstract
The wealth of high-throughput data has opened up new opportunities to analyze and describe biological processes at higher resolution, ultimately leading to a significant acceleration of scientific output using high-throughput data from the different omics layers and the generation of databases to store and report raw datasets. The great variability among the techniques and the heterogeneous methodologies used to produce this data have placed meta-analysis methods as one of the approaches of choice to correlate the resultant large-scale datasets from different research groups. Through multi-study meta-analyses, it is possible to generate results with greater statistical power compared to individual analyses. Gene signatures, biomarkers and pathways that provide new insights of a phenotype of interest have been identified by the analysis of large-scale datasets in several fields of science. However, despite all the efforts, a standardized regulation to report large-scale data and to identify the molecular targets and signaling networks is still lacking. Integrative analyses have also been introduced as complementation and augmentation for meta-analysis methodologies to generate novel hypotheses. Currently, there is no universal method established and the different methods available follow different purposes. Herein we describe a new unifying, scalable and straightforward methodology to meta-analyze different omics outputs, but also to integrate the significant outcomes into novel pathways describing biological processes of interest. The significance of using proper molecular identifiers is highlighted as well as the potential to further correlate molecules from different regulatory levels. To show the methodology’s potential, a set of transcriptomic datasets are meta-analyzed as an example.
Affiliation(s)
- Richard Chahwan
  - Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
- Holger Husi
  - Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, United Kingdom
  - Division of Biomedical Sciences, Centre for Health Science, University of the Highlands and Islands, Inverness, United Kingdom
- Correspondence: Richard Chahwan; Holger Husi
9
Kress WJ, Soltis DE, Kersey PJ, Wegrzyn JL, Leebens-Mack JH, Gostel MR, Liu X, Soltis PS. Green plant genomes: What we know in an era of rapidly expanding opportunities. Proc Natl Acad Sci U S A 2022; 119:e2115640118. [PMID: 35042803 PMCID: PMC8795535 DOI: 10.1073/pnas.2115640118] [Indexed: 01/01/2023]
Abstract
Green plants play a fundamental role in ecosystems, human health, and agriculture. As de novo genomes are being generated for all known eukaryotic species as advocated by the Earth BioGenome Project, increasing genomic information on green land plants is essential. However, setting standards for the generation and storage of the complex set of genomes that characterize the green lineage of life is a major challenge for plant scientists. Such standards will need to accommodate the immense variation in green plant genome size, transposable element content, and structural complexity while enabling research into the molecular and evolutionary processes that have resulted in this enormous genomic variation. Here we provide an overview and assessment of the current state of knowledge of green plant genomes. To date fewer than 300 complete chromosome-scale genome assemblies representing fewer than 900 species have been generated across the estimated 450,000 to 500,000 species in the green plant clade. These genomes range in size from 12 Mb to 27.6 Gb and are biased toward agricultural crops with large branches of the green tree of life untouched by genomic-scale sequencing. Locating suitable tissue samples of most species of plants, especially those taxa from extreme environments, remains one of the biggest hurdles to increasing our genomic inventory. Furthermore, the annotation of plant genomes is at present undergoing intensive improvement. It is our hope that this fresh overview will help in the development of genomic quality standards for a cohesive and meaningful synthesis of green plant genomes as we scale up for the future.
Affiliation(s)
- W John Kress
  - National Museum of Natural History, Smithsonian Institution, Department of Botany, Washington, DC 20013-7012
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755
  - Arnold Arboretum, Harvard University, Boston, MA 02130
- Douglas E Soltis
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
  - Biodiversity Institute, University of Florida, Gainesville, FL 32611
  - Department of Biology, University of Florida, Gainesville, FL 32611
- Paul J Kersey
  - Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, United Kingdom
- Jill L Wegrzyn
  - Department of Ecology and Evolutionary Biology, Institute for Systems Genomics: Computational Biology Core, University of Connecticut, Storrs, CT 06269-3214
- James H Leebens-Mack
  - Department of Plant Biology, 2101 Miller Plant Sciences, University of Georgia, Athens, GA 30602-7271
- Morgan R Gostel
  - Botanical Research Institute of Texas, Fort Worth, TX 76107-3400
- Xin Liu
  - China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
- Pamela S Soltis
  - Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
  - Biodiversity Institute, University of Florida, Gainesville, FL 32611
10
Dimonaco NJ, Aubrey W, Kenobi K, Clare A, Creevey CJ. No one tool to rule them all: prokaryotic gene prediction tool annotations are highly dependent on the organism of study. Bioinformatics 2021; 38:1198-1207. [PMID: 34875010 PMCID: PMC8825762 DOI: 10.1093/bioinformatics/btab827] [Received: 05/26/2021] [Revised: 11/13/2021] [Accepted: 12/02/2021] [Indexed: 01/06/2023]
Abstract
MOTIVATION: The biases in CoDing Sequence (CDS) prediction tools, which have been based on historic genomic annotations from model organisms, impact our understanding of novel genomes and metagenomes. This hinders the discovery of new genomic information as it results in predictions being biased towards existing knowledge. To date, users have lacked a systematic and replicable approach to identify the strengths and weaknesses of any CDS prediction tool and allow them to choose the right tool for their analysis. RESULTS: We present an evaluation framework (ORForise) based on a comprehensive set of 12 primary and 60 secondary metrics that facilitate the assessment of the performance of CDS prediction tools. This makes it possible to identify which performs better for specific use-cases. We use this to assess 15 ab initio- and model-based tools representing those most widely used (historically and currently) to generate the knowledge in genomic databases. We find that the performance of any tool is dependent on the genome being analysed, and no individual tool ranked as the most accurate across all genomes or metrics analysed. Even the top-ranked tools produced conflicting gene collections, which could not be resolved by aggregation. The ORForise evaluation framework provides users with a replicable, data-led approach to make informed tool choices for novel genome annotations and for refining historical annotations. AVAILABILITY AND IMPLEMENTATION: Code and datasets for reproduction and customisation are available at https://github.com/NickJD/ORForise. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicholas J Dimonaco
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3PD, UK. To whom correspondence should be addressed.
| | - Wayne Aubrey
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
| | - Kim Kenobi
- Department of Mathematics, Aberystwyth University, Aberystwyth SY23 3BZ, UK
| | - Amanda Clare
- Department of Computer Science, Aberystwyth University, Aberystwyth SY23 3DB, UK
| | | |
|
11
|
Renn D, Shepard L, Vancea A, Karan R, Arold ST, Rueping M. Novel Enzymes From the Red Sea Brine Pools: Current State and Potential. Front Microbiol 2021; 12:732856. [PMID: 34777282 PMCID: PMC8578733 DOI: 10.3389/fmicb.2021.732856] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/29/2021] [Accepted: 10/05/2021] [Indexed: 11/23/2022] Open
Abstract
The Red Sea is a marine environment with unique chemical characteristics and physical topographies. Among the various habitats offered by the Red Sea, the deep-sea brine pools are the most extreme in terms of salinity, temperature and metal contents. Nonetheless, the brine pools host rich polyextremophilic bacterial and archaeal communities. These microbial communities are promising sources for various classes of enzymes adapted to harsh environments - extremozymes. Extremozymes are emerging as novel biocatalysts for biotechnological applications due to their ability to perform catalytic reactions under harsh biophysical conditions, such as those used in many industrial processes. In this review, we provide an overview of the extremozymes from different Red Sea brine pools and discuss the overall biotechnological potential of the Red Sea proteome.
Affiliation(s)
- Dominik Renn
- KAUST Catalysis Center (KCC), Division of Physical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Institute of Organic Chemistry, RWTH Aachen, Aachen, Germany
| | - Lera Shepard
- KAUST Catalysis Center (KCC), Division of Physical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Alexandra Vancea
- Computational Bioscience Research Center (CBRC), Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Ram Karan
- KAUST Catalysis Center (KCC), Division of Physical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Stefan T. Arold
- Computational Bioscience Research Center (CBRC), Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Centre de Biologie Structurale, CNRS, INSERM, Université de Montpellier, Montpellier, France
| | - Magnus Rueping
- KAUST Catalysis Center (KCC), Division of Physical Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Institute for Experimental Molecular Imaging (ExMI), University Clinic, RWTH Aachen, Aachen, Germany
| |
|
12
|
Queirós P, Delogu F, Hickl O, May P, Wilmes P. Mantis: flexible and consensus-driven genome annotation. Gigascience 2021; 10:giab042. [PMID: 34076241 PMCID: PMC8170692 DOI: 10.1093/gigascience/giab042] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/04/2020] [Revised: 03/22/2021] [Accepted: 05/14/2021] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The rapid development of the (meta-)omics fields has produced an unprecedented amount of high-resolution and high-fidelity data. Through the use of these datasets we can infer the role of previously functionally unannotated proteins from single organisms and consortia. In this context, protein function annotation can be described as the identification of regions of interest (i.e., domains) in protein sequences and the assignment of biological functions. Despite the existence of numerous tools, challenges remain in terms of speed, flexibility, and reproducibility. In the big data era, it is also increasingly important to cease limiting our findings to a single reference, coalescing knowledge from different data sources, and thus overcoming some limitations in overly relying on computationally generated data from single sources. RESULTS We implemented a protein annotation tool, Mantis, which uses database identifiers intersection and text mining to integrate knowledge from multiple reference data sources into a single consensus-driven output. Mantis is flexible, allowing for the customization of reference data and execution parameters, and is reproducible across different research goals and user environments. We implemented a depth-first search algorithm for domain-specific annotation, which significantly improved annotation performance compared to sequence-wide annotation. The parallelized implementation of Mantis results in short runtimes while also outputting high coverage and high-quality protein function annotations. CONCLUSIONS Mantis is a protein function annotation tool that produces high-quality consensus-driven protein annotations. It is easy to set up, customize, and use, scaling from single genomes to large metagenomes. Mantis is available under the MIT license at https://github.com/PedroMTQ/mantis.
Affiliation(s)
- Pedro Queirós
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
| | - Francesco Delogu
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
| | - Oskar Hickl
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
| | - Patrick May
- Bioinformatics Core, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
| | - Paul Wilmes
- Systems Ecology, Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 6 Avenue du Swing, 4367 Esch-sur-Alzette, Luxembourg
| |
|
13
|
Abstract
Ribosomal proteins (RPs) are highly conserved across the bacterial and archaeal domains. Although many RPs are essential for survival, genome analysis demonstrates the absence of some RP genes in many bacterial and archaeal genomes. Furthermore, global transposon mutagenesis and/or targeted deletion showed that elimination of some RP genes had only a moderate effect on the bacterial growth rate. Here, we systematically analyze the evolutionary conservation of RPs in prokaryotes by compiling the list of the ribosomal genes that are missing from one or more genomes in the recently updated version of the Clusters of Orthologous Genes (COG) database. Some of these absences occurred because the respective genes carried frameshifts, presumably resulting from sequencing errors, while others were overlooked and not translated during genome annotation. Apart from these annotation errors, we identified multiple genuine losses of RP genes in a variety of bacteria and archaea. Some of these losses are clade-specific, whereas others occur in symbionts and parasites with dramatically reduced genomes. The lists of computationally and experimentally defined non-essential ribosomal genes show a substantial overlap, revealing a common trend in prokaryote ribosome evolution that could be linked to the architecture and assembly of the ribosomes. Thus, RPs that are located at the surface of the ribosome and/or are incorporated at a late stage of ribosome assembly are more likely to be non-essential and to be lost during microbial evolution, particularly in the course of genome compaction. IMPORTANCE In many prokaryote genomes, one or more ribosomal protein (RP) genes are missing. Analysis of 1,309 prokaryote genomes included in the COG database shows that only about half of the RPs are universally conserved in bacteria and archaea. In contrast, up to 16 other RPs are missing in some genomes, primarily tiny (<1 Mb) genomes of host-associated bacteria and archaea.
Ten universal and nine archaea-specific ribosomal proteins show clear patterns of lineage-specific gene loss. Most of the RPs that are frequently lost from bacterial genomes are located on the ribosome periphery and are non-essential in Escherichia coli and Bacillus subtilis. These results reveal general trends and common constraints in the architecture and evolution of ribosomes in prokaryotes.
|
14
|
Ejigu GF, Jung J. Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing. BIOLOGY 2020; 9:E295. [PMID: 32962098 PMCID: PMC7565776 DOI: 10.3390/biology9090295] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 08/21/2020] [Revised: 09/13/2020] [Accepted: 09/16/2020] [Indexed: 12/16/2022]
Abstract
Next-Generation Sequencing (NGS) has made it easier to obtain genome-wide sequence data and it has shifted the research focus into genome annotation. The challenging tasks involved in annotation rely on the currently available tools and techniques to decode the information contained in nucleotide sequences. This information will improve our understanding of general aspects of life and evolution and improve our ability to diagnose genetic disorders. Here, we present a summary of both structural and functional annotations, as well as the associated comparative annotation tools and pipelines. We highlight visualization tools that immensely aid the annotation process and the contributions of the scientific community to the annotation. Further, we discuss quality-control practices and the need for re-annotation, and highlight the future of annotation.
Affiliation(s)
| | - Jaehee Jung
- Department of Information and Communication Engineering, Myongji University, Yongin-si 17058, Gyeonggi-do, Korea
| |
|
15
|
Belknap KC, Park CJ, Barth BM, Andam CP. Genome mining of biosynthetic and chemotherapeutic gene clusters in Streptomyces bacteria. Sci Rep 2020; 10:2003. [PMID: 32029878 PMCID: PMC7005152 DOI: 10.1038/s41598-020-58904-9] [Citation(s) in RCA: 135] [Impact Index Per Article: 27.0] [Received: 05/20/2019] [Accepted: 01/22/2020] [Indexed: 01/05/2023] Open
Abstract
Streptomyces bacteria are known for their prolific production of secondary metabolites, many of which have been widely used in human medicine, agriculture and animal health. To guide the effective prioritization of specific biosynthetic gene clusters (BGCs) for drug development and targeting the most prolific producer strains, knowledge about phylogenetic relationships of Streptomyces species, genome-wide diversity and distribution patterns of BGCs is critical. We used genomic and phylogenetic methods to elucidate the diversity of major classes of BGCs in 1,110 publicly available Streptomyces genomes. Genome mining of Streptomyces reveals high diversity of BGCs and variable distribution patterns in the Streptomyces phylogeny, even among very closely related strains. The most common BGCs are non-ribosomal peptide synthetases, type 1 polyketide synthases, terpenes, and lantipeptides. We also found that numerous Streptomyces species harbor BGCs known to encode antitumor compounds. We observed that strains that are considered the same species can vary tremendously in the BGCs they carry, suggesting that strain-level genome sequencing can uncover high levels of BGC diversity and potentially useful derivatives of any one compound. These findings suggest that a strain-level strategy for exploring secondary metabolites for clinical use provides an alternative or complementary approach to discovering novel pharmaceutical compounds from microbes.
Affiliation(s)
- Kaitlyn C Belknap
- University of New Hampshire, Department of Molecular, Cellular and Biomedical Sciences, Durham, NH, 03824, USA
| | - Cooper J Park
- University of New Hampshire, Department of Molecular, Cellular and Biomedical Sciences, Durham, NH, 03824, USA
| | - Brian M Barth
- University of New Hampshire, Department of Molecular, Cellular and Biomedical Sciences, Durham, NH, 03824, USA
| | - Cheryl P Andam
- University of New Hampshire, Department of Molecular, Cellular and Biomedical Sciences, Durham, NH, 03824, USA.
| |
|
16
|
Magnowska Z, Jana B, Brochmann RP, Hesketh A, Lametsch R, De Gobba C, Guardabassi L. Carprofen-induced depletion of proton motive force reverses TetK-mediated doxycycline resistance in methicillin-resistant Staphylococcus pseudintermedius. Sci Rep 2019; 9:17834. [PMID: 31780689 PMCID: PMC6882848 DOI: 10.1038/s41598-019-54091-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/14/2019] [Accepted: 10/31/2019] [Indexed: 11/09/2022] Open
Abstract
We previously showed that doxycycline (DOX) and carprofen (CPF), a veterinary non-steroidal anti-inflammatory drug, have synergistic antimicrobial activity against methicillin-resistant Staphylococcus pseudintermedius (MRSP) carrying the tetracycline resistance determinant TetK. To elucidate the molecular mechanism of this synergy, we investigated the effects of the two drugs, individually and in combination, using a comprehensive approach including RNA sequencing, two-dimensional differential in-gel electrophoresis, macromolecule biosynthesis assays and fluorescence spectroscopy. Exposure of TetK-positive MRSP to CPF alone resulted in upregulation of pathways that generate ATP and NADH, and promote the proton gradient. We showed that CPF is a proton carrier that dissipates the electrochemical potential of the membrane. In the presence of both CPF and DOX, the energy compensation strategy was attenuated by downregulation of all the processes involved, such as citric acid cycle, oxidative phosphorylation and ATP-providing arginine deiminase pathway. Furthermore, protein biosynthesis inhibition increased from 20% under DOX exposure alone to 75% upon simultaneous exposure to CPF. We conclude that synergistic interaction of the drugs restores DOX susceptibility in MRSP by compromising proton-motive-force-dependent TetK-mediated efflux of the antibiotic. MRSP is unable to counterbalance CPF-mediated PMF depletion by cellular metabolic adaptations, resulting in intracellular accumulation of DOX and inhibition of protein biosynthesis.
Affiliation(s)
- Zofia Magnowska
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark.
| | - Bimal Jana
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Rikke Prejh Brochmann
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Andrew Hesketh
- Department of Biochemistry and Cambridge Systems Biology Centre, University of Cambridge, Cambridge, United Kingdom; School of Pharmacy and Biomolecular Sciences, University of Brighton, Brighton, United Kingdom
| | - Rene Lametsch
- Department of Food Science, Faculty of Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Cristian De Gobba
- Department of Food Science, Faculty of Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Luca Guardabassi
- Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, Denmark; Department of Pathobiology and Population Sciences, The Royal Veterinary College, Hatfield, United Kingdom
| |
|
17
|
Mohanraj U, Wan X, Spruit CM, Skurnik M, Pajunen MI. A Toxicity Screening Approach to Identify Bacteriophage-Encoded Anti-Microbial Proteins. Viruses 2019; 11:E1057. [PMID: 31739448 PMCID: PMC6893735 DOI: 10.3390/v11111057] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/26/2019] [Revised: 10/29/2019] [Accepted: 11/12/2019] [Indexed: 12/23/2022] Open
Abstract
The rapid emergence of antibiotic resistance among many pathogenic bacteria has created a profound need to discover new alternatives to antibiotics. Bacteriophages, the viruses of microbes, express special proteins to overtake the metabolism of the bacterial host they infect, the best known of which are involved in bacterial lysis. However, the functions of the majority of bacteriophage-encoded gene products are not known, i.e., they represent the hypothetical proteins of unknown function (HPUFs). In the current study we present a phage genomics-based screening approach to identify phage HPUFs with antibacterial activity, with a long-term goal of using them as leads to find unknown targets for developing novel antibacterial compounds. The screening assay is based on the inhibition of bacterial growth when a toxic gene is expression-cloned into a plasmid vector. It utilizes an optimized plating assay producing a significant difference in the number of transformants after ligation of the toxic and non-toxic genes into a cloning vector. The screening assay was first tested and optimized using several known toxic and non-toxic genes. Then, it was applied to screen 94 HPUFs of bacteriophage φR1-RT, and identified four HPUFs that were toxic to Escherichia coli. This optimized assay is in principle useful in the search for bactericidal proteins of any phage, and also opens new possibilities for understanding the strategies bacteriophages use to overtake bacterial hosts.
Affiliation(s)
- Ushanandini Mohanraj
- Department of Bacteriology and Immunology, Medicum, Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
- Department of Virology, Medicum, University of Helsinki, 00290 Helsinki, Finland
| | - Xing Wan
- Department of Bacteriology and Immunology, Medicum, Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
- Division Animal and Human Health Engineering, Kasteelpark Arenberg 21 - box 2462, 3001 Leuven, Belgium
| | - Cindy M. Spruit
- Department of Bacteriology and Immunology, Medicum, Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
- Laboratory of Microbiology, Wageningen University and Research, 6708 WE Wageningen, The Netherlands
| | - Mikael Skurnik
- Department of Bacteriology and Immunology, Medicum, Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
- Division of Clinical Microbiology, Helsinki University Hospital, HUSLAB, 00290 Helsinki, Finland
| | - Maria I. Pajunen
- Department of Bacteriology and Immunology, Medicum, Human Microbiome Research Program, Faculty of Medicine, University of Helsinki, 00290 Helsinki, Finland
| |
|
18
|
Vizán-Rico HI, Mayer C, Petersen M, McKenna DD, Zhou X, Gómez-Zurita J. Patterns and Constraints in the Evolution of Sperm Individualization Genes in Insects, with an Emphasis on Beetles. Genes (Basel) 2019; 10:E776. [PMID: 31590243 PMCID: PMC6826512 DOI: 10.3390/genes10100776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2019] [Revised: 09/20/2019] [Accepted: 10/01/2019] [Indexed: 11/17/2022] Open
Abstract
Gene expression profiles can change dramatically between sexes and sex bias may contribute specific macroevolutionary dynamics for sex-biased genes. However, these dynamics are poorly understood at large evolutionary scales due to the paucity of studies that have assessed orthology and functional homology for sex-biased genes and the pleiotropic effects possibly constraining their evolutionary potential. Here, we explore the correlation of sex-biased expression with macroevolutionary processes that are associated with sex-biased genes, including duplications and accelerated evolutionary rates. Specifically, we examined these traits in a group of 44 genes that orchestrate sperm individualization during spermatogenesis, with both unbiased and sex-biased expression. We studied these genes in the broad evolutionary framework of the Insecta, with a particular focus on beetles (order Coleoptera). We studied data mined from 119 insect genomes, including 6 beetle models, and from 19 additional beetle transcriptomes. For the subset of physically and/or genetically interacting proteins, we also analyzed how their network structure may condition the mode of gene evolution. The collection of genes was highly heterogeneous in duplication status, evolutionary rates, and rate stability, but there was statistical evidence for sex bias correlated with faster evolutionary rates, consistent with theoretical predictions. Faster rates were also correlated with clocklike (insect amino acids) and non-clocklike (beetle nucleotides) substitution patterns in these genes. Statistical associations (higher rates for central nodes) or lack thereof (centrality of duplicated genes) were in contrast to some current evolutionary hypotheses, highlighting the need for more research on these topics.
Affiliation(s)
- Helena I. Vizán-Rico
- Animal Biodiversity and Evolution, Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
| | - Christoph Mayer
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113 Bonn, Germany
| | - Malte Petersen
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, 53113 Bonn, Germany
| | - Duane D. McKenna
- Center for Biodiversity Research, Department of Biological Sciences, University of Memphis, Memphis, TN 38152, USA
| | - Xin Zhou
- Department of Entomology, College of Plant Protection, China Agricultural University, Beijing 100193, China
| | - Jesús Gómez-Zurita
- Animal Biodiversity and Evolution, Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain
| |
|
19
|
Santiago CRDN, Assis RDAB, Moreira LM, Digiampietri LA. Gene Tags Assessment by Comparative Genomics (GTACG): A User-Friendly Framework for Bacterial Comparative Genomics. Front Genet 2019; 10:725. [PMID: 31507629 PMCID: PMC6718126 DOI: 10.3389/fgene.2019.00725] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/29/2019] [Accepted: 07/10/2019] [Indexed: 12/04/2022] Open
Abstract
Genomics research has produced an exponential amount of data. However, the genetic knowledge pertaining to certain phenotypic characteristics is lacking. Also, a considerable part of these genomes have coding sequences (CDSs) with unknown functions, posing additional challenges to researchers. Phylogenetically close microorganisms share much of their CDSs, and certain phenotypes unique to a set of microorganisms may be the result of the genes found exclusively in those microorganisms. This study presents the GTACG framework, an easy-to-use tool for finding, in subgroups of bacterial genomes whose microorganisms share phenotypic characteristics, data that differentiates them from other associated genomes in a simple and fast way. The GTACG analysis is based on the formation of homologous CDS clusters from local alignments. The front-end is easy to use, and the installation packages have been developed to enable users lacking knowledge of programming languages or bioinformatics to analyze high-throughput data using the tool. The validation of the GTACG framework has been carried out based on a case report involving a set of 161 genomes from the Xanthomonadaceae family, in which 19 families of orthologous proteins were found in 90% of the plant-associated genomes, allowing the identification of the proteins potentially associated with adaptation and virulence in plant tissue. The results show the potential use of GTACG in the search for new targets for molecular studies, and GTACG can be used as a research tool by biologists who lack advanced knowledge in the use of computational tools for bacterial comparative genomics.
Affiliation(s)
| | - Renata de Almeida Barbosa Assis
- Biotechnology Graduate Program, Núcleo de Pesquisas em Ciências Biológicas, Federal University of Ouro Preto, Ouro Preto, Brazil
| | - Leandro Marcio Moreira
- Biotechnology Graduate Program, Núcleo de Pesquisas em Ciências Biológicas, Federal University of Ouro Preto, Ouro Preto, Brazil
- Department of Biological Sciences, Federal University of Ouro Preto, Ouro Preto, Brazil
| | - Luciano Antonio Digiampietri
- Bioinformatics Graduate Program, University of Sao Paulo, Sao Paulo, Brazil
- School of Arts, Science, and Humanities, University of Sao Paulo, Sao Paulo, Brazil
| |
|
20
|
Sichtig H, Minogue T, Yan Y, Stefan C, Hall A, Tallon L, Sadzewicz L, Nadendla S, Klimke W, Hatcher E, Shumway M, Aldea DL, Allen J, Koehler J, Slezak T, Lovell S, Schoepp R, Scherf U. FDA-ARGOS is a database with public quality-controlled reference genomes for diagnostic use and regulatory science. Nat Commun 2019; 10:3313. [PMID: 31346170 PMCID: PMC6658474 DOI: 10.1038/s41467-019-11306-6] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Received: 11/21/2018] [Accepted: 07/02/2019] [Indexed: 02/08/2023] Open
Abstract
FDA proactively invests in tools to support innovation of emerging technologies, such as infectious disease next generation sequencing (ID-NGS). Here, we introduce FDA-ARGOS quality-controlled reference genomes as a public database for diagnostic purposes and demonstrate its utility on the example of two use cases. We provide quality control metrics for the FDA-ARGOS genomic database resource and outline the need for genome quality gap filling in the public domain. In the first use case, we show more accurate microbial identification of Enterococcus avium from metagenomic samples with FDA-ARGOS reference genomes compared to non-curated GenBank genomes. In the second use case, we demonstrate the utility of FDA-ARGOS reference genomes for Ebola virus target sequence comparison as part of a composite validation strategy for ID-NGS diagnostic tests. The use of FDA-ARGOS as an in silico target sequence comparator tool combined with representative clinical testing could reduce the burden for completing ID-NGS clinical trials. To be able to use infectious disease next generation sequencing as a diagnostic tool, appropriate reference datasets are required. Here, Sichtig et al. describe FDA-ARGOS, a reference database for high-quality microbial reference genomes, and demonstrate its utility on the example of two use cases.
Affiliation(s)
- Heike Sichtig
- U.S. Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA.
| | - Timothy Minogue
- U.S. Army Medical Research Institute of Infectious Diseases, 1425 Porter Street, Frederick, MD, 21702, USA.
| | - Yi Yan
- U.S. Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
| | - Christopher Stefan
- U.S. Army Medical Research Institute of Infectious Diseases, 1425 Porter Street, Frederick, MD, 21702, USA
| | - Adrienne Hall
- U.S. Army Medical Research Institute of Infectious Diseases, 1425 Porter Street, Frederick, MD, 21702, USA
| | - Luke Tallon
- Institute for Genome Sciences at the University of Maryland, 670 W. Baltimore Street, Baltimore, MD, 21201, USA
| | - Lisa Sadzewicz
- Institute for Genome Sciences at the University of Maryland, 670 W. Baltimore Street, Baltimore, MD, 21201, USA
| | - Suvarna Nadendla
- Institute for Genome Sciences at the University of Maryland, 670 W. Baltimore Street, Baltimore, MD, 21201, USA
| | - William Klimke
- National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | - Eneida Hatcher
- National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | - Martin Shumway
- National Center for Biotechnology Information, National Library of Medicine, 8600 Rockville Pike, Bethesda, MD, 20894, USA
| | | | - Jonathan Allen
- Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA, 94551, USA
| | - Jeffrey Koehler
- U.S. Army Medical Research Institute of Infectious Diseases, 1425 Porter Street, Frederick, MD, 21702, USA
| | - Tom Slezak
- Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA, 94551, USA
| | - Stephen Lovell
- U.S. Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
| | - Randal Schoepp
- U.S. Army Medical Research Institute of Infectious Diseases, 1425 Porter Street, Frederick, MD, 21702, USA
| | - Uwe Scherf
- U.S. Food and Drug Administration, 10903 New Hampshire Ave, Silver Spring, MD, 20993, USA
| |
|
21
|
Louro R, Santos-Silva C, Nobre T. What is in a name? Terfezia classification revisited. Fungal Biol 2019; 123:267-273. [PMID: 30928035 DOI: 10.1016/j.funbio.2019.01.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/07/2019] [Accepted: 01/11/2019] [Indexed: 12/22/2022]
Abstract
Desert truffles (mycorrhizal hypogeous Ascomycota) are found in arid and semi-arid areas of the globe and have great ecological and economic importance. Terfezia is undoubtedly the most diversified of all desert truffle genera, but its taxonomy is far from resolved. Specifically, the large number of newly described species plus the high intraspecific morphological variability observed within some Terfezia lineages has rendered the use of molecular techniques mandatory for specimen discrimination. But the subsequent increase in sequence data also produced a huge number of undescribed taxa that required determination. We compiled and used the publicly available ITS data on Terfezia spp. in the custom-curated UNITE database to reconstruct the genus phylogeny. We found at least 17 distinct lineages within the genus and successfully resolved some of the more pressing taxonomic issues, namely the T. leptoderma/olbiensis complex and some misapplied synonymy. Based on this resolved phylogeny, and motivated by the recently described species, we proposed an identification key to the genus Terfezia highlighting the importance of morphological and ecological characterization.
Affiliation(s)
- Rogério Louro
- Biology Department, Macromycology Laboratory, Instituto de Ciências Agrárias e Ambientais Mediterrânicas, University of Évora, Évora, Portugal.
| | - Celeste Santos-Silva
- Biology Department, Macromycology Laboratory, Instituto de Ciências Agrárias e Ambientais Mediterrânicas, University of Évora, Évora, Portugal.
| | - Tânia Nobre
- Instituto de Ciências Agrárias e Ambientais Mediterrânicas, University of Évora, Apartado 94, 7002-554, Évora, Portugal.
| |
|
22
|
Harwood CR, Mouillon JM, Pohl S, Arnau J. Secondary metabolite production and the safety of industrially important members of the Bacillus subtilis group. FEMS Microbiol Rev 2018; 42:721-738. [PMID: 30053041 PMCID: PMC6199538 DOI: 10.1093/femsre/fuy028] [Citation(s) in RCA: 129] [Impact Index Per Article: 18.4] [Received: 02/08/2018] [Accepted: 07/17/2018] [Indexed: 11/14/2022]
Abstract
Members of the 'Bacillus subtilis group' include some of the most commercially important bacteria, used for the production of a wide range of industrial enzymes and fine biochemicals. Increasingly, group members have been developed for use as animal feed enhancers and antifungal biocontrol agents. The group has long been recognised to produce a range of secondary metabolites and, despite their long history of safe usage, this has resulted in an increased focus on their safety. Traditional methods used to detect the production of secondary metabolites and other potentially harmful compounds have relied on phenotypic tests. Such approaches are time consuming and, in some cases, lack specificity. Nowadays, accessibility to genome data and associated bioinformatical tools provides a powerful means for identifying gene clusters associated with the synthesis of secondary metabolites. This review focuses primarily on well-characterised strains of B. subtilis and B. licheniformis and their synthesis of non-ribosomally synthesised peptides and polyketides. Where known, the activities and toxicities of their secondary metabolites are discussed, together with the limitations of assays currently used to assess their toxicity. Finally, the regulatory framework under which such strains are authorised for use in the production of food and feed enzymes is also reviewed.
Affiliation(s)
- Colin R Harwood
- Centre for Bacterial Cell Biology, Institute for Cell and Molecular Biology, Newcastle University, Newcastle upon Tyne NE2 4AX, UK
| | - Jean-Marie Mouillon
- Department of Fungal Strain Technology and Strain Approval Support, Novozymes A/S, Krogshoevej 36, DK-2880 Bagsvaerd, Denmark
| | - Susanne Pohl
- Centre for Bacterial Cell Biology, Institute for Cell and Molecular Biology, Newcastle University, Newcastle upon Tyne NE2 4AX, UK
| | - José Arnau
- Department of Fungal Strain Technology and Strain Approval Support, Novozymes A/S, Krogshoevej 36, DK-2880 Bagsvaerd, Denmark
| |
|
23
|
Salazar AN, Abeel T. Approximate, simultaneous comparison of microbial genome architectures via syntenic anchoring of quiver representations. Bioinformatics 2018; 34:i732-i742. [PMID: 30423098 PMCID: PMC6129293 DOI: 10.1093/bioinformatics/bty614] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
Motivation: A long-standing limitation in comparative genomic studies is the dependency on a reference genome, which limits the spectrum of genetic diversity that can be identified across a population of organisms. This is especially true in the microbial world, where genome architectures can vary significantly. There is therefore a need for computational methods that can simultaneously analyze the architectures of multiple genomes without introducing bias from a reference. Results: In this article, we present Ptolemy: a novel method for studying the diversity of genome architectures (such as structural variation and pan-genomes) across a collection of microbial assemblies without the need for a reference. Ptolemy is a 'top-down' approach to compare whole genome assemblies. Genomes are represented as labeled multi-directed graphs (known as quivers), which are then merged into a single, canonical quiver by identifying 'gene anchors' via synteny analysis. The canonical quiver represents an approximate, structural alignment of all genomes in a given collection, encoding structural variation across (sub-)populations within the collection. We highlight various applications of Ptolemy by analyzing structural variation and the pan-genomes of different datasets composed of Mycobacterium, Saccharomyces, Escherichia and Shigella species. Our results show that Ptolemy is flexible and can handle both conserved and highly dynamic genome architectures. Ptolemy is user-friendly (it requires only a FASTA-formatted assembly along with a corresponding GFF-formatted file) and resource-friendly (it can align 24 genomes in ∼10 min with four CPUs and <2 GB of RAM). Availability and implementation: GitHub: https://github.com/AbeelLab/ptolemy. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Alex N Salazar
- Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Thomas Abeel
- Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| |
|
24
|
Wylezich C, Papa A, Beer M, Höper D. A Versatile Sample Processing Workflow for Metagenomic Pathogen Detection. Sci Rep 2018; 8:13108. [PMID: 30166611 PMCID: PMC6117295 DOI: 10.1038/s41598-018-31496-1] [Citation(s) in RCA: 87] [Impact Index Per Article: 12.4] [Received: 10/23/2017] [Accepted: 08/16/2018] [Indexed: 11/09/2022]
Abstract
Metagenomics is currently the only generic method for pathogen detection. Starting from RNA allows the assessment of the whole sample community including RNA viruses. Here we present our modular concerted protocol for sample processing for diagnostic metagenomics analysis of human, animal, and food samples. The workflow does not rely on dedicated amplification steps at any stage in the process and, in contrast to published methods, libraries prepared accordingly will yield only minute amounts of unclassifiable reads. We confirmed the performance of the approach using a spectrum of pathogen/matrix-combinations showing it has the potential to become a commonly usable analytical framework.
Affiliation(s)
- Claudia Wylezich
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut (FLI), 17493, Greifswald-Insel Riems, Germany.
| | - Anna Papa
- Department of Microbiology, Medical School, Aristotle University of Thessaloniki, 54124, Thessaloniki, Greece
| | - Martin Beer
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut (FLI), 17493, Greifswald-Insel Riems, Germany
| | - Dirk Höper
- Institute of Diagnostic Virology, Friedrich-Loeffler-Institut (FLI), 17493, Greifswald-Insel Riems, Germany.
| |
|
25
|
Rippin M, Borchhardt N, Williams L, Colesie C, Jung P, Büdel B, Karsten U, Becker B. Genus richness of microalgae and Cyanobacteria in biological soil crusts from Svalbard and Livingston Island: morphological versus molecular approaches. Polar Biol 2018. [DOI: 10.1007/s00300-018-2252-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 11/30/2022]
|
26
|
Thomas CM, Thomson NR, Cerdeño-Tárraga AM, Brown CJ, Top EM, Frost LS. Annotation of plasmid genes. Plasmid 2017; 91:61-67. [PMID: 28365184 DOI: 10.1016/j.plasmid.2017.03.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 12/23/2016] [Accepted: 03/23/2017] [Indexed: 10/19/2022]
Abstract
Good annotation of plasmid genomes is essential to maximise the value of the rapidly increasing volume of plasmid sequences. This short review highlights some of the current issues and suggests some ways forward. Where a well-studied related plasmid system exists, we recommend that new annotation adheres to the convention already established for that system, so long as it is based on sound principles and solid experimental evidence, even if some of the new genes are more similar to homologues in different systems. Where a well-established model does not exist, we provide generic gene names that reflect likely biochemical activity rather than overall purpose, particularly where, for example, genes clearly belong to a type IV secretion system but it is not known whether they function in conjugative transfer or virulence. We also recommend that annotators use a whole system naming approach to avoid ending up with an illogical mixture of names from other systems based on the highest scoring match from a BLAST search. In addition, where function has not been experimentally established, we recommend using just the locus tag, rather than a function-related gene name, while recording possible functions as notes rather than in a provisional name.
Affiliation(s)
- Christopher M Thomas
- School of Biosciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK.
| | | | | | - Celeste J Brown
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844-3051, United States
| | - Eva M Top
- Department of Biological Sciences, University of Idaho, Moscow, ID 83844-3051, United States
| | - Laura S Frost
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta T6G 2E9, Canada
| |
|
27
|
Zallot R, Harrison KJ, Kolaczkowski B, de Crécy-Lagard V. Functional Annotations of Paralogs: A Blessing and a Curse. Life (Basel) 2016; 6:life6030039. [PMID: 27618105 PMCID: PMC5041015 DOI: 10.3390/life6030039] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 07/01/2016] [Revised: 08/29/2016] [Accepted: 09/02/2016] [Indexed: 12/15/2022]
Abstract
Gene duplication followed by mutation is a classic mechanism of neofunctionalization, producing gene families with functional diversity. In some cases, a single point mutation is sufficient to change the substrate specificity and/or the chemistry performed by an enzyme, making it difficult to accurately separate enzymes with identical functions from homologs with different functions. Because sequence similarity is often used as a basis for assigning functional annotations to genes, non-isofunctional gene families pose a great challenge for genome annotation pipelines. Here we describe how integrating evolutionary and functional information such as genome context, phylogeny, metabolic reconstruction and signature motifs may be required to correctly annotate multifunctional families. These integrative analyses can also lead to the discovery of novel gene functions, as hints from specific subgroups can guide the functional characterization of other members of the family. We demonstrate how careful manual curation processes using comparative genomics can disambiguate subgroups within large multifunctional families and discover their functions. We present the COG0720 protein family as a case study. We also discuss strategies to automate this process to improve the accuracy of genome functional annotation pipelines.
Affiliation(s)
- Rémi Zallot
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
| | - Katherine J Harrison
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
| | - Bryan Kolaczkowski
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
| | - Valérie de Crécy-Lagard
- Department of Microbiology and Cell Science, Institute of Food and Agricultural Sciences, University of Florida, Gainesville, FL 32611, USA.
| |
|
28
|
Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, Lomsadze A, Pruitt KD, Borodovsky M, Ostell J. NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 2016; 44:6614-24. [PMID: 27342282 PMCID: PMC5001611 DOI: 10.1093/nar/gkw569] [Citation(s) in RCA: 4882] [Impact Index Per Article: 542.4] [Received: 03/18/2016] [Revised: 06/08/2016] [Accepted: 06/13/2016] [Indexed: 12/01/2022]
Abstract
Recent technological advances have opened unprecedented opportunities for large-scale sequencing and analysis of populations of pathogenic species in disease outbreaks, as well as for large-scale diversity studies aimed at expanding our knowledge across the whole domain of prokaryotes. To meet the challenge of timely interpretation of structure, function and meaning of this vast genetic information, a comprehensive approach to automatic genome annotation is critically needed. In collaboration with Georgia Tech, NCBI has developed a new approach to genome annotation that combines alignment-based methods with methods of predicting protein-coding and RNA genes and other functional elements directly from sequence. A new gene finding tool, GeneMarkS+, uses the combined evidence of protein and RNA placement by homology as an initial map of annotation to generate and modify ab initio gene predictions across the whole genome. Thus, NCBI's new Prokaryotic Genome Annotation Pipeline (PGAP) relies more on sequence similarity when confident comparative data are available, while it relies more on statistical predictions in the absence of external evidence. The pipeline provides a framework for generation and analysis of annotation on the full breadth of prokaryotic taxonomy. For additional information on PGAP see https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ and the NCBI Handbook, https://www.ncbi.nlm.nih.gov/books/NBK174280/.
Affiliation(s)
- Tatiana Tatusova
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
| | - Michael DiCuccio
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
| | - Azat Badretdin
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
| | - Vyacheslav Chetvernin
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
| | - Eric P Nawrocki
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
| | - Leonid Zaslavsky
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
| | - Alexandre Lomsadze
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech, Atlanta, GA 30332, USA
| | - Kim D Pruitt
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
| | - Mark Borodovsky
- Wallace H. Coulter Department of Biomedical Engineering, Georgia Tech, Atlanta, GA 30332, USA
- School of Computational Science and Engineering, Georgia Tech, Atlanta, GA 30332, USA
| | - James Ostell
- National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD 20894, USA
| |
|
29
|
Andreevskaya M, Johansson P, Jääskeläinen E, Rämö T, Ritari J, Paulin L, Björkroth J, Auvinen P. Lactobacillus oligofermentans glucose, ribose and xylose transcriptomes show higher similarity between glucose and xylose catabolism-induced responses in the early exponential growth phase. BMC Genomics 2016; 17:539. [PMID: 27487841 PMCID: PMC4972977 DOI: 10.1186/s12864-016-2840-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 01/14/2016] [Accepted: 06/15/2016] [Indexed: 11/10/2022]
Abstract
Background: Lactobacillus oligofermentans has been mostly isolated from cold-stored packaged meat products in connection with their spoilage, but its precise role in meat spoilage is unknown. It belongs to the L. vaccinostercus group of obligate heterofermentative lactobacilli that generally ferment pentoses (e.g. xylose and ribose) more efficiently than hexoses (e.g. glucose). However, more efficient hexose utilization can be induced. The regulation mechanisms of the carbohydrate catabolism in such bacteria have been scarcely studied. To address this question, we provided the complete genome sequence of L. oligofermentans LMG 22743(T) and generated time course transcriptomes during its growth on glucose, ribose and xylose. Results: The genome was manually annotated and its main functional features were examined. L. oligofermentans was confirmed to be able to efficiently utilize several hexoses and maltose, which is, presumably, induced by its repeated cultivation with glucose in vitro. Unexpectedly, at the beginning of the exponential growth phase, glucose- and xylose-induced transcriptome responses were more similar, whereas toward the end of the growth phase xylose and ribose transcriptomes became more alike. The promoter regions of genes simultaneously upregulated both on glucose and xylose in comparison with ribose (particularly, hexose and xylose utilization genes) were found to be enriched in the CcpA-binding site. Transcriptionally, no glucose-induced carbon catabolite repression was detected. The catabolism of glucose, which requires initial oxidation, led to significant overexpression of the NAD(P)H re-oxidation genes, the upstream regions of which were found to contain a motif that was highly similar to a Rex repressor binding site. Conclusions: This paper presents the second complete genome and the first study of carbohydrate catabolism-dependent transcriptome response for a member of the L. vaccinostercus group. The transcriptomic changes detected in L. oligofermentans for growth with different carbohydrates differ significantly from those of facultative heterofermentative lactobacilli. The mechanism of CcpA regulation, putatively contributing to the observed similarities between glucose- and xylose-induced transcriptome responses and the absence of stringent carbon catabolite control, requires further studies. Finally, the cell redox balance maintenance, in terms of the NAD(P)+/NAD(P)H ratio, was predicted to be regulated by the Rex transcriptional regulator, supporting the previously made inference of Rex-regulons for members of the Lactobacillaceae family.
Affiliation(s)
| | - Per Johansson
- Department of Food Hygiene and Environmental Health, University of Helsinki, Helsinki, Finland
| | - Elina Jääskeläinen
- Department of Food Hygiene and Environmental Health, University of Helsinki, Helsinki, Finland
| | - Tanja Rämö
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Present Address: The National Bureau of Investigation, Vantaa, Finland
| | - Jarmo Ritari
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Present Address: Finnish Red Cross Blood Service, Helsinki, Finland
| | - Lars Paulin
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | - Johanna Björkroth
- Department of Food Hygiene and Environmental Health, University of Helsinki, Helsinki, Finland
| | - Petri Auvinen
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| |
|
30
|
Nobre T, Campos MD, Lucic-Mercy E, Arnholdt-Schmitt B. Misannotation Awareness: A Tale of Two Gene-Groups. FRONTIERS IN PLANT SCIENCE 2016; 7:868. [PMID: 27379147 PMCID: PMC4909761 DOI: 10.3389/fpls.2016.00868] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 03/16/2016] [Accepted: 06/02/2016] [Indexed: 06/01/2023]
Abstract
The amount of incorrectly annotated or simply unannotated data is increasing substantially in most public databases, undoubtedly driven by the rise in sequence data and the more recent boom in genomic projects. Molecular biologists and bioinformaticians should join efforts to tackle this issue. Practical challenges were experienced when studying the alternative oxidase (AOX) gene family, hence the motivation for the present work. Commonly used databases were screened for their capacity to distinguish AOX from the plastid terminal oxidase (also called plastoquinol terminal oxidase; PTOX), and we put forward a simple approach, based on amino acid signatures, that unequivocally distinguishes these gene families. Further, available sequence data on the AOX family in plants were carefully revised to: (1) confirm the classification as AOX and (2) identify which AOX family member they belong to. We highlight the urgent need for misannotation awareness and re-annotation of public AOX sequences by describing different types of misclassifications and the large under-estimation of data availability.
Affiliation(s)
- Tania Nobre
- EU Marie Curie Chair, Instituto de Ciências Agrárias e Ambientais Mediterrânicas, Universidade de Évora, Évora, Portugal
| | - M. Doroteia Campos
- EU Marie Curie Chair, Instituto de Ciências Agrárias e Ambientais Mediterrânicas, Universidade de Évora, Évora, Portugal
| | | | - Birgit Arnholdt-Schmitt
- EU Marie Curie Chair, Instituto de Ciências Agrárias e Ambientais Mediterrânicas, Universidade de Évora, Évora, Portugal
| |
|
31
|
Andreevskaya M, Hultman J, Johansson P, Laine P, Paulin L, Auvinen P, Björkroth J. Complete genome sequence of Leuconostoc gelidum subsp. gasicomitatum KG16-1, isolated from vacuum-packaged vegetable sausages. Stand Genomic Sci 2016; 11:40. [PMID: 27274361 PMCID: PMC4895993 DOI: 10.1186/s40793-016-0164-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 01/19/2016] [Accepted: 05/31/2016] [Indexed: 11/10/2022]
Abstract
Leuconostoc gelidum subsp. gasicomitatum is a predominant lactic acid bacterium (LAB) in spoilage microbial communities of different kinds of modified-atmosphere packaged (MAP) food products. So far, only one genome sequence of a poultry-originating type strain of this bacterium (LMG 18811(T)) has been available. In the current study, we present the completely sequenced and functionally annotated genome of strain KG16-1 isolated from a vegetable-based product. In addition, six other vegetable-associated strains were sequenced to study possible "niche" specificity suggested by recent multilocus sequence typing. The genome of strain KG16-1 consisted of one circular chromosome and three plasmids, which together contained 2,035 CDSs. The chromosome carried at least three prophage regions and one of the plasmids encoded a galactan degradation cluster, which might provide a survival advantage in plant-related environments. The genome comparison with LMG 18811(T) and six other vegetable strains suggests no major differences between the meat- and vegetable-associated strains that would explain their "niche" specificity. Finally, the comparison with the genomes of other leuconostocs highlights the distribution of functionally interesting genes across the L. gelidum strains and the genus Leuconostoc.
Affiliation(s)
- Margarita Andreevskaya
- Institute of Biotechnology, University of Helsinki, Viikinkaari 5D, 00790 Helsinki, Finland
| | - Jenni Hultman
- Department of Food Hygiene and Environmental Health, University of Helsinki, Agnes Sjöbergin katu 2, 00790 Helsinki, Finland
| | - Per Johansson
- Department of Food Hygiene and Environmental Health, University of Helsinki, Agnes Sjöbergin katu 2, 00790 Helsinki, Finland
| | - Pia Laine
- Institute of Biotechnology, University of Helsinki, Viikinkaari 5D, 00790 Helsinki, Finland
| | - Lars Paulin
- Institute of Biotechnology, University of Helsinki, Viikinkaari 5D, 00790 Helsinki, Finland
| | - Petri Auvinen
- Institute of Biotechnology, University of Helsinki, Viikinkaari 5D, 00790 Helsinki, Finland
| | - Johanna Björkroth
- Department of Food Hygiene and Environmental Health, University of Helsinki, Agnes Sjöbergin katu 2, 00790 Helsinki, Finland
| |
|
32
|
Lugli GA, Milani C, Mancabelli L, van Sinderen D, Ventura M. MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation. FEMS Microbiol Lett 2016; 363:fnw049. [DOI: 10.1093/femsle/fnw049] [Citation(s) in RCA: 79] [Impact Index Per Article: 8.8] [Accepted: 02/24/2016] [Indexed: 12/18/2022]
|
33
|
A Comprehensive Review of Emerging Computational Methods for Gene Identification. JOURNAL OF INFORMATION PROCESSING SYSTEMS 2016. [DOI: 10.3745/jips.04.0023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/03/2022]
|
34
|
Kalkatawi M, Alam I, Bajic VB. BEACON: automated tool for Bacterial GEnome Annotation ComparisON. BMC Genomics 2015; 16:616. [PMID: 26283419 PMCID: PMC4539851 DOI: 10.1186/s12864-015-1826-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 01/30/2015] [Accepted: 08/07/2015] [Indexed: 11/25/2022]
Abstract
Background: Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). Results: The presence of many AMs and the development of new ones create a need to: (a) compare different annotations for a single genome, and (b) generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. To illustrate BEACON’s utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show with these examples that the extended annotation can increase the number of genes annotated with putative functions by up to 27%, while the number of genes without any function assignment is reduced. Conclusions: We developed BEACON, a fast tool for automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1826-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Manal Kalkatawi
- Computational Bioscience Research Centre (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Kingdom of Saudi Arabia.
| | - Intikhab Alam
- Computational Bioscience Research Centre (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Kingdom of Saudi Arabia.
| | - Vladimir B Bajic
- Computational Bioscience Research Centre (CBRC), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Kingdom of Saudi Arabia.
| |
|
35
|
Roer R, Abehsera S, Sagi A. Exoskeletons across the Pancrustacea: Comparative Morphology, Physiology, Biochemistry and Genetics. Integr Comp Biol 2015; 55:771-91. [DOI: 10.1093/icb/icv080] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 12/31/2022]
|
36
|
Genome Sequence and Transcriptome Analysis of Meat-Spoilage-Associated Lactic Acid Bacterium Lactococcus piscium MKFS47. Appl Environ Microbiol 2015; 81:3800-11. [PMID: 25819958 DOI: 10.1128/aem.00320-15] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 01/30/2015] [Accepted: 03/23/2015] [Indexed: 11/20/2022]
Abstract
Lactococcus piscium is a psychrotrophic lactic acid bacterium and is known to be one of the predominant species within spoilage microbial communities in cold-stored packaged foods, particularly in meat products. Its presence in such products has been associated with the formation of buttery and sour off-odors. Nevertheless, the spoilage potential of L. piscium varies dramatically depending on the strain and growth conditions. Additional knowledge about the genome is required to explain such variation, understand its phylogeny, and study gene functions. Here, we present the complete and annotated genomic sequence of L. piscium MKFS47, combined with a time course analysis of the glucose catabolism-based transcriptome. In addition, a comparative analysis of gene contents was done for L. piscium MKFS47 and 29 other lactococci, revealing three distinct clades within the genus. The genome of L. piscium MKFS47 consists of one chromosome, carrying 2,289 genes, and two plasmids. A wide range of carbohydrates was predicted to be fermented, and growth on glycerol was observed. Both carbohydrate and glycerol catabolic pathways were significantly upregulated in the course of time as a result of glucose exhaustion. At the same time, differential expression of the pyruvate utilization pathways, implicated in the formation of spoilage substances, switched the metabolism toward a heterofermentative mode. In agreement with data from previous inoculation studies, L. piscium MKFS47 was identified as an efficient producer of buttery-odor compounds under aerobic conditions. Finally, genes and pathways that may contribute to increased survival in meat environments were considered.
|
37
|
Kuhn JH, Andersen KG, Bào Y, Bavari S, Becker S, Bennett RS, Bergman NH, Blinkova O, Bradfute S, Brister JR, Bukreyev A, Chandran K, Chepurnov AA, Davey RA, Dietzgen RG, Doggett NA, Dolnik O, Dye JM, Enterlein S, Fenimore PW, Formenty P, Freiberg AN, Garry RF, Garza NL, Gire SK, Gonzalez JP, Griffiths A, Happi CT, Hensley LE, Herbert AS, Hevey MC, Hoenen T, Honko AN, Ignatyev GM, Jahrling PB, Johnson JC, Johnson KM, Kindrachuk J, Klenk HD, Kobinger G, Kochel TJ, Lackemeyer MG, Lackner DF, Leroy EM, Lever MS, Mühlberger E, Netesov SV, Olinger GG, Omilabu SA, Palacios G, Panchal RG, Park DJ, Patterson JL, Paweska JT, Peters CJ, Pettitt J, Pitt L, Radoshitzky SR, Ryabchikova EI, Saphire EO, Sabeti PC, Sealfon R, Shestopalov AM, Smither SJ, Sullivan NJ, Swanepoel R, Takada A, Towner JS, van der Groen G, Volchkov VE, Volchkova VA, Wahl-Jensen V, Warren TK, Warfield KL, Weidmann M, Nichol ST. Filovirus RefSeq entries: evaluation and selection of filovirus type variants, type sequences, and names. Viruses 2014; 6:3663-3682. [PMID: 25256396 PMCID: PMC4189044 DOI: 10.3390/v6093663] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Received: 09/17/2014] [Accepted: 09/23/2014] [Indexed: 12/14/2022]
Abstract
Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information's (NCBI's) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [<virus name> (<strain>)/<isolation host-suffix>/<country of sampling>/<year of sampling>/<genetic variant designation>-<isolate designation>], we report consensus decisions from a majority of past and currently active filovirus experts on the eight filovirus type variants and isolates to be represented in RefSeq, their final designations, and their associated sequences.
Affiliation(s)
- Jens H Kuhn
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Kristian G Andersen
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Yīmíng Bào
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Sina Bavari
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Stephan Becker
- Institut für Virologie, Philipps-Universität Marburg, 35043 Marburg, Germany.
| | - Richard S Bennett
- National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
| | - Nicholas H Bergman
- National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
| | - Olga Blinkova
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | | | - J Rodney Brister
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Alexander Bukreyev
- Department of Pathology and Galveston National Laboratory, University of Texas Medical Branch, Galveston, TX 77555, USA.
| | - Kartik Chandran
- Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
| | - Alexander A Chepurnov
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Robert A Davey
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Ralf G Dietzgen
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Norman A Doggett
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Olga Dolnik
- Institut für Virologie, Philipps-Universität Marburg, 35043 Marburg, Germany.
| | - John M Dye
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Sven Enterlein
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Paul W Fenimore
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Pierre Formenty
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Alexander N Freiberg
- Department of Pathology and Galveston National Laboratory, University of Texas Medical Branch, Galveston, TX 77555, USA.
| | - Robert F Garry
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Nicole L Garza
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Stephen K Gire
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Jean-Paul Gonzalez
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Anthony Griffiths
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Christian T Happi
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Lisa E Hensley
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Andrew S Herbert
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Michael C Hevey
- National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
| | - Thomas Hoenen
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Anna N Honko
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Georgy M Ignatyev
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Peter B Jahrling
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Joshua C Johnson
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Karl M Johnson
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Jason Kindrachuk
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Hans-Dieter Klenk
- Institut für Virologie, Philipps-Universität Marburg, 35043 Marburg, Germany.
| | - Gary Kobinger
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Tadeusz J Kochel
- National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
| | - Matthew G Lackemeyer
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Daniel F Lackner
- National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
| | - Eric M Leroy
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Mark S Lever
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Elke Mühlberger
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Sergey V Netesov
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Gene G Olinger
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Sunday A Omilabu
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Gustavo Palacios
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Rekha G Panchal
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Daniel J Park
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Jean L Patterson
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Janusz T Paweska
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Clarence J Peters
- Department of Pathology and Galveston National Laboratory, University of Texas Medical Branch, Galveston, TX 77555, USA.
| | - James Pettitt
- Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
| | - Louise Pitt
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Sheli R Radoshitzky
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Elena I Ryabchikova
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Erica Ollmann Saphire
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Pardis C Sabeti
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Rachel Sealfon
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | | | - Sophie J Smither
- FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
| | - Nancy J Sullivan
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Robert Swanepoel
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Ayato Takada
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Jonathan S Towner
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Guido van der Groen
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Viktor E Volchkov
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Valentina A Volchkova
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Victoria Wahl-Jensen
- National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
| | - Travis K Warren
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Kelly L Warfield
- Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | - Manfred Weidmann
- United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
| | - Stuart T Nichol
- Viral Special Pathogens Branch, Division of High-Consequence Pathogens Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA.
| |
|
38
|
Apagyi KI, Ellington MJ. A survey of metallo-β-lactamase sequence accuracy before the data deluge. J Antimicrob Chemother 2014; 69:3431-5. [PMID: 25085656 DOI: 10.1093/jac/dku284] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Affiliation(s)
- Katinka I Apagyi
- Public Health England, Clinical Microbiology and Public Health Laboratory, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QQ, UK
| | - Matthew J Ellington
- Public Health England, Clinical Microbiology and Public Health Laboratory, Addenbrooke's Hospital, Hills Road, Cambridge CB2 0QQ, UK
| |
|
39
|
Poux S, Magrane M, Arighi CN, Bridge A, O'Donovan C, Laiho K. Expert curation in UniProtKB: a case study on dealing with conflicting and erroneous data. Database: The Journal of Biological Databases and Curation 2014; 2014:bau016. [PMID: 24622611 PMCID: PMC3950660 DOI: 10.1093/database/bau016] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 11/13/2022]
Abstract
UniProtKB/Swiss-Prot provides expert curation with information extracted from literature and curator-evaluated computational analysis. As knowledgebases continue to play an increasingly important role in scientific research, a number of studies have evaluated their accuracy and revealed various errors. While some are curation errors, others are the result of incorrect information published in the scientific literature. By taking the example of sirtuin-5, a complex annotation case, we will describe the curation procedure of UniProtKB/Swiss-Prot and detail how we report conflicting information in the database. We will demonstrate the importance of collaboration between resources to ensure curation consistency and the value of contributions from the user community in helping maintain error-free resources. Database URL: www.uniprot.org
Affiliation(s)
- Sylvain Poux
- SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland, European Molecular Biology Laboratory (EMBL), European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA and Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street North West, Suite 1200, Washington, DC 20007, USA
| | | | | | | | | | | | | |
|
40
|
Stubben CJ, Challacombe JF. Mining locus tags in PubMed Central to improve microbial gene annotation. BMC Bioinformatics 2014; 15:43. [PMID: 24499370 PMCID: PMC3937057 DOI: 10.1186/1471-2105-15-43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2013] [Accepted: 01/18/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The scientific literature contains millions of microbial gene identifiers within the full text and tables, but these annotations rarely get incorporated into public sequence databases. We propose to utilize the Open Access (OA) subset of PubMed Central (PMC) as a gene annotation database and have developed an R package called pmcXML to automatically mine and extract locus tags from full text, tables and supplements. RESULTS: We mined locus tags from 1835 OA publications in ten microbial genomes and extracted tags mentioned in 30,891 sentences in main text and 20,489 rows in tables. We identified locus tag pairs marking the start and end of a region such as an operon or genomic island and expanded these ranges to add another 13,043 tags. We also searched for locus tags in supplementary tables and publications outside the OA subset in Burkholderia pseudomallei K96243 for comparison. There were 168 publications containing 48,470 locus tags and 83% of mentions were from supplementary materials and 9% from publications outside the OA subset. CONCLUSIONS: B. pseudomallei locus tags within the full text and tables of OA publications represent only a small fraction of the total mentions in the literature. For microbial genomes with very few functionally characterized proteins, the locus tags mentioned in supplementary tables and within ranges like genomic islands contain the majority of locus tags. Significantly, the functions in the R package provide access to additional resources in the OA subset that are not currently indexed or returned by searching PMC.
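The locus tag mining and range expansion described in this abstract can be illustrated with a minimal Python sketch. It is not the pmcXML package (which is written in R) and does not reproduce its API. The tag pattern (three to five capital letters followed by four digits, as in BPSL0001), the function names `find_tags` and `expand_ranges`, and the example sentence are all assumptions made for illustration.

```python
import re

# Assumed tag shape: 3-5 capital letters followed by 4 digits (e.g. BPSL0001).
# Real locus tag formats vary by genome; this pattern is illustrative only.
LOCUS_TAG = re.compile(r"\b[A-Z]{3,5}\d{4}\b")

# A start/end pair such as "BPSL0001-BPSL0004" or "BPSL0001 to BPSL0004".
TAG_RANGE = re.compile(
    r"\b([A-Z]{3,5})(\d{4})\s*(?:-|to)\s*([A-Z]{3,5})(\d{4})\b"
)


def find_tags(text):
    """Return every locus tag mentioned explicitly in the text, in order."""
    return LOCUS_TAG.findall(text)


def expand_ranges(text):
    """Expand each start/end tag pair into all tags within the range."""
    expanded = []
    for prefix1, first, prefix2, last in TAG_RANGE.findall(text):
        # Skip pairs that span different prefixes or run backwards.
        if prefix1 != prefix2 or int(first) > int(last):
            continue
        for number in range(int(first), int(last) + 1):
            expanded.append(f"{prefix1}{number:04d}")
    return expanded


sentence = "The operon BPSL0001-BPSL0004 and BPSS1234 were upregulated."
explicit = find_tags(sentence)
implied = expand_ranges(sentence)
```

Here `explicit` holds only the three tags written out in the sentence, while `implied` also contains the two tags that lie inside the range, which is the kind of gain the abstract attributes to range expansion.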
Affiliation(s)
- Chris J Stubben
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| | - Jean F Challacombe
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA
| |
|
41
|
Frazee AC, Sabunciyan S, Hansen KD, Irizarry RA, Leek JT. Differential expression analysis of RNA-seq data at single-base resolution. Biostatistics 2014; 15:413-26. [PMID: 24398039 PMCID: PMC4059460 DOI: 10.1093/biostatistics/kxt053] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 12/30/2022]
Abstract
RNA-sequencing (RNA-seq) is a flexible technology for measuring genome-wide expression that is rapidly replacing microarrays as costs become comparable. Current differential expression analysis methods for RNA-seq data fall into two broad classes: (1) methods that quantify expression within the boundaries of genes previously published in databases and (2) methods that attempt to reconstruct full length RNA transcripts. The first class cannot discover differential expression outside of previously known genes. While the second approach does possess discovery capabilities, statistical analysis of differential expression is complicated by the ambiguity and variability incurred while assembling transcripts and estimating their abundances. Here, we propose a novel method that first identifies differentially expressed regions (DERs) of interest by assessing differential expression at each base of the genome. The method then segments the genome into regions comprised of bases showing similar differential expression signal, and then assigns a measure of statistical significance to each region. Optionally, DERs can be annotated using a reference database of genomic features. We compare our approach with leading competitors from both current classes of differential expression methods and highlight the strengths and weaknesses of each. A software implementation of our method is available on github (https://github.com/alyssafrazee/derfinder).
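The segmentation step described in this abstract (grouping adjacent bases with a similar differential expression signal into regions) can be sketched in Python as follows. This is a simplified illustration, not the derfinder implementation: it swaps in a plain cutoff on a per-base statistic, and the function names, the threshold value, and the example statistics are all hypothetical.

```python
from itertools import groupby


def base_state(stat, threshold):
    """Classify one base: +1 (up), -1 (down) or 0 (no signal)."""
    if stat >= threshold:
        return 1
    if stat <= -threshold:
        return -1
    return 0


def segment_regions(stats, threshold):
    """Group consecutive bases that share a non-zero state.

    Returns (start, end, state) tuples using 0-based inclusive positions.
    """
    regions = []
    position = 0
    for state, run in groupby(stats, key=lambda s: base_state(s, threshold)):
        length = len(list(run))
        if state != 0:
            regions.append((position, position + length - 1, state))
        position += length
    return regions


# Hypothetical per-base statistics for eight adjacent bases.
per_base = [0.1, 2.5, 3.0, 0.2, -2.8, -3.1, -2.2, 2.4]
regions = segment_regions(per_base, threshold=2.0)
```

With these made-up inputs the sketch yields three candidate regions. A full analysis would then assign a measure of statistical significance to each region, which this sketch leaves out.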
Affiliation(s)
- Alyssa C Frazee
- Department of Biostatistics, The Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
| | - Sarven Sabunciyan
- Department of Pediatrics, The Johns Hopkins University School of Medicine, 600 North Wolfe Street, Baltimore, MD 21287, USA
| | - Kasper D Hansen
- Department of Biostatistics, The Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
| | - Rafael A Irizarry
- Department of Biostatistics, The Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
| | - Jeffrey T Leek
- Department of Biostatistics, The Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
| |
|
42
|
Krishnakumar S, Durai DA, Wangikar PP, Viswanathan GA. SHARP: genome-scale identification of gene-protein-reaction associations in cyanobacteria. Photosynthesis Research 2013; 118:181-190. [PMID: 23975204 DOI: 10.1007/s11120-013-9910-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/25/2013] [Accepted: 08/07/2013] [Indexed: 06/02/2023]
Abstract
A genome-scale metabolic model provides an overview of an organism's metabolic capability. These genome-specific metabolic reconstructions are based on identification of gene to protein to reaction (GPR) associations and, in turn, on homology with annotated genes from other organisms. Cyanobacteria are photosynthetic prokaryotes which have diverged appreciably from their nonphotosynthetic counterparts. They also show significant evolutionary divergence from plants, which are well studied for their photosynthetic apparatus. We argue that context-specific sequence and domain similarity can add to the repertoire of the GPR associations and significantly expand our view of the metabolic capability of cyanobacteria. We took an approach that combines the results of context-specific sequence-to-sequence similarity search with those of sequence-to-profile searches. We employ PSI-BLAST for the former, and CDD, Pfam, and COG for the latter. An optimization algorithm was devised to arrive at a weighting scheme to combine the different lines of evidence, with KEGG-annotated GPRs as training data. We present the algorithm in the form of software "Systematic, Homology-based Automated Re-annotation for Prokaryotes (SHARP)." We predicted 3,781 new GPR associations for the 10 prokaryotes considered, of which eight are cyanobacterial species. These new GPR associations fall in several metabolic pathways and were used to annotate 7,718 gaps in the metabolic network. These new annotations led to the discovery of several pathways that may be active, thereby providing new directions for metabolic engineering of these species for production of useful products. A metabolic model developed on such a reconstructed network is likely to give better phenotypic predictions.
Affiliation(s)
- S Krishnakumar
- Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India
| | | | | | | |
|
43
|
Hoffman S, Podgurski A. The use and misuse of biomedical data: is bigger really better? American Journal of Law & Medicine 2013; 39:497-538. [PMID: 24494442 DOI: 10.1177/009885881303900401] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 06/03/2023]
Abstract
Very large biomedical research databases, containing electronic health records (EHR) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is "big data" necessarily better data? This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for approaching such research with care and being cautious in relying on its outcomes for purposes of public policy or litigation. First, the data contained in biomedical databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers. In short, this paper sheds much-needed light on the problems of credulous and uninformed acceptance of research results derived from biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policymakers, and the public at large. The Article also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses.
Affiliation(s)
- Sharona Hoffman
- Law-Medicine Center, Case Western Reserve University School of Law, USA
| | | |
|
44
|
The UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Res 2013; 41:D43-7. [PMID: 23161681 PMCID: PMC3531094 DOI: 10.1093/nar/gks1068] [Citation(s) in RCA: 545] [Impact Index Per Article: 45.4] [Received: 09/26/2012] [Revised: 10/11/2012] [Accepted: 10/11/2012] [Indexed: 12/22/2022]
Abstract
The mission of the Universal Protein Resource (UniProt) (http://www.uniprot.org) is to support biological research by providing a freely accessible, stable, comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase. It integrates, interprets and standardizes data from numerous resources to achieve the most comprehensive catalogue of protein sequences and functional annotation. UniProt comprises four major components, each optimized for different uses, the UniProt Archive, the UniProt Knowledgebase, the UniProt Reference Clusters and the UniProt Metagenomic and Environmental Sequence Database. UniProt is produced by the UniProt Consortium, which consists of groups from the European Bioinformatics Institute (EBI), the SIB Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR). UniProt is updated and distributed every 4 weeks and can be accessed online for searches or downloads.
Affiliation(s)
- The UniProt Consortium
- The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211 Geneva 4, Switzerland, Protein Information Resource, Georgetown University Medical Center, 3300 Whitehaven Street North West, Suite 1200, Washington, DC 20007 and Protein Information Resource, University of Delaware, 15 Innovation Way, Suite 205, Newark, DE 19711, USA
| |
|
45
|
Brister JR, Le Mercier P, Hu JC. Microbial virus genome annotation-mustering the troops to fight the sequence onslaught. Virology 2012; 434:175-80. [PMID: 23084289 PMCID: PMC3518702 DOI: 10.1016/j.virol.2012.09.027] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 08/14/2012] [Revised: 09/17/2012] [Accepted: 09/24/2012] [Indexed: 11/27/2022]
Abstract
The revolution in virus genome sequencing promises to effectively map the extant biological universe and reveal fundamental relationships between viral biology, genome structure, and evolution. Indeed, microbial virus genomes include large numbers of conserved coding sequences of unknown function as well as unique gene combinations, implying that these viruses will be a significant source of novel protein biochemistry and genome architecture. Yet, making sense of the approaching phalanx of A's, G's, T's, and C's stretching across the genome sequencing horizon will require innovation and an unprecedented coordination of annotation efforts among stakeholders.
Affiliation(s)
- J. Rodney Brister
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894
| | - Phillippe Le Mercier
- Swiss-Prot group, Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4
| | - James C. Hu
- Department of Biochemistry and Biophysics, Texas Agrilife Research, Texas A&M University College Station, TX 77843, USA
| |
|
46
|
|