1
Dommer J, Van Doorslaer K, Afrasiabi C, Browne K, Ezeji S, Kim L, Dolan M, McBride AA. PaVE 2.0: Behind the Scenes of the Papillomavirus Episteme. J Mol Biol 2025; 437:168925. [PMID: 39732323 PMCID: PMC12145264 DOI: 10.1016/j.jmb.2024.168925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2024] [Revised: 12/20/2024] [Accepted: 12/23/2024] [Indexed: 12/30/2024]
Abstract
The Papilloma Virus Episteme (PaVE; https://pave.niaid.nih.gov/) was initiated by NIAID in 2008 to provide a highly curated bioinformatic and knowledge resource for the papillomavirus scientific community. It rapidly became the core resource for papillomavirus researchers and clinicians worldwide. Over time, the software infrastructure became severely outdated. In PaVE 2.0, the underlying libraries and hosting platform have been completely upgraded and rebuilt using Amazon Web Services (AWS) tools and automated CI/CD (continuous integration and deployment) pipelines for deployment of the application and data (now in AWS S3 cloud storage). PaVE 2.0 is hosted on three AWS ECS (Elastic Container Service) using the NIAID Operations & Engineering Branch's Monarch tech stack and Terraform. A new Celery queue supports longer-running tasks. The framework is Python Flask with a JavaScript/Jinja template front end, and the database switched from MySQL to Neo4j. A Swagger API (Application Programming Interface) performs database queries, executes jobs for BLAST, MAFFT, and the L1 typing tool, and will allow future programmatic data access. All major tools, such as BLAST, the L1 typing tool, genome locus viewer, phylogenetic tree generator, multiple sequence alignment, and protein structure viewer, were modernized and enhanced to support more users. Multiple sequence alignment uses MAFFT instead of COBALT. The protein structure viewer was changed from Jmol to Mol*, the new embeddable viewer used by RCSB (Research Collaboratory for Structural Bioinformatics). In summary, PaVE 2.0 allows us to continue to provide this essential resource with an open-source framework that could be used as a template for molecular biology databases of other viruses.
Affiliation(s)
- Jennifer Dommer
  - Bioinformatics and Computational Biosciences Branch (BCBB), National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA
- Koenraad Van Doorslaer
  - Department of Immunobiology, College of Medicine, BIO5 Institute, University of Arizona, Tucson, AZ, USA
- Cyrus Afrasiabi
  - Bioinformatics and Computational Biosciences Branch (BCBB), National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA
- Kristen Browne
  - Bioinformatics and Computational Biosciences Branch (BCBB), National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA
- Sam Ezeji
  - Bioinformatics and Computational Biosciences Branch (BCBB), National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA
- Lewis Kim
  - Bioinformatics and Computational Biosciences Branch (BCBB), National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA
- Michael Dolan
  - Bioinformatics and Computational Biosciences Branch (BCBB), National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA
- Alison A McBride
  - Laboratory of Viral Diseases, National Institute of Allergy and Infectious Diseases, Bethesda, MD, USA.
2
Liu H, Laiho A, Törönen P, Holm L. 3-D substructure search by transitive closure in AlphaFold database. Protein Sci 2025; 34:e70169. [PMID: 40400345 PMCID: PMC12095923 DOI: 10.1002/pro.70169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 05/01/2025] [Accepted: 05/02/2025] [Indexed: 05/23/2025]
Abstract
Identifying structural relationships between proteins is crucial for understanding their functions and evolutionary histories. We present ISS_ProtSci, a Python package designed for structural similarity searches within the AlphaFold Database v2 (AFDB2). ISS_ProtSci incorporates DaliLite to identify geometrically similar structures and uses a transitive closure algorithm to iteratively explore neighboring shells of proteins. The precomputed all-against-all comparisons generated by Foldseek, chosen for its speed, are validated by DaliLite for precision. Search results are annotated with metadata from UniProtKB and Pfam protein family classifications, using hmmsearch to identify protein domains. Outputs, including Dali pairwise alignment data, are provided in TSV format for easy filtering and analysis. Our method offers a significant improvement in recall over existing tools like Foldseek, especially in detecting more distantly related proteins. This is particularly valuable in structurally diverse protein families where traditional sequence-based or fast structural methods struggle. ISS_ProtSci delivers practical runtimes and flexibility, allowing users to input a PDB file, define the minimum size of the common core, and evaluate results using Pfam clans. In evaluating our method across 12 test cases based on Pfam clans, we achieved over 99% recall of relevant proteins, even in challenging cases where Foldseek's recall dropped below 50%. ISS_ProtSci not only identifies closely related proteins but also uncovers previously unrecognized structural relationships, contributing to more accurate protein family classifications. The software can be downloaded from http://ekhidna2.biocenter.helsinki.fi/ISS_ProtSci/.
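The shell-by-shell expansion described in this abstract can be illustrated with a minimal sketch. It is not the ISS_ProtSci implementation: the function names, the `neighbors` lookup (standing in for the precomputed Foldseek all-against-all table), and the `validate` callback (standing in for a DaliLite check against the query) are all hypothetical, and the toy graph is invented for illustration. A small `recall` helper shows how recall against a reference set, such as a Pfam clan, could be computed.

```python
from typing import Callable, Hashable, Iterable, Set


def transitive_closure_search(
    query: Hashable,
    neighbors: Callable[[Hashable], Iterable[Hashable]],
    validate: Callable[[Hashable, Hashable], bool],
) -> Set[Hashable]:
    """Collect structures reachable from `query` through validated neighbors.

    `neighbors(x)` is assumed to return precomputed candidate neighbors of x;
    `validate(query, candidate)` is assumed to be a slower, more precise check.
    Only validated candidates are expanded into the next shell.
    """
    accepted = {query}
    rejected = set()
    frontier = [query]
    while frontier:
        next_shell = []
        for node in frontier:
            for cand in neighbors(node):
                if cand in accepted or cand in rejected:
                    continue  # each candidate is validated at most once
                if validate(query, cand):
                    accepted.add(cand)
                    next_shell.append(cand)
                else:
                    rejected.add(cand)
        frontier = next_shell
    accepted.discard(query)
    return accepted


def recall(found: Iterable[Hashable], relevant: Iterable[Hashable]) -> float:
    """Fraction of the relevant set that was found (0.0 if it is empty)."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(found) & relevant_set) / len(relevant_set)


# Invented toy data: "B" fails validation, so "D" is never reached through it.
TOY_NEIGHBORS = {"Q": ["A", "B"], "A": ["C"], "B": ["D"], "C": ["E", "A"]}


def toy_hits() -> Set[str]:
    return transitive_closure_search(
        "Q",
        lambda node: TOY_NEIGHBORS.get(node, []),
        lambda query, cand: cand != "B",
    )
```

In this toy graph, "E" is found even though it is three steps away from the query, which is the kind of distant relationship a single-pass neighbor lookup would miss.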
Affiliation(s)
- Hao Liu
  - Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Aleksi Laiho
  - Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Petri Törönen
  - Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
- Liisa Holm
  - Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
  - Institute of Biotechnology, HiLIFE, University of Helsinki, Helsinki, Finland
3
Dapkūnas J, Margelevičius M. Web-based GTalign: bridging speed and accuracy in protein structure analysis. Nucleic Acids Res 2025:gkaf398. [PMID: 40331429 DOI: 10.1093/nar/gkaf398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2025] [Revised: 04/20/2025] [Accepted: 04/28/2025] [Indexed: 05/08/2025] Open
Abstract
Accurate protein structure alignment is essential for understanding structural and functional relationships. Here, we introduce GTalign-web, a web-based implementation of GTalign, a spatial index-driven protein structure alignment tool, designed for accessibility and high-performance structural searches. Benchmarked against the DALI and Foldseek servers, GTalign-web demonstrates superior accuracy while maintaining rapid search times. Its utility is further highlighted in annotating uncharacterized proteins through searches against UniRef30. GTalign-web provides a useful resource for protein structure analysis and functional annotation and is available at https://bioinformatics.lt/comer/gtalign. This website is free and open to all users, and there is no login requirement.
Affiliation(s)
- Justas Dapkūnas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, 10257 Vilnius, Lithuania
- Mindaugas Margelevičius
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, 10257 Vilnius, Lithuania
4
Pérez-Cruz C, Moraleda-Montoya A, Liébana R, Terrones O, Arrizabalaga U, García-Alija M, Lorizate M, Martínez Gascueña A, García-Álvarez I, Nieto-Garai JA, Olazar-Intxausti J, Rodríguez-Colinas B, Mann E, Chiara JL, Contreras FX, Guerin ME, Trastoy B, Alonso-Sáez L. Mechanisms of recalcitrant fucoidan breakdown in marine Planctomycetota. Nat Commun 2024; 15:10906. [PMID: 39738071 PMCID: PMC11685898 DOI: 10.1038/s41467-024-55268-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 12/05/2024] [Indexed: 01/01/2025] Open
Abstract
Marine brown algae produce the highly recalcitrant polysaccharide fucoidan, contributing to long-term oceanic carbon storage and climate regulation. Fucoidan is degraded by specialized heterotrophic bacteria, which promote ecosystem function and global carbon turnover using largely uncharacterized mechanisms. Here, we isolate and study two Planctomycetota strains from the microbiome associated with the alga Fucus spiralis, which grow efficiently on chemically diverse fucoidans. One of the strains appears to internalize the polymer, while the other strain degrades it extracellularly. Multi-omic approaches show that fucoidan breakdown is mediated by the expression of divergent polysaccharide utilization loci, and endo-fucanases of family GH168 are strongly upregulated during fucoidan digestion. Enzymatic assays and structural biology studies reveal how GH168 endo-fucanases degrade various fucoidan cores from brown algae, assisted by auxiliary hydrolytic enzymes. Overall, our results provide insights into fucoidan processing mechanisms in macroalgal-associated bacteria.
Affiliation(s)
- Carla Pérez-Cruz
  - AZTI, Marine Research, Basque Research and Technology Alliance (BRTA), Sukarrieta, Spain
- Alicia Moraleda-Montoya
  - Structural Glycoimmunology Laboratory, Biobizkaia Health Research Institute, Barakaldo, Spain
- Raquel Liébana
  - AZTI, Marine Research, Basque Research and Technology Alliance (BRTA), Sukarrieta, Spain
- Oihana Terrones
  - Department of Biochemistry and Molecular Biology, Faculty of Science and Technology, University of the Basque Country, Leioa, Spain
- Uxue Arrizabalaga
  - AZTI, Marine Research, Basque Research and Technology Alliance (BRTA), Sukarrieta, Spain
- Mikel García-Alija
  - Structural Glycoimmunology Laboratory, Biobizkaia Health Research Institute, Barakaldo, Spain
- Maier Lorizate
  - Department of Biochemistry and Molecular Biology, Faculty of Science and Technology, University of the Basque Country, Leioa, Spain
- Ana Martínez Gascueña
  - Structural Glycoimmunology Laboratory, Biobizkaia Health Research Institute, Barakaldo, Spain
- Isabel García-Álvarez
  - Facultad de Ciencias Experimentales, Universidad Francisco de Vitoria, Pozuelo de Alarcón, Madrid, Spain
- Jon Ander Nieto-Garai
  - Department of Biochemistry and Molecular Biology, Faculty of Science and Technology, University of the Basque Country, Leioa, Spain
- June Olazar-Intxausti
  - Department of Biochemistry and Molecular Biology, Faculty of Science and Technology, University of the Basque Country, Leioa, Spain
- Bárbara Rodríguez-Colinas
  - Facultad de Ciencias Experimentales, Universidad Francisco de Vitoria, Pozuelo de Alarcón, Madrid, Spain
- Enrique Mann
  - Instituto de Química Orgánica General (IQOG-CSIC), Madrid, Spain
- José Luis Chiara
  - Instituto de Química Orgánica General (IQOG-CSIC), Madrid, Spain
- Francesc-Xabier Contreras
  - Department of Biochemistry and Molecular Biology, Faculty of Science and Technology, University of the Basque Country, Leioa, Spain.
  - Instituto Biofisika (UPV/EHU, CSIC), University of the Basque Country, Leioa, Spain.
  - Ikerbasque, Basque Foundation for Science, Bilbao, Spain.
- Marcelo E Guerin
  - Structural Glycobiology Laboratory, Department of Structural and Molecular Biology; Molecular Biology Institute of Barcelona (IBMB), Spanish National Research Council (CSIC), Barcelona Science Park, Tower R, Barcelona, Spain.
- Beatriz Trastoy
  - Structural Glycoimmunology Laboratory, Biobizkaia Health Research Institute, Barakaldo, Spain.
  - Ikerbasque, Basque Foundation for Science, Bilbao, Spain.
- Laura Alonso-Sáez
  - AZTI, Marine Research, Basque Research and Technology Alliance (BRTA), Sukarrieta, Spain.
5
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024] Open
Abstract
The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
  - Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
6
McCann HM, Meade CD, Banerjee B, Penev PI, Dean Williams L, Petrov AS. RiboVision2: A Web Server for Advanced Visualization of Ribosomal RNAs. J Mol Biol 2024; 436:168556. [PMID: 39237196 DOI: 10.1016/j.jmb.2024.168556] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 03/24/2024] [Accepted: 03/25/2024] [Indexed: 09/07/2024]
Abstract
RiboVision2 is a web server designed to visualize phylogenetic, structural, and evolutionary properties of ribosomal RNAs simultaneously at the levels of primary, secondary, and three-dimensional structure and in the context of full ribosomal complexes. RiboVision2 instantly computes and displays a broad variety of data; it has no login requirements, is open-source, free for all users, and available at https://ribovision2.chemistry.gatech.edu.
Affiliation(s)
- Holly M McCann
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Caeden D Meade
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Biswajit Banerjee
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Petar I Penev
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Loren Dean Williams
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Anton S Petrov
  - NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA.
7
Schebera J, Zeckzer D, Wiegreffe D. A layout framework for genome-wide multiple sequence alignment graphs. Front Bioinform 2024; 4:1358374. [PMID: 39221004 PMCID: PMC11362851 DOI: 10.3389/fbinf.2024.1358374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 07/08/2024] [Indexed: 09/04/2024] Open
Abstract
Sequence alignments are often used to analyze genomic data. However, such alignments are often only calculated and compared on small sequence intervals for analysis purposes. When comparing longer sequences, these are usually divided into shorter sequence intervals for better alignment results. This usually means that the order context of the original sequence is lost. To prevent this, it is possible to use a graph structure to represent the order of the original sequence on the alignment blocks. The visualization of these graph structures can provide insights into the structural variations of genomes in a semi-global context. In this paper, we propose a new graph drawing framework for representing gMSA data. We produce a hierarchical graph layout that supports the comparative analysis of genomes. Based on a reference, the differences and similarities of the different genome orders are visualized. In this work, we present a complete graph drawing framework for gMSA graphs together with the respective algorithms for each of the steps. Additionally, we provide a prototype and an example data set for analyzing gMSA graphs. Based on this data set, we demonstrate the functionalities of the framework using two examples.
Affiliation(s)
- Jeremias Schebera
  - Image and Signal Processing Group, Institute for Computer Science, Leipzig University, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University, Leipzig, Germany
- Dirk Zeckzer
  - Image and Signal Processing Group, Institute for Computer Science, Leipzig University, Leipzig, Germany
- Daniel Wiegreffe
  - Image and Signal Processing Group, Institute for Computer Science, Leipzig University, Leipzig, Germany
8
Andress Huacachino A, Joo J, Narayanan N, Tehim A, Himes BE, Penning TM. Aldo-keto reductase (AKR) superfamily website and database: An update. Chem Biol Interact 2024; 398:111111. [PMID: 38878851 PMCID: PMC11232437 DOI: 10.1016/j.cbi.2024.111111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 05/09/2024] [Accepted: 06/13/2024] [Indexed: 06/23/2024]
Abstract
The aldo-keto reductase (AKR) superfamily is a large family of proteins found across the kingdoms of life. Shared features of the family include 1) structural similarities such as an (α/β)8-barrel structure, disordered loop structure, cofactor binding site, and a catalytic tetrad, and 2) the ability to catalyze the nicotinamide adenine dinucleotide (phosphate) reduced (NAD(P)H)-dependent reduction of a carbonyl group. A criterion for family membership is that the protein must have a measured function; thus, genomic sequences suggesting the transcription of potential AKR proteins are considered pseudo-members until evidence of a functionally expressed protein is available. Currently, over 200 confirmed AKR superfamily members are reported to exist. A systematic nomenclature for the AKR superfamily allows the family and subfamily designations of a member to be communicated easily. Specifically, protein names include the root "AKR", followed by the family represented by an Arabic number, the subfamily (if one exists) represented by a letter, and finally the individual member represented by an Arabic number. The AKR superfamily database has been dedicated to tracking and reporting the current knowledge of the AKRs since 1997, and the website was last updated in 2003. Here, we present an updated version of the website and database that were released in 2023. The database contains genetic, functional, and structural data drawn from various sources, while the website provides alignment information and family tree structure derived from bioinformatics analyses.
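The naming scheme described in this abstract (root "AKR", family number, subfamily letter, member number) lends itself to a small parsing sketch. The `AKRName` type and `parse_akr_name` function below are hypothetical helpers, not part of the AKR website or database. The abstract does not say how a name is written when no subfamily exists, so this sketch assumes a subfamily letter is present and rejects anything else.

```python
import re
from typing import NamedTuple


class AKRName(NamedTuple):
    family: int      # Arabic number after the "AKR" root
    subfamily: str   # single uppercase letter
    member: int      # Arabic number identifying the individual member


# Assumed pattern: root, family digits, one subfamily letter, member digits.
_AKR_PATTERN = re.compile(r"^AKR(\d+)([A-Z])(\d+)$")


def parse_akr_name(name: str) -> AKRName:
    """Split a name such as "AKR1C3" into family, subfamily, and member."""
    match = _AKR_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognized AKR name: {name!r}")
    family, subfamily, member = match.groups()
    return AKRName(int(family), subfamily, int(member))
```

For example, `parse_akr_name("AKR1C3")` yields family 1, subfamily "C", member 3, while a string without a subfamily letter raises `ValueError` rather than guessing at the intended split.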
Affiliation(s)
- Andrea Andress Huacachino
  - Department of Biochemistry & Biophysics, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA; Center of Excellence in Environmental Toxicology, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA
- Jaehyun Joo
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA
- Nisha Narayanan
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA
- Anisha Tehim
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA
- Blanca E Himes
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA; Center of Excellence in Environmental Toxicology, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA
- Trevor M Penning
  - Center of Excellence in Environmental Toxicology, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA; Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, 19104-6061, USA.
9
Sidharthan VK, Reddy V, Kiran G, Rajeswari V, Baranwal VK, Kumar MK, Kumar KS. Probing of plant transcriptomes reveals the hidden genetic diversity of the family Secoviridae. Arch Virol 2024; 169:150. [PMID: 38898334 DOI: 10.1007/s00705-024-06076-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Accepted: 05/07/2024] [Indexed: 06/21/2024]
Abstract
Secoviruses are single-stranded RNA viruses that infect plants. In the present study, we identified 61 putative novel secoviral genomes in various plant species by mining publicly available plant transcriptome data. These viral sequences represent the genomes of 13 monopartite and 48 bipartite secovirids. The genome sequences of 52 secovirids were coding-complete, and nine were partial. Except for small open reading frames (ORFs) determined in waikaviral genomes and RNA2 of torradoviruses, all of the recovered genomes/genome segments contained a large ORF encoding a polyprotein. Based on genome organization and phylogeny, all but three of the novel secoviruses were assigned to different genera. The genome organization of two identified waika-like viruses resembled that of the recently identified waika-like virus Triticum aestivum secovirus. Phylogenetic analysis revealed a pattern of host-virus co-evolution in a few waika- and waika-like viruses and increased phylogenetic diversity of nepoviruses. The study provides a basis for further investigation of the biological properties of these novel secoviruses.
Affiliation(s)
- V Kavi Sidharthan
  - Division of Genetics and Tree Improvement, ICFRE-Institute of Forest Biodiversity, Hyderabad, India.
- Vijayprakash Reddy
  - Division of Genetics and Tree Improvement, ICFRE-Institute of Forest Biodiversity, Hyderabad, India
- G Kiran
  - Division of Genetics and Tree Improvement, ICFRE-Institute of Forest Biodiversity, Hyderabad, India
- V Rajeswari
  - School of Agricultural Sciences, Malla Reddy University, Hyderabad, India
- V K Baranwal
  - Division of Plant Pathology, ICAR-Indian Agricultural Research Institute, New Delhi, India
- M Kiran Kumar
  - Division of Genetics and Tree Improvement, ICFRE-Institute of Forest Biodiversity, Hyderabad, India
- K Sudheer Kumar
  - Division of Genetics and Tree Improvement, ICFRE-Institute of Forest Biodiversity, Hyderabad, India
10
Wendt G, Collins JJ. Horizontal gene transfer of a functional cki homolog in the human pathogen Schistosoma mansoni. bioRxiv 2024:2024.05.27.596073. [PMID: 38853947 PMCID: PMC11160599 DOI: 10.1101/2024.05.27.596073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Schistosomes are parasitic flatworms responsible for the neglected tropical disease schistosomiasis, causing devastating morbidity and mortality in the developing world. The parasites are protected by a skin-like tegument, and maintenance of this tegument is controlled by a schistosome ortholog of the tumor suppressor TP53. To understand mechanistically how p53-1 controls tegument production, we identified a cyclin-dependent kinase inhibitor homolog (cki) that was co-expressed with p53-1. RNA interference of cki resulted in a hyperproliferation phenotype that, in combination with p53-1 RNA interference, yielded abundant tumor-like growths, indicating that cki and p53-1 are bona fide tumor suppressors in Schistosoma mansoni. Interestingly, cki homologs are widely present throughout parasitic flatworms but evidently absent from their free-living ancestors, suggesting this cki homolog came from an ancient horizontal gene transfer event. This in turn implies that the evolution of parasitism in flatworms may have been aided by a highly unusual means of metazoan genetic inheritance.
Affiliation(s)
- George Wendt
  - Department of Pharmacology, University of Texas Southwestern Medical Center
- James J Collins
  - Department of Pharmacology, University of Texas Southwestern Medical Center
11
Cannon EK, Portwood JL, Hayford RK, Haley OC, Gardiner JM, Andorf CM, Woodhouse MR. Enhanced pan-genomic resources at the maize genetics and genomics database. Genetics 2024; 227:iyae036. [PMID: 38577974 DOI: 10.1093/genetics/iyae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 01/13/2024] [Indexed: 04/06/2024] Open
Abstract
Pan-genomes, encompassing the entirety of genetic sequences found in a collection of genomes within a clade, are more useful than single reference genomes for studying species diversity. This is especially true for a species like Zea mays, which has a particularly diverse and complex genome. Presenting pan-genome data, analyses, and visualization is challenging, especially for a diverse species, and more so when pan-genomic data is linked to extensive gene model and gene data, including classical gene information, markers, insertions, expression and proteomic data, and protein structures, as is the case at MaizeGDB. Here, we describe MaizeGDB's expansion to include the genic subset of the Zea pan-genome in a pan-gene data center featuring the maize genomes hosted at MaizeGDB and the outgroup teosinte Zea genomes from the Pan-Andropogoneae project. The new data center offers a variety of browsing and visualization tools, including sequence alignment visualization, gene trees, and other tools, to explore pan-genes in Zea that were calculated by the pipeline Pandagma. Combined, these data will help maize researchers study the complexity and diversity of Zea and use the comparative functions to validate pan-gene relationships for a selected gene model.
Affiliation(s)
- Ethalinda K Cannon
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- John L Portwood
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- Rita K Hayford
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- Olivia C Haley
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- Jack M Gardiner
  - Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
- Carson M Andorf
  - USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
  - Department of Computer Science, Iowa State University, Ames, IA 50011, USA
12
Baltoumas FA, Karatzas E, Liu S, Ovchinnikov S, Sofianatos Y, Chen IM, Kyrpides N, Pavlopoulos G. NMPFamsDB: a database of novel protein families from microbial metagenomes and metatranscriptomes. Nucleic Acids Res 2024; 52:D502-D512. [PMID: 37811892 PMCID: PMC10767849 DOI: 10.1093/nar/gkad800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 09/19/2023] [Indexed: 10/10/2023] Open
Abstract
The Novel Metagenome Protein Families Database (NMPFamsDB) is a database of metagenome- and metatranscriptome-derived protein families, whose members have no hits to proteins of reference genomes or Pfam domains. Each protein family is accompanied by multiple sequence alignments, Hidden Markov Models, taxonomic information, ecosystem and geolocation metadata, sequence and structure predictions, as well as 3D structure models predicted with AlphaFold2. In its current version, NMPFamsDB hosts over 100 000 protein families, each with at least 100 members. The reported protein families significantly expand (more than double) the number of known protein sequence clusters from reference genomes and reveal new insights into their habitat distribution, origins, functions and taxonomy. We expect NMPFamsDB to be a valuable resource for microbial proteome-wide analyses and for further discovery and characterization of novel functions. NMPFamsDB is publicly available at http://www.nmpfamsdb.org/ or https://bib.fleming.gr/NMPFamsDB.
Affiliation(s)
- Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
| | - Sirui Liu
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA 02138, USA
| | - Yorgos Sofianatos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
| | - I-Min Chen
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
| | - Nikos C Kyrpides
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
| | - Georgios A Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, 16672, Greece
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720-8150, USA
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 75 Mikras Asias Street, Athens 11527, Greece
|
13
|
Zhao P, Zhou S, Xu P, Su H, Han Y, Dong J, Sui H, Li X, Hu Y, Wu Z, Liu B, Zhang T, Yang F. RVdb: a comprehensive resource and analysis platform for rhinovirus research. Nucleic Acids Res 2024; 52:D770-D776. [PMID: 37930838 PMCID: PMC10768139 DOI: 10.1093/nar/gkad937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 10/08/2023] [Accepted: 10/10/2023] [Indexed: 11/08/2023] Open
Abstract
Rhinovirus (RV), a prominent causative agent of both upper and lower respiratory diseases, ranks among the most prevalent human respiratory viruses. RV infections are associated with various illnesses, including colds, asthma exacerbations, croup and pneumonia, imposing significant and extended societal burdens. Characterized by a high mutation rate and genomic diversity, RV displays a diverse serological landscape, encompassing a total of 174 serotypes identified to date. Understanding RV genetic diversity is crucial for epidemiological surveillance and investigation of respiratory diseases. This study introduces a comprehensive and high-quality RV data resource, designated RVdb (http://rvdb.mgc.ac.cn), covering 26 909 currently identified RV strains, along with RV-related sequences, 3D protein structures and publications. Furthermore, this resource features a suite of web-based utilities optimized for easy browsing and searching, as well as automatic sequence annotation, multiple sequence alignment (MSA), phylogenetic tree construction, RVdb BLAST and a serotyping pipeline. Equipped with a user-friendly interface and integrated online bioinformatics tools, RVdb provides a convenient and powerful platform on which to analyse the genetic characteristics of RVs. RVdb also supports the efforts of virologists and epidemiologists to monitor and trace both existing and emerging RV-related infectious conditions in a public health context.
Affiliation(s)
- Peng Zhao
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Siyu Zhou
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Panpan Xu
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Haoxiang Su
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Yelin Han
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Jie Dong
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Hongtao Sui
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Xin Li
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Yongfeng Hu
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Zhiqiang Wu
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Bo Liu
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Ting Zhang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
| | - Fan Yang
- NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 102629, P.R. China
- Key Laboratory of Respiratory Disease Pathogenomics, Chinese Academy of Medical Sciences, Beijing 102629, P.R. China
- State Key Laboratory of Respiratory Health and Multimorbidity, Beijing 102629, P.R. China
|
14
|
Budiš J, Krampl W, Kucharík M, Hekel R, Goga A, Sitarčík J, Lichvár M, Smoľak D, Böhmer M, Baláž A, Ďuriš F, Gazdarica J, Šoltys K, Turňa J, Radvánszky J, Szemes T. SnakeLines: integrated set of computational pipelines for sequencing reads. J Integr Bioinform 2023; 20:jib-2022-0059. [PMID: 37602733 PMCID: PMC10757078 DOI: 10.1515/jib-2022-0059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2022] [Accepted: 03/21/2023] [Indexed: 08/22/2023] Open
Abstract
With the rapid growth of massively parallel sequencing technologies, an increasing number of laboratories are utilising sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computational centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analysis across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, especially in the Slovak national surveillance of SARS-CoV-2.
Affiliation(s)
- Jaroslav Budiš
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
| | - Werner Krampl
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Marcel Kucharík
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
| | - Rastislav Hekel
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Adrián Goga
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University, 841 04 Bratislava, Slovakia
| | - Jozef Sitarčík
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
| | - Michal Lichvár
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
| | - Dávid Smoľak
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Miroslav Böhmer
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Andrej Baláž
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University, 841 04 Bratislava, Slovakia
| | - František Ďuriš
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
| | - Juraj Gazdarica
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
| | - Katarína Šoltys
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Ján Turňa
- Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
| | - Ján Radvánszky
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia
| | - Tomáš Szemes
- Geneton Ltd., 841 04 Bratislava, Slovakia
- Comenius University Science Park, 841 04 Bratislava, Slovakia
- Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
|
15
|
Ouyang Y, Nauwynck HJ. PCV2 Uptake by Porcine Monocytes Is Strain-Dependent and Is Associated with Amino Acid Characteristics on the Capsid Surface. Microbiol Spectr 2023; 11:e0380522. [PMID: 36719220 PMCID: PMC10100887 DOI: 10.1128/spectrum.03805-22] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/19/2022] [Accepted: 01/13/2023] [Indexed: 02/01/2023] Open
Abstract
Porcine circovirus type 2 (PCV2) is associated with several economically important diseases that are described as PCV2-associated diseases (PCVADs). PCV2 replicates in lymphoblasts, and PCV2 particles are taken up by monocytes without effective replication or complete degradation. Glycosaminoglycans (GAGs) have been demonstrated to be important receptors for PCV2 binding and entry in T-lymphocytes and continuous cell lines. The objective of this study was to determine whether differences exist in viral uptake and outcome among six PCV2 strains from different disease outbreaks in primary porcine monocytes: Stoon-1010 (PCV2a; PMWS), 1121 (PCV2a; abortion), 1147 (PCV2b; PDNS), 09V448 (PCV2d-1; PCVAD with high viral load in lymphoid tissues [PCVADhigh]), DE222-13 (PCV2d-2; PCVADhigh), and 19V245 (PCV2d-2; PCVADhigh). The uptake of PCV2 in peripheral blood monocytes was different among the PCV2 strains. A large number of PCV2 particles were found in the monocytes for Stoon-1010, DE222-13, and 19V245, while a low number was found for 1121, 1147, and 09V448. Competition with, and removal of, GAGs on the cell surface demonstrated an important role of chondroitin sulfate (CS) and dermatan sulfate (DS) in PCV2 entry into monocytes. The mapping of positively/negatively charged amino acids exposed on the surface of PCV2 capsids revealed that their number and distribution could have an impact on the binding of the capsids to GAGs, and the internalization into monocytes. Based on the distribution of positively charged amino acids on PCV2 capsids, phosphacan was hypothesized, and further demonstrated, as an effective candidate to mediate virus attachment to, and internalization in, monocytes.
IMPORTANCE PCV2 is present on almost every pig farm in the world and is associated with a high number of diseases (PCV2-associated diseases [PCVADs]). It causes severe economic losses. Although vaccination is successfully applied in the field, there are still a lot of unanswered questions on the pathogenesis of PCV2 infections. This article reports on the uptake difference of various PCV2 strains by peripheral blood monocytes, and reveals the mechanism of the strong viral uptake ability of monocytes of Piétrain pigs. We further demonstrated that: (i) GAGs mediate the uptake of PCV2 particles by monocytes, (ii) positively charged three-wings-windmill-like amino acid patterns on the capsid outer surface activate PCV2 uptake, and (iii) phosphacan is one of the potential candidates for PCV2 internalization. These results provide new insights into the mechanisms involved in PCVAD and contribute to a better understanding of PCV2 evolution. This may lead to the development of resistant pigs.
Affiliation(s)
- Yueling Ouyang
- Laboratory of Virology, Department of Translational Physiology, Infectiology and Public Health, Faculty of Veterinary Medicine, Ghent University, Ghent, Belgium
| | - Hans J. Nauwynck
- Laboratory of Virology, Department of Translational Physiology, Infectiology and Public Health, Faculty of Veterinary Medicine, Ghent University, Ghent, Belgium
|
16
|
Wafula EK, Zhang H, Von Kuster G, Leebens-Mack JH, Honaas LA, dePamphilis CW. PlantTribes2: Tools for comparative gene family analysis in plant genomics. Frontiers in Plant Science 2023; 13:1011199. [PMID: 36798801 PMCID: PMC9928214 DOI: 10.3389/fpls.2022.1011199] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/03/2022] [Accepted: 12/02/2022] [Indexed: 05/12/2023]
Abstract
Plant genome-scale resources are being generated at an increasing rate as sequencing technologies continue to improve and raw data costs continue to fall; however, the cost of downstream analyses remains large. This has resulted in a considerable range of genome assembly and annotation qualities across plant genomes due to their varying sizes, complexity, and the technology used for the assembly and annotation. To effectively work across genomes, researchers increasingly rely on comparative genomic approaches that integrate across plant community resources and data types. Such efforts have aided the genome annotation process and yielded novel insights into the evolutionary history of genomes and gene families, including complex non-model organisms. The essential tools to achieve these insights rely on gene family analysis at a genome-scale, but they are not well integrated for rapid analysis of new data, and the learning curve can be steep. Here we present PlantTribes2, a scalable, easily accessible, highly customizable, and broadly applicable gene family analysis framework with multiple entry points including user provided data. It uses objective classifications of annotated protein sequences from existing, high-quality plant genomes for comparative and evolutionary studies. PlantTribes2 can improve transcript models and then sort them, either genome-scale annotations or individual gene coding sequences, into pre-computed orthologous gene family clusters with rich functional annotation information. Then, for gene families of interest, PlantTribes2 performs downstream analyses and customizable visualizations including, (1) multiple sequence alignment, (2) gene family phylogeny, (3) estimation of synonymous and non-synonymous substitution rates among homologous sequences, and (4) inference of large-scale duplication events. 
We give examples of PlantTribes2 applications in functional genomic studies of economically important plant families, namely transcriptomics in the weedy Orobanchaceae and a core orthogroup analysis (CROG) in Rosaceae. PlantTribes2 is freely available for use within the main public Galaxy instance and can be downloaded from GitHub or Bioconda. Importantly, PlantTribes2 can be readily adapted for use with genomic and transcriptomic data from any kind of organism.
Affiliation(s)
- Eric K Wafula
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
| | - Huiting Zhang
- Tree Fruit Research Laboratory, United States Department of Agriculture (USDA), Agricultural Research Service (ARS), Wenatchee, WA, United States
- Department of Horticulture, Washington State University, Pullman, WA, United States
| | - Gregory Von Kuster
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, United States
| | | | - Loren A Honaas
- Tree Fruit Research Laboratory, United States Department of Agriculture (USDA), Agricultural Research Service (ARS), Wenatchee, WA, United States
| | - Claude W dePamphilis
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, United States
|
17
|
Kjær KH, Winther Pedersen M, De Sanctis B, De Cahsan B, Korneliussen TS, Michelsen CS, Sand KK, Jelavić S, Ruter AH, Schmidt AMA, Kjeldsen KK, Tesakov AS, Snowball I, Gosse JC, Alsos IG, Wang Y, Dockter C, Rasmussen M, Jørgensen ME, Skadhauge B, Prohaska A, Kristensen JÅ, Bjerager M, Allentoft ME, Coissac E, Rouillard A, Simakova A, Fernandez-Guerra A, Bowler C, Macias-Fauria M, Vinner L, Welch JJ, Hidy AJ, Sikora M, Collins MJ, Durbin R, Larsen NK, Willerslev E. A 2-million-year-old ecosystem in Greenland uncovered by environmental DNA. Nature 2022; 612:283-291. [PMID: 36477129 PMCID: PMC9729109 DOI: 10.1038/s41586-022-05453-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 09/30/2021] [Accepted: 10/18/2022] [Indexed: 12/12/2022]
Abstract
Late Pliocene and Early Pleistocene epochs 3.6 to 0.8 million years ago [1] had climates resembling those forecasted under future warming [2]. Palaeoclimatic records show strong polar amplification with mean annual temperatures of 11-19 °C above contemporary values [3,4]. The biological communities inhabiting the Arctic during this time remain poorly known because fossils are rare [5]. Here we report an ancient environmental DNA [6] (eDNA) record describing the rich plant and animal assemblages of the Kap København Formation in North Greenland, dated to around two million years ago. The record shows an open boreal forest ecosystem with mixed vegetation of poplar, birch and thuja trees, as well as a variety of Arctic and boreal shrubs and herbs, many of which had not previously been detected at the site from macrofossil and pollen records. The DNA record confirms the presence of hare and mitochondrial DNA from animals including mastodons, reindeer, rodents and geese, all ancestral to their present-day and late Pleistocene relatives. The presence of marine species including horseshoe crab and green algae supports a warmer climate than today. The reconstructed ecosystem has no modern analogue. The survival of such ancient eDNA probably relates to its binding to mineral surfaces. Our findings open new areas of genetic research, demonstrating that it is possible to track the ecology and evolution of biological communities from two million years ago using ancient eDNA.
Affiliation(s)
- Kurt H Kjær
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Mikkel Winther Pedersen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Bianca De Sanctis
- Department of Zoology, University of Cambridge, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Binia De Cahsan
- Section for Molecular Ecology and Evolution, The Globe Institute, Faculty of Health and Medical Sciences, Copenhagen, Denmark
| | - Thorfinn S Korneliussen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Christian S Michelsen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark
| | - Karina K Sand
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Stanislav Jelavić
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Université Grenoble Alpes, Université Savoie Mont Blanc, CNRS, IRD, Université Gustave Eiffel, ISTerre, Grenoble, France
| | - Anthony H Ruter
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Astrid M A Schmidt
- Nordic Foundation for Development and Ecology (NORDECO), Copenhagen, Denmark
- DIS Study Abroad in Scandinavia, University of Copenhagen, Copenhagen, Denmark
| | - Kristian K Kjeldsen
- Department of Glaciology and Climate, Geological Survey of Denmark and Greenland, Copenhagen, Denmark
| | - Alexey S Tesakov
- Geological Institute, Russian Academy of Sciences, Moscow, Russia
| | - Ian Snowball
- Department of Earth Sciences, Uppsala University, Uppsala, Sweden
| | - John C Gosse
- Department of Earth and Environmental Sciences, Dalhousie University, Halifax, Nova Scotia, Canada
| | - Inger G Alsos
- The Arctic University Museum of Norway, UiT-The Arctic University of Norway, Tromsø, Norway
| | - Yucheng Wang
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Zoology, University of Cambridge, Cambridge, UK
| | | | | | | | | | - Ana Prohaska
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Zoology, University of Cambridge, Cambridge, UK
| | - Jeppe Å Kristensen
- Environmental Change Institute, School of Geography and the Environment, University of Oxford, Oxford, UK
- Geological Survey of Denmark and Greenland, (GEUS), Copenhagen, Denmark
| | - Morten Bjerager
- Department of Geophysics and Sedimentary Basins, Geological Survey of Denmark and Greenland, Copenhagen, Denmark
| | - Morten E Allentoft
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Perth, Western Australia, Australia
| | - Eric Coissac
- The Arctic University Museum of Norway, UiT-The Arctic University of Norway, Tromsø, Norway
- University of Grenoble-Alpes, Université Savoie Mont Blanc, CNRS, LECA, Grenoble, France
| | - Alexandra Rouillard
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Geosciences, UiT-The Arctic University of Norway, Tromsø, Norway
| | | | - Antonio Fernandez-Guerra
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Chris Bowler
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Ecole Normale Supérieure, CNRS, INSERM Université PSL, Paris, France
| | - Marc Macias-Fauria
- School of Geography and the Environment, University of Oxford, Oxford, UK
| | - Lasse Vinner
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - John J Welch
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Alan J Hidy
- Center for Accelerator Mass Spectrometry, Lawrence Livermore National Laboratory, Livermore, CA, USA
| | - Martin Sikora
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Matthew J Collins
- Department of Archaeology, University of Cambridge, Cambridge, UK
- Section for GeoBiology, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, UK
| | - Nicolaj K Larsen
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
| | - Eske Willerslev
- Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Zoology, University of Cambridge, Cambridge, UK
- MARUM, University of Bremen, Bremen, Germany
|
18
|
Brenes Guallar MA, Fokkens L, Rep M, Berke L, van Dam P. Fusarium oxysporum effector clustering version 2: An updated pipeline to infer host range. Frontiers in Plant Science 2022; 13:1012688. [PMID: 36340405 PMCID: PMC9627151 DOI: 10.3389/fpls.2022.1012688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/05/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
The fungus Fusarium oxysporum is infamous for its devastating effects on economically important crops worldwide. F. oxysporum isolates are grouped into formae speciales based on their ability to cause disease on different hosts. Assigning F. oxysporum strains to formae speciales using non-experimental procedures has proven to be challenging due to their genetic heterogeneity and polyphyletic nature. However, genetically diverse isolates of the same forma specialis encode similar repertoires of effectors, proteins that are secreted by the fungus and contribute to the establishment of compatibility with the host. Based on this observation, we previously designed the F. oxysporum Effector Clustering (FoEC) pipeline, which classifies F. oxysporum strains by forma specialis based on hierarchical clustering of the presence of predicted putative effector sequences, solely using genome assemblies as input. Here we present the updated FoEC2 pipeline, which is more user-friendly and customizable and, due to multithreading, has improved scalability. It is designed as a Snakemake pipeline and incorporates a new interactive visualization app. We showcase FoEC2 by clustering 537 publicly available F. oxysporum genomes and by further analysing putative effector families as multiple sequence alignments. We confirm classification of isolates into formae speciales and are able to further identify their subtypes. The pipeline is available on GitHub: https://github.com/pvdam3/FoEC2.
Affiliation(s)
- Megan A. Brenes Guallar
- Bioinformatics and Software Development Team, Genetwister Technologies B.V., Wageningen, Netherlands
| | - Like Fokkens
- Laboratory of Phytopathology, Wageningen University, Wageningen, Netherlands
- Molecular Plant Pathology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
| | - Martijn Rep
- Molecular Plant Pathology, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, Netherlands
| | - Lidija Berke
- Bioinformatics and Software Development Team, Genetwister Technologies B.V., Wageningen, Netherlands
| | - Peter van Dam
- Bioinformatics and Software Development Team, Genetwister Technologies B.V., Wageningen, Netherlands
|
19
|
Droc G, Martin G, Guignon V, Summo M, Sempéré G, Durant E, Soriano A, Baurens FC, Cenci A, Breton C, Shah T, Aury JM, Ge XJ, Harrison PH, Yahiaoui N, D’Hont A, Rouard M. The banana genome hub: a community database for genomics in the Musaceae. Horticulture Research 2022; 9:uhac221. [PMID: 36479579 PMCID: PMC9720444 DOI: 10.1093/hr/uhac221] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/13/2022] [Accepted: 09/22/2022] [Indexed: 06/17/2023]
Abstract
The Banana Genome Hub provides centralized access for genome assemblies, annotations, and the extensive related omics resources available for bananas and banana relatives. A series of tools and unique interfaces are implemented to harness the potential of genomics in bananas, leveraging the power of comparative analysis, while recognizing the differences between datasets. Besides effective genomic tools like BLAST and the JBrowse genome browser, additional interfaces enable advanced gene search and gene family analyses including multiple alignments and phylogenies. A synteny viewer enables the comparison of genome structures between chromosome-scale assemblies. Interfaces for differential expression analyses, metabolic pathways and GO enrichment were also added. A catalogue of variants spanning the banana diversity is made available for exploration, filtering, and export to a wide variety of software. Furthermore, we implemented new ways to graphically explore gene presence-absence in pangenomes as well as genome ancestry mosaics for cultivated bananas. In addition, to guide the community in future sequencing efforts, we provide recommendations for nomenclature of locus tags and a curated list of public genomic resources (assemblies, resequencing, high density genotyping) and upcoming resources (planned, ongoing or not yet public). The Banana Genome Hub aims at supporting the banana scientific community for basic, translational, and applied research and can be accessed at https://banana-genome-hub.southgreen.fr.
Affiliation(s)
- Guillaume Martin
  - CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
  - UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Valentin Guignon
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
  - Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
- Marilyne Summo
  - CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
  - UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Guilhem Sempéré
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
  - CIRAD, UMR INTERTRYP, F-34398 Montpellier, France
  - INTERTRYP, Université de Montpellier, CIRAD, IRD, 34398 Montpellier, France
- Eloi Durant
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
  - Syngenta Seeds SAS, Saint-Sauveur, 31790, France
  - DIADE, Univ Montpellier, CIRAD, IRD, Montpellier, 34830, France
- Alexandre Soriano
  - CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
  - UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Franc-Christophe Baurens
  - CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
  - UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- Alberto Cenci
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
  - Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
- Catherine Breton
  - French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
  - Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
- Jean-Marc Aury
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
- Xue-Jun Ge
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510520, China
  - Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou 510520, China
- Pat Heslop Harrison
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510520, China
  - Department of Genetics and Genome Biology, University of Leicester, Leicester LE1 7RH, UK
- Nabila Yahiaoui
  - CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
  - UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- Angélique D’Hont
  - CIRAD, UMR AGAP Institut, F-34398 Montpellier, France
  - UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
20
Yang Q, Liu T, Wu T, Lei T, Li Y, Wang X. GGDB: A Grameneae genome alignment database of homologous genes hierarchically related to evolutionary events. PLANT PHYSIOLOGY 2022; 190:340-351. [PMID: 35789395 PMCID: PMC9434254 DOI: 10.1093/plphys/kiac297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 06/01/2022] [Indexed: 06/15/2023]
Abstract
The genomes of Gramineae plants have been preferentially sequenced owing to their economic value. These genomes are often quite complex, for example harboring many duplicated genes, which are the main source of genetic innovation and often the result of recurrent polyploidization. Deciphering these complex genome structures and linking duplicated genes to specific polyploidization events are important for understanding the biology and evolution of plants. However, efforts have been hampered by the complexity of analyzing these genomes. Here, we analyzed 29 well-assembled and up-to-date Gramineae genome sequences by hierarchically relating duplicated genes in collinear regions to specific polyploidization or speciation events. We separated duplicated genes produced by each event, established lists of paralogous and orthologous genes, and ultimately constructed an online database, GGDB (http://www.grassgenome.com/). Homologous gene lists from each plant and between plants can be displayed, searched, and downloaded from the database. Interactive comparison tools are deployed to demonstrate homology among user-selected plants and to draw genome-scale or local alignment figures and gene-based phylogenetic trees corrected by exploiting gene collinearity. Using these tools and figures, users can easily detect structural changes in genomes and explore the effects of paleo-polyploidy on crop genome structure and function. The GGDB will provide a useful platform for improving our understanding of genome changes and functional innovation in Gramineae plants.
Affiliation(s)
- Qihang Yang
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - Center for Genomics and Bio-computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tao Liu
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - College of Sciences, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tong Wu
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - Center for Genomics and Bio-computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Tianyu Lei
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - Center for Genomics and Bio-computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
- Yuxian Li
  - School of Life Science, North China University of Science and Technology, Tangshan, Hebei 063210, China
  - Center for Genomics and Bio-computing, North China University of Science and Technology, Tangshan, Hebei 063210, China
21
Ruggiero E, Lavezzo E, Grazioli M, Zanin I, Marušič M, Plavec J, Richter SN, Toppo S. Human Virus Genomes Are Enriched in Conserved Adenine/Thymine/Uracil Multiple Tracts That Pause Polymerase Progression. Front Microbiol 2022; 13:915069. [PMID: 35722311 PMCID: PMC9198555 DOI: 10.3389/fmicb.2022.915069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2022] [Accepted: 05/02/2022] [Indexed: 11/13/2022] Open
Abstract
The DNA secondary structures that deviate from the classic Watson and Crick base pairing are increasingly being reported to form transiently in the cell and regulate specific cellular mechanisms. Human viruses are cell parasites that have evolved mechanisms shared with the host cell to support their own replication and spreading. Contrary to human host cells, viruses display a diverse array of nucleic acid types, which include DNA or RNA in single-stranded or double-stranded conformations. This heterogeneity improves the possible occurrence of non-canonical nucleic acid structures. We have previously shown that human virus genomes are enriched in G-rich sequences that fold in four-stranded nucleic acid secondary structures, the G-quadruplexes. Here, by extensive bioinformatics analysis on all available genomes, we showed that human viruses are enriched in highly conserved multiple A (and T or U) tracts, with such an array that they could in principle form quadruplex structures. By circular dichroism, NMR, and Taq polymerase stop assays, we proved that, while A/T/U-quadruplexes do not form, these tracts still display biological significance, as they invariably trigger polymerase pausing within two bases from the A/T/U tract. “A” bases display the strongest effect. Most of the identified A-tracts are in the coding strand, both at the DNA and RNA levels, suggesting their possible relevance during viral translation. This study expands on the presence and mechanism of nucleic acid secondary structures in human viruses and provides a new direction for antiviral research.
Affiliation(s)
- Enrico Lavezzo
  - Department of Molecular Medicine, University of Padua, Padua, Italy
- Marco Grazioli
  - Department of Molecular Medicine, University of Padua, Padua, Italy
- Irene Zanin
  - Department of Molecular Medicine, University of Padua, Padua, Italy
- Maja Marušič
  - Slovenian NMR Centre, National Institute of Chemistry, Ljubljana, Slovenia
- Janez Plavec
  - Slovenian NMR Centre, National Institute of Chemistry, Ljubljana, Slovenia
- Sara N Richter
  - Department of Molecular Medicine, University of Padua, Padua, Italy
- Stefano Toppo
  - Department of Molecular Medicine, University of Padua, Padua, Italy
  - CRIBI Biotechnology Center, University of Padua, Padua, Italy
22
Samanta MK, Gayen S, Harris C, Maclary E, Murata-Nakamura Y, Malcore RM, Porter RS, Garay PM, Vallianatos CN, Samollow PB, Iwase S, Kalantry S. Activation of Xist by an evolutionarily conserved function of KDM5C demethylase. Nat Commun 2022; 13:2602. [PMID: 35545632 PMCID: PMC9095838 DOI: 10.1038/s41467-022-30352-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/07/2021] [Accepted: 04/26/2022] [Indexed: 12/03/2022] Open
Abstract
XX female and XY male therian mammals equalize X-linked gene expression through the mitotically-stable transcriptional inactivation of one of the two X chromosomes in female somatic cells. Here, we describe an essential function of the X-linked homolog of an ancestral X-Y gene pair, Kdm5c-Kdm5d, in the expression of Xist lncRNA, which is required for stable X-inactivation. Ablation of Kdm5c function in females results in a significant reduction in Xist RNA expression. Kdm5c encodes a demethylase that enhances Xist expression by converting histone H3K4me2/3 modifications into H3K4me1. Ectopic expression of mouse and human KDM5C, but not the Y-linked homolog KDM5D, induces Xist in male mouse embryonic stem cells (mESCs). Similarly, marsupial (opossum) Kdm5c but not Kdm5d also upregulates Xist in male mESCs, despite marsupials lacking Xist, suggesting that the KDM5C function that activates Xist in eutherians is strongly conserved and predates the divergence of eutherian and metatherian mammals. In support, prototherian (platypus) Kdm5c also induces Xist in male mESCs. Together, our data suggest that eutherian mammals co-opted the ancestral demethylase KDM5C during sex chromosome evolution to upregulate Xist for the female-specific induction of X-inactivation.
Affiliation(s)
- Milan Kumar Samanta
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
- Srimonta Gayen
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
  - Department of Molecular Reproduction, Development and Genetics, Indian Institute of Science, Bangalore, Karnataka, 560012, India
- Clair Harris
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
- Emily Maclary
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
  - Department of Biology, University of Utah, Salt Lake City, UT, 84112, USA
- Yumie Murata-Nakamura
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
- Rebecca M Malcore
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
- Robert S Porter
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
- Patricia M Garay
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
  - Neuroscience Graduate Program, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
- Christina N Vallianatos
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
- Paul B Samollow
  - Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, 77843-4458, USA
- Shigeki Iwase
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
  - Neuroscience Graduate Program, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA
- Sundeep Kalantry
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, 48109-5618, USA.
23
Lukačišin M, Espinosa-Cantú A, Bollenbach T. Intron-mediated induction of phenotypic heterogeneity. Nature 2022; 605:113-118. [PMID: 35444278 PMCID: PMC9068511 DOI: 10.1038/s41586-022-04633-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/21/2021] [Accepted: 03/11/2022] [Indexed: 11/12/2022]
Abstract
Intragenic regions that are removed during maturation of the RNA transcript (introns) are universally present in the nuclear genomes of eukaryotes [1]. The budding yeast, an otherwise intron-poor species, preserves two sets of ribosomal protein genes that differ primarily in their introns [2,3]. Although studies have shed light on the role of ribosomal protein introns under stress and starvation [4-6], understanding the contribution of introns to ribosome regulation remains challenging. Here, by combining isogrowth profiling [7] with single-cell protein measurements [8], we show that introns can mediate inducible phenotypic heterogeneity that confers a clear fitness advantage. Osmotic stress leads to bimodal expression of the small ribosomal subunit protein Rps22B, which is mediated by an intron in the 5' untranslated region of its transcript. The two resulting yeast subpopulations differ in their ability to cope with starvation. Low levels of Rps22B protein result in prolonged survival under sustained starvation, whereas high levels of Rps22B enable cells to grow faster after transient starvation. Furthermore, yeasts growing at high concentrations of sugar, similar to those in ripe grapes, exhibit bimodal expression of Rps22B when approaching the stationary phase. Differential intron-mediated regulation of ribosomal protein genes thus provides a way to diversify the population when starvation threatens in natural environments. Our findings reveal a role for introns in inducing phenotypic heterogeneity in changing environments, and suggest that duplicated ribosomal protein genes in yeast contribute to resolving the evolutionary conflict between precise expression control and environmental responsiveness [9].
Affiliation(s)
- Martin Lukačišin
  - Institute for Biological Physics, University of Cologne, Cologne, Germany
  - IST Austria, Klosterneuburg, Austria
  - Faculty of Medicine, Technion - Israel Institute of Technology, Haifa, Israel
- Tobias Bollenbach
  - Institute for Biological Physics, University of Cologne, Cologne, Germany.
  - Center for Data and Simulation Science, University of Cologne, Cologne, Germany.
24
Isolation and Characterization of a Novel Autographiviridae Phage and Its Combined Effect with Tigecycline in Controlling Multidrug-Resistant Acinetobacter baumannii-Associated Skin and Soft Tissue Infections. Viruses 2022; 14:v14020194. [PMID: 35215788 PMCID: PMC8878389 DOI: 10.3390/v14020194] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 11/30/2021] [Revised: 01/14/2022] [Accepted: 01/17/2022] [Indexed: 12/24/2022] Open
Abstract
Multidrug-resistant Acinetobacter baumannii (MDR A. baumannii) is one of the ESKAPE pathogens that restricts available treatment options. MDR A. baumannii is responsible for a dramatic increase in case numbers of a wide variety of infections, including skin and soft tissue infections (SSTIs), resulting in pyoderma, surgical debridement, and necrotizing fasciitis. To investigate an alternative medical treatment for SSTIs, a broad range lytic Acinetobacter phage, vB_AbP_ABWU2101 (phage vABWU2101), for lysing MDR A. baumannii in associated SSTIs was isolated and the biological aspects of this phage were investigated. Morphological characterization and genomic analysis revealed that phage vABWU2101 was a new species in the genus Friunavirus, subfamily Beijerinckvirinae, family Autographiviridae, and order Caudovirales. Phage vABWU2101 demonstrated good antibiofilm activity against both preformed biofilms and biofilm formation. The combination of phage vABWU2101 and tigecycline showed synergistic antimicrobial activities against planktonic and biofilm cells. Scanning electron microscopy confirmed that the combination of phage vABWU2101 and tigecycline was more effective than the phage or antibiotic alone. Hence, our findings could potentially be used to develop a therapeutic option for the treatment of SSTIs caused by MDR A. baumannii.
25
Tourasse NJ, Darfeuille F. T1TAdb: the database of type I toxin-antitoxin systems. RNA (NEW YORK, N.Y.) 2021; 27:1471-1481. [PMID: 34531327 PMCID: PMC8594479 DOI: 10.1261/rna.078802.121] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/15/2021] [Accepted: 09/03/2021] [Indexed: 05/11/2023]
Abstract
Type I toxin-antitoxin (T1TA) systems constitute a large class of genetic modules with antisense RNA (asRNA)-mediated regulation of gene expression. They are widespread in bacteria and consist of an mRNA coding for a toxic protein and a noncoding asRNA that acts as an antitoxin preventing the synthesis of the toxin by directly base-pairing to its cognate mRNA. The co- and post-transcriptional regulation of T1TA systems is intimately linked to RNA sequence and structure; therefore, it is essential to have an accurate annotation of the mRNA and asRNA molecules to understand this regulation. However, most T1TA systems have been identified by means of bioinformatic analyses solely based on the toxin protein sequences, and there is no central repository of information on their specific RNA features. Here we present the first database dedicated to type I TA systems, named T1TAdb. It is an open-access web database (https://d-lab.arna.cnrs.fr/t1tadb) with a collection of ∼1900 loci in ∼500 bacterial strains in which a toxin-coding sequence has been previously identified. RNA molecules were annotated with a bioinformatic procedure based on key determinants of the mRNA structure and the genetic organization of the T1TA loci. Besides RNA and protein secondary structure predictions, T1TAdb also identifies promoter, ribosome-binding, and mRNA-asRNA interaction sites. It also includes tools for comparative analysis, such as sequence similarity search and computation of structural multiple alignments, which are annotated with covariation information. To our knowledge, T1TAdb represents the largest collection of features, sequences, and structural annotations on this class of genetic modules.
Affiliation(s)
- Nicolas J Tourasse
  - University of Bordeaux, CNRS, INSERM, ARNA, UMR 5320, U1212, F-33000 Bordeaux, France
- Fabien Darfeuille
  - University of Bordeaux, CNRS, INSERM, ARNA, UMR 5320, U1212, F-33000 Bordeaux, France
26
Martin EJ, Meagher TR, Barker D. Using sound to understand protein sequence data: new sonification algorithms for protein sequences and multiple sequence alignments. BMC Bioinformatics 2021; 22:456. [PMID: 34556048 PMCID: PMC8459479 DOI: 10.1186/s12859-021-04362-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/08/2021] [Accepted: 08/23/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The use of sound to represent sequence data (sonification) has great potential as an alternative and complement to visual representation, exploiting features of human psychoacoustic intuitions to convey nuance more effectively. We have created five parameter-mapping sonification algorithms that aim to improve knowledge discovery from protein sequences and small protein multiple sequence alignments. For two of these algorithms, we investigated their effectiveness at conveying information. To do this we focussed on subjective assessments of user experience. This entailed a focus group session and survey research by questionnaire of individuals engaged in bioinformatics research. RESULTS: For single protein sequences, the success of our sonifications for conveying features was supported by both the survey and focus group findings. For protein multiple sequence alignments, there was limited evidence that the sonifications successfully conveyed information. Additional work is required to identify effective algorithms to render multiple sequence alignment sonification useful to researchers. Feedback from both our survey and focus groups suggests future directions for sonification of multiple alignments: animated visualisation indicating the column in the multiple alignment as the sonification progresses, user control of sequence navigation, and customisation of the sound parameters. CONCLUSIONS: Sonification approaches undertaken in this work have shown some success in conveying information from protein sequence data. Feedback points out future directions to build on the sonification approaches outlined in this paper. The effectiveness assessment process implemented in this work proved useful, giving detailed feedback and key approaches for improvement based on end-user input. The uptake of similar user experience focussed effectiveness assessments could also help with other areas of bioinformatics, for example in visualisation.
Affiliation(s)
- Edward J. Martin
  - School of Informatics, Informatics Forum, University of Edinburgh, 10 Crichton Street, Edinburgh, EH8 9AB UK
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, The King’s Buildings, Edinburgh, EH9 3FL UK
- Thomas R. Meagher
  - Centre for Biological Diversity, School of Biology, University of St Andrews, Sir Harold Mitchell Building, Greenside Place, St Andrews, KY16 9TH UK
- Daniel Barker
  - Institute of Evolutionary Biology, School of Biological Sciences, University of Edinburgh, Charlotte Auerbach Road, The King’s Buildings, Edinburgh, EH9 3FL UK
27
Torun FM, Bilgin HI, Kaplan OI. MSABrowser: dynamic and fast visualization of sequence alignments, variations and annotations. BIOINFORMATICS ADVANCES 2021; 1:vbab009. [PMID: 36700112 PMCID: PMC9710668 DOI: 10.1093/bioadv/vbab009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 06/25/2021] [Indexed: 01/28/2023]
Abstract
Summary: Sequence alignment is an excellent way to visualize the similarities and differences between DNA, RNA or protein sequences, yet it is currently difficult to jointly view sequence alignment data with genetic variations, modifications such as post-translational modifications and annotations (i.e. protein domains). Here, we present the MSABrowser tool that makes it easy to co-visualize genetic variations, modifications and annotations on the respective positions of amino acids or nucleotides in pairwise or multiple sequence alignments. MSABrowser is developed entirely in JavaScript and works on any modern web browser on any platform, including Linux, Mac OS X and Windows systems without any installation. MSABrowser is also freely available for the benefit of the scientific community. Availability and implementation: MSABrowser is released as open-source and web-based software under MIT License. The visualizer, documentation, all source codes and examples are available at https://thekaplanlab.github.io/ and GitHub repository https://github.com/thekaplanlab/msabrowser. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Furkan M Torun
  - Rare Disease Laboratory, School of Life and Natural Sciences, Abdullah Gul University, Kayseri 38080, Turkey
- Halil I Bilgin
  - Department of Computer Engineering, Abdullah Gul University, Kayseri 38080, Turkey
- Oktay I Kaplan
  - Rare Disease Laboratory, School of Life and Natural Sciences, Abdullah Gul University, Kayseri 38080, Turkey. To whom correspondence should be addressed.
28
Keseler IM, Gama-Castro S, Mackie A, Billington R, Bonavides-Martínez C, Caspi R, Kothari A, Krummenacker M, Midford PE, Muñiz-Rascado L, Ong WK, Paley S, Santos-Zavaleta A, Subhraveti P, Tierrafría VH, Wolfe AJ, Collado-Vides J, Paulsen IT, Karp PD. The EcoCyc Database in 2021. Front Microbiol 2021; 12:711077. [PMID: 34394059 PMCID: PMC8357350 DOI: 10.3389/fmicb.2021.711077] [Citation(s) in RCA: 149] [Impact Index Per Article: 37.3] [Received: 05/17/2021] [Accepted: 07/02/2021] [Indexed: 11/13/2022] Open
Abstract
The EcoCyc model-organism database collects and summarizes experimental data for Escherichia coli K-12. EcoCyc is regularly updated by the manual curation of individual database entries, such as genes, proteins, and metabolic pathways, and by the programmatic addition of results from select high-throughput analyses. Updates to the Pathway Tools software that supports EcoCyc and to the web interface that enables user access have continuously improved its usability and expanded its functionality. This article highlights recent improvements to the curated data in the areas of metabolism, transport, DNA repair, and regulation of gene expression. New and revised data analysis and visualization tools include an interactive metabolic network explorer, a circular genome viewer, and various improvements to the speed and usability of existing tools.
Affiliation(s)
- Ingrid M. Keseler
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Socorro Gama-Castro
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Amanda Mackie
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
- Richard Billington
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Ron Caspi
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Anamika Kothari
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Markus Krummenacker
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Peter E. Midford
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Luis Muñiz-Rascado
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Wai Kit Ong
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Suzanne Paley
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Alberto Santos-Zavaleta
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
  - Instituto de Energías Renovables, Universidad Nacional Autónoma de México, Temixco, México
- Pallavi Subhraveti
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
- Víctor H. Tierrafría
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
- Alan J. Wolfe
  - Department of Microbiology and Immunology, Stritch School of Medicine, Loyola University Chicago, Maywood, IL, United States
- Julio Collado-Vides
  - Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, México
  - Department of Biomedical Engineering, Boston University, Boston, MA, United States
- Ian T. Paulsen
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, Australia
- Peter D. Karp
  - Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA, United States
29
Lemoine F, Blassel L, Voznica J, Gascuel O. COVID-Align: accurate online alignment of hCoV-19 genomes using a profile HMM. Bioinformatics 2021; 37:1761-1762. [PMID: 33045068 PMCID: PMC7745650 DOI: 10.1093/bioinformatics/btaa871] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/23/2020] [Revised: 05/23/2020] [Accepted: 09/24/2020] [Indexed: 11/12/2022] Open
Abstract
Motivation: The first cases of the COVID-19 pandemic emerged in December 2019. Until the end of February 2020, the number of available genomes was below 1000 and their multiple alignment was easily achieved using standard approaches. Subsequently, the availability of genomes has grown dramatically. Moreover, some genomes are of low quality with sequencing/assembly errors, making accurate re-alignment of all genomes nearly impossible on a daily basis. A more efficient, yet accurate approach was clearly required to pursue all subsequent bioinformatics analyses of this crucial data. Results: hCoV-19 genomes are highly conserved, with very few indels and no recombination. This makes the profile HMM approach particularly well suited to align new genomes, add them to an existing alignment and filter problematic ones. Using a core of ∼2500 high quality genomes, we estimated a profile using HMMER, and implemented this profile in COVID-Align, a user-friendly interface to be used online or as standalone via Docker. The alignment of 1000 genomes requires ∼50 minutes on our cluster. Moreover, COVID-Align provides summary statistics, which can be used to determine the sequencing quality and evolutionary novelty of input genomes (e.g. number of new mutations and indels). Availability and implementation: https://covalign.pasteur.cloud, hub.docker.com/r/evolbioinfo/covid-align. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Frédéric Lemoine
- Unité de Bioinformatique Evolutive, USR 3756 (DBC/C3BI), Institut Pasteur & CNRS, 75015 - Paris, France; Hub de Bioinformatique et Biostatistique, USR 3756 (DBC/C3BI), Institut Pasteur & CNRS, 75015 - Paris, France
- Luc Blassel
- Unité de Bioinformatique Evolutive, USR 3756 (DBC/C3BI), Institut Pasteur & CNRS, 75015 - Paris, France; ED515, Sorbonne Université, Collège Doctoral, 75006 - Paris, France
- Jakub Voznica
- Unité de Bioinformatique Evolutive, USR 3756 (DBC/C3BI), Institut Pasteur & CNRS, 75015 - Paris, France; Université de Paris, 75006 Paris, France
- Olivier Gascuel
- Unité de Bioinformatique Evolutive, USR 3756 (DBC/C3BI), Institut Pasteur & CNRS, 75015 - Paris, France; Académie des Sciences, USR 3756, CNRS, 75015 - Paris, France
30
Penev PI, McCann HM, Meade CD, Alvarez-Carreño C, Maddala A, Bernier CR, Chivukula VL, Ahmad M, Gulen B, Sharma A, Williams LD, Petrov AS. ProteoVision: web server for advanced visualization of ribosomal proteins. Nucleic Acids Res 2021; 49:W578-W588. [PMID: 33999189 PMCID: PMC8265156 DOI: 10.1093/nar/gkab351] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/09/2021] [Revised: 04/11/2021] [Accepted: 04/21/2021] [Indexed: 11/26/2022]
Abstract
ProteoVision is a web server designed to explore protein structure and evolution through simultaneous visualization of multiple sequence alignments, topology diagrams and 3D structures. Starting with a multiple sequence alignment, ProteoVision computes conservation scores and a variety of physicochemical properties and simultaneously maps and visualizes alignments and other data on multiple levels of representation. The web server calculates and displays frequencies of amino acids. ProteoVision is optimized for ribosomal proteins but is applicable to analysis of any protein. ProteoVision handles internally generated and user uploaded alignments and connects them with a selected structure, found in the PDB or uploaded by the user. It can generate de novo topology diagrams from three-dimensional structures. All displayed data is interactive and can be saved in various formats as publication quality images or external datasets or PyMol Scripts. ProteoVision enables detailed study of protein fragments defined by Evolutionary Classification of protein Domains (ECOD) classification. ProteoVision is available at http://proteovision.chemistry.gatech.edu/.
Affiliation(s)
- Petar I Penev
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Holly M McCann
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Caeden D Meade
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Claudia Alvarez-Carreño
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Aparna Maddala
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Chad R Bernier
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Vasanta L Chivukula
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Maria Ahmad
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Burak Gulen
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Aakash Sharma
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Loren Dean Williams
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Anton S Petrov
- NASA Center for the Origin of Life, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA; School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
31
Tohma K, Lepore CJ, Martinez M, Degiuseppe JI, Khamrin P, Saito M, Mayta H, Nwaba AUA, Ford-Siltz LA, Green KY, Galeano ME, Zimic M, Stupka JA, Gilman RH, Maneekarn N, Ushijima H, Parra GI. Genome-wide analyses of human noroviruses provide insights on evolutionary dynamics and evidence of coexisting viral populations evolving under recombination constraints. PLoS Pathog 2021; 17:e1009744. [PMID: 34255807 PMCID: PMC8318288 DOI: 10.1371/journal.ppat.1009744] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 01/06/2021] [Revised: 07/28/2021] [Accepted: 06/23/2021] [Indexed: 12/14/2022]
Abstract
Norovirus is a major cause of acute gastroenteritis worldwide. Over 30 different genotypes, mostly from genogroup I (GI) and II (GII), have been shown to infect humans. Despite three decades of genome sequencing, our understanding of the role of genomic diversification across continents and time is incomplete. To close the spatiotemporal gap of genomic information of human noroviruses, we conducted a large-scale genome-wide analysis that included the nearly full-length sequencing of 281 archival viruses circulating since the 1970s in over 10 countries from four continents, with a major emphasis on norovirus genotypes that are currently underrepresented in public genome databases. We provided new genome information for 24 distinct genotypes, including the oldest genome information from 12 norovirus genotypes. Analyses of this new genomic information, together with those publicly available, showed that (i) noroviruses evolve at similar rates across genomic regions and genotypes; (ii) emerging viruses evolved from transiently-circulating intermediate viruses; (iii) diversifying selection on the VP1 protein was recorded in genotypes with multiple variants; (iv) non-structural proteins showed a similar branching on their phylogenetic trees; and (v) contrary to the current understanding, there are restrictions on the ability to recombine different genomic regions, which results in co-circulating populations of viruses evolving independently in human communities. This study provides a comprehensive genetic analysis of diverse norovirus genotypes and the role of non-structural proteins on viral diversification, shedding new light on the mechanisms of norovirus evolution and transmission.
Norovirus is a highly diverse enteric pathogen. The large genomic database accumulated in the last three decades advanced our understanding of norovirus diversity; however, this information is limited by geographical bias, sporadic times of collection, and missing or incomplete genome sequences. In this multinational collaborative study, we mined archival samples collected since the 1970s and sequenced nearly full-length new genomes from 281 historical noroviruses, including the first full-length genomic sequences for three genotypes. Using this novel dataset, we found evidence for restrictions in the recombination of genetically disparate viruses and that diversifying selection results in new variants with different epidemiological profiles. These new insights on the diversification of noroviruses could provide baseline information for the study of future epidemics and ultimately the prevention of norovirus infections.
Affiliation(s)
- Kentaro Tohma
- Division of Viral Products, CBER, FDA, Silver Spring, Maryland, United States of America
- Cara J. Lepore
- Division of Viral Products, CBER, FDA, Silver Spring, Maryland, United States of America
- Magaly Martinez
- Division of Viral Products, CBER, FDA, Silver Spring, Maryland, United States of America
- IICS, National University of Asuncion, Asuncion, Paraguay
- Pattara Khamrin
- Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Mayuko Saito
- Department of Virology, Tohoku University Graduate School of Medicine, Sendai, Japan
- Holger Mayta
- Department of Cellular and Molecular Sciences, Faculty of Sciences, Universidad Peruana Cayetano Heredia, Lima, Peru
- Amy U. Amanda Nwaba
- Division of Viral Products, CBER, FDA, Silver Spring, Maryland, United States of America
- Lauren A. Ford-Siltz
- Division of Viral Products, CBER, FDA, Silver Spring, Maryland, United States of America
- Kim Y. Green
- Laboratory of Infectious Diseases, NIAID, NIH, Bethesda, Maryland, United States of America
- Mirko Zimic
- Department of Cellular and Molecular Sciences, Faculty of Sciences, Universidad Peruana Cayetano Heredia, Lima, Peru
- Robert H. Gilman
- Department of International Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland, United States of America
- Niwat Maneekarn
- Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Hiroshi Ushijima
- Division of Microbiology, Department of Pathology and Microbiology, Nihon University School of Medicine, Tokyo, Japan
- Gabriel I. Parra
- Division of Viral Products, CBER, FDA, Silver Spring, Maryland, United States of America
32
Staton M, Cannon E, Sanderson LA, Wegrzyn J, Anderson T, Buehler S, Cobo-Simón I, Faaberg K, Grau E, Guignon V, Gunoskey J, Inderski B, Jung S, Lager K, Main D, Poelchau M, Ramnath R, Richter P, West J, Ficklin S. Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases. Brief Bioinform 2021; 22:6318561. [PMID: 34251419 PMCID: PMC8574961 DOI: 10.1093/bib/bbab238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/01/2022]
Abstract
Online, open access databases for biological knowledge serve as central repositories for research communities to store, find and analyze integrated, multi-disciplinary datasets. With increasing volumes, complexity and the need to integrate genomic, transcriptomic, metabolomic, proteomic, phenomic and environmental data, community databases face tremendous challenges in ongoing maintenance, expansion and upgrades. A common infrastructure framework using community standards shared by many databases can reduce development burden, provide interoperability, ensure use of common standards and support long-term sustainability. Tripal is a mature, open source platform built to meet this need. With ongoing improvement since its first release in 2009, Tripal provides full functionality for searching, browsing, loading and curating numerous types of data and is a primary technology powering at least 31 publicly available databases spanning plants, animals and human data, primarily storing genomics, genetics and breeding data. Tripal software development is managed by a shared, inclusive governance structure including both project management and advisory teams. Here, we report on the most important and innovative aspects of Tripal after 11 years of development, including integration of diverse types of biological data, successful collaborative projects across member databases, and support for implementing FAIR principles.
Affiliation(s)
- Ethalinda Cannon
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA USA
- Kay Faaberg
- USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Emily Grau
- University of Connecticut, Storrs, CT USA
- Sook Jung
- Washington State University, Pullman, WA USA
- Kelly Lager
- USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Dorrie Main
- Washington State University, Pullman, WA USA
- Monica Poelchau
- USDA-ARS, National Agricultural Library, Beltsville, MD, USA
- Joe West
- University of Tennessee, Knoxville, TN USA
33
Erdős G, Pajkos M, Dosztányi Z. IUPred3: prediction of protein disorder enhanced with unambiguous experimental annotation and visualization of evolutionary conservation. Nucleic Acids Res 2021; 49:W297-W303. [PMID: 34048569 PMCID: PMC8262696 DOI: 10.1093/nar/gkab408] [Citation(s) in RCA: 317] [Impact Index Per Article: 79.3] [Received: 03/15/2021] [Revised: 04/21/2021] [Accepted: 05/14/2021] [Indexed: 12/22/2022]
Abstract
Intrinsically disordered proteins and protein regions (IDPs/IDRs) exist without a single well-defined conformation. They carry out important biological functions with multifaceted roles, which is also reflected in their evolutionary behavior. Computational methods play important roles in the characterization of IDRs. One of the commonly used disorder prediction methods is IUPred, which relies on an energy estimation approach. The IUPred web server takes an amino acid sequence or a UniProt ID/accession as an input and predicts the tendency for each amino acid to be in a disordered region, with an option to also predict context-dependent disordered regions. In this new iteration of IUPred, we added multiple novel features to enhance the prediction capabilities of the server. First, learning from the latest evaluation of disorder prediction methods, we introduced multiple new smoothing functions to the prediction that decrease noise and increase the performance of the predictions. We constructed a dataset consisting of experimentally verified ordered/disordered regions with unambiguous annotations which were added to the prediction. We also introduced a novel tool that enables the exploration of the evolutionary conservation of protein disorder coupled to sequence conservation in model organisms. The web server is freely available to users and accessible at https://iupred3.elte.hu.
Affiliation(s)
- Gábor Erdős
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Mátyás Pajkos
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
- Zsuzsanna Dosztányi
- Department of Biochemistry, Eötvös Loránd University, Pázmány Péter stny 1/c, Budapest H-1117, Hungary
34
Anantharajah A, Helaers R, Defour JP, Olive N, Kabera F, Croonen L, Deldime F, Vaerman JL, Barbée C, Bodéus M, Scohy A, Verroken A, Rodriguez-Villalobos H, Kabamba-Mukadi B. How to choose the right real-time RT-PCR primer sets for the SARS-CoV-2 genome detection? J Virol Methods 2021; 295:114197. [PMID: 34033854 PMCID: PMC8141720 DOI: 10.1016/j.jviromet.2021.114197] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/17/2020] [Revised: 05/12/2021] [Accepted: 05/21/2021] [Indexed: 01/10/2023]
Abstract
OBJECTIVES: The SARS-CoV-2 pandemic has created an unprecedented need for rapid large-scale diagnostic testing to prompt clinical and public health interventions. Currently, several quantitative reverse-transcription polymerase chain reaction (RT-qPCR) assays recommended by the World Health Organization are being used by clinical and public health laboratories and typically target regions of the RNA-dependent RNA polymerase (RdRp), envelope (E) and nucleocapsid (N) coding region. However, it is currently unclear if results from different tests are comparable. This study aimed to clarify the clinical performances of the primer/probe sets designed by US CDC and Charité/Berlin to help clinical laboratories in assay selection for SARS-CoV-2 routine detection. METHODS: We compared the clinical performances of the recommended primer/probe sets using one hundred nasopharyngeal swab specimens from patients who were clinically diagnosed with COVID-19. An additional 30 "pre-intervention screening" samples from patients who were not suspected of COVID-19 were also included in this study. We also performed sequence alignment between 31064 European SARS-CoV-2 and variants of concern genomes and the recommended primer/probe sets. RESULTS: The present study demonstrates substantial differences in SARS-CoV-2 RNA detection sensitivity among the primer/probe sets recommended by the World Health Organization especially for low-level viral loads. The alignment of thousands of SARS-CoV-2 sequences reveals that the genetic diversity remains relatively low at the primer/probe binding sites. However, multiple nucleotide mismatches might contribute to false negatives. CONCLUSION: An understanding of the limitations depending on the targeted genes and primer/probe sets may influence the selection of molecular detection assays by clinical laboratories.
Affiliation(s)
- Ahalieyah Anantharajah
- Department of Microbiology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium; Department of Molecular Biology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium.
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, Université catholique de Louvain, Brussels, Belgium
- Jean-Philippe Defour
- Department of Hematology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium; Ludwig Institute for Cancer Research & de Duve Institute, Université catholique de Louvain, Brussels, Belgium
- Nathalie Olive
- Department of Molecular Biology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Florence Kabera
- Department of Molecular Biology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Luc Croonen
- Department of Molecular Biology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Françoise Deldime
- Department of Molecular Biology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Jean-Luc Vaerman
- Department of Molecular Biology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Cindy Barbée
- Department of Molecular Biology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Monique Bodéus
- Department of Microbiology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Anais Scohy
- Department of Microbiology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Alexia Verroken
- Department of Microbiology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Hector Rodriguez-Villalobos
- Department of Microbiology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
- Benoît Kabamba-Mukadi
- Department of Microbiology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium; Department of Molecular Biology, Cliniques universitaires Saint-Luc, Université catholique de Louvain, Brussels, Belgium
35
Theofanopoulou C, Gedman G, Cahill JA, Boeckx C, Jarvis ED. Universal nomenclature for oxytocin-vasotocin ligand and receptor families. Nature 2021; 592:747-755. [PMID: 33911268 PMCID: PMC8081664 DOI: 10.1038/s41586-020-03040-7] [Citation(s) in RCA: 71] [Impact Index Per Article: 17.8] [Received: 10/16/2019] [Accepted: 05/29/2020] [Indexed: 02/02/2023]
Abstract
Oxytocin (OXT; hereafter OT) and arginine vasopressin or vasotocin (AVP or VT; hereafter VT) are neurotransmitter ligands that function through specific receptors to control diverse functions [1,2]. Here we performed genomic analyses on 35 species that span all major vertebrate lineages, including newly generated high-contiguity assemblies from the Vertebrate Genomes Project [3,4]. Our findings support the claim [5] that OT (also known as OXT) and VT (also known as AVP) are adjacent paralogous genes that have resulted from a local duplication, which we infer was through DNA transposable elements near the origin of vertebrates and in which VT retained more of the parental sequence. We identified six major oxytocin-vasotocin receptors among vertebrates. We propose that all six of these receptors arose from a single receptor that was shared with the common ancestor of invertebrates, through a combination of whole-genome and large segmental duplications. We propose a universal nomenclature based on evolutionary relationships for the genes that encode these receptors, in which the genes are given the same orthologous names across vertebrates and paralogous names relative to each other. This nomenclature avoids confusion due to differential naming in the pre-genomic era and incomplete genome assemblies, furthers our understanding of the evolution of these genes, aids in the translation of findings across species and serves as a model for other gene families.
Affiliation(s)
- Constantina Theofanopoulou
- Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA; Section of General Linguistics, University of Barcelona, Barcelona, Spain; University of Barcelona Institute for Complex Systems, Barcelona, Spain
- Gregory Gedman
- Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA
- James A Cahill
- Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA
- Cedric Boeckx
- Section of General Linguistics, University of Barcelona, Barcelona, Spain; University of Barcelona Institute for Complex Systems, Barcelona, Spain; ICREA, Barcelona, Spain
- Erich D Jarvis
- Laboratory of Neurogenetics of Language, Rockefeller University, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
36
Vornhagen J, Bassis CM, Ramakrishnan S, Hein R, Mason S, Bergman Y, Sunshine N, Fan Y, Holmes CL, Timp W, Schatz MC, Young VB, Simner PJ, Bachman MA. A plasmid locus associated with Klebsiella clinical infections encodes a microbiome-dependent gut fitness factor. PLoS Pathog 2021; 17:e1009537. [PMID: 33930099 PMCID: PMC8115787 DOI: 10.1371/journal.ppat.1009537] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 01/27/2021] [Revised: 05/12/2021] [Accepted: 04/07/2021] [Indexed: 02/07/2023]
Abstract
Klebsiella pneumoniae (Kp) is an important cause of healthcare-associated infections, which increases patient morbidity, mortality, and hospitalization costs. Gut colonization by Kp is consistently associated with subsequent Kp disease, and patients are predominantly infected with their colonizing strain. Our previous comparative genomics study, between disease-causing and asymptomatically colonizing Kp isolates, identified a plasmid-encoded tellurite (TeO₃²⁻)-resistance (ter) operon as strongly associated with infection. However, TeO₃²⁻ is extremely rare and toxic to humans. Thus, we used a multidisciplinary approach to determine the biological link between ter and Kp infection. First, we used a genomic and bioinformatic approach to extensively characterize Kp plasmids encoding the ter locus. These plasmids displayed substantial variation in plasmid incompatibility type and gene content. Moreover, the ter operon was genetically independent of other plasmid-encoded virulence and antibiotic resistance loci, both in our original patient cohort and in a large set (n = 88) of publicly available ter operon-encoding Kp plasmids, indicating that the ter operon is likely playing a direct, but yet undescribed role in Kp disease. Next, we employed multiple mouse models of infection and colonization to show that 1) the ter operon is dispensable during bacteremia, 2) the ter operon enhances fitness in the gut, 3) this phenotype is dependent on the colony of origin of mice, and 4) antibiotic disruption of the gut microbiota eliminates the requirement for ter. Furthermore, using 16S rRNA gene sequencing, we show that the ter operon enhances Kp fitness in the gut in the presence of specific indigenous microbiota, including those predicted to produce short chain fatty acids. Finally, administration of exogenous short-chain fatty acids in our mouse model of colonization was sufficient to reduce fitness of a ter mutant. These findings indicate that the ter operon, strongly associated with human infection, encodes factors that resist stress induced by the indigenous gut microbiota during colonization. This work represents a substantial advancement in our molecular understanding of Kp pathogenesis and gut colonization, directly relevant to Kp disease in healthcare settings.
Affiliation(s)
- Jay Vornhagen
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States of America
- Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI, United States of America
- Christine M. Bassis
- Department of Internal Medicine/Infectious Diseases Division, University of Michigan, Ann Arbor, MI, United States of America
- Srividya Ramakrishnan
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, United States of America
- Robert Hein
- Department of Internal Medicine/Infectious Diseases Division, University of Michigan, Ann Arbor, MI, United States of America
- Sophia Mason
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States of America
- Yehudit Bergman
- Division of Medical Microbiology, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Nicole Sunshine
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States of America
- Yunfan Fan
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Caitlyn L. Holmes
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States of America
- Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI, United States of America
- Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, United States of America
- Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Department of Medicine, Division of Infectious Disease, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, United States of America
- Department of Biology, Johns Hopkins University, Baltimore, MD, United States of America
- Simons Center for Quantitative Biology, Cold Spring Harbor, NY, United States of America
- Vincent B. Young
- Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI, United States of America
- Department of Internal Medicine/Infectious Diseases Division, University of Michigan, Ann Arbor, MI, United States of America
- Patricia J. Simner
- Division of Medical Microbiology, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, United States of America
- Michael A. Bachman
- Department of Pathology, University of Michigan, Ann Arbor, MI, United States of America
- Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI, United States of America
37
Krebs FS, Zoete V, Trottet M, Pouchon T, Bovigny C, Michielin O. Swiss-PO: a new tool to analyze the impact of mutations on protein three-dimensional structures for precision oncology. NPJ Precis Oncol 2021; 5:19. [PMID: 33737716 PMCID: PMC7973488 DOI: 10.1038/s41698-021-00156-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/28/2020] [Accepted: 02/04/2021] [Indexed: 12/12/2022]
Abstract
Swiss-PO is a new web tool to map gene mutations on the 3D structure of corresponding proteins and to intuitively assess the structural implications of protein variants for precision oncology. Swiss-PO is constructed around a manually curated database of 3D structures, variant annotations, and sequence alignments, for a list of 50 genes taken from the Ion AmpliSeq™ Custom Cancer Hotspot Panel. The website was designed to guide users in the choice of the most appropriate structure to analyze regarding the mutated residue, the role of the protein domain it belongs to, or the drug that could be selected to treat the patient. The importance of the mutated residue for the structure and activity of the protein can be assessed based on the molecular interactions exchanged with neighbor residues in 3D within the same protein or between different biomacromolecules, its conservation in orthologs, or the known effect of reported mutations in its 3D or sequence-based vicinity. Swiss-PO is available free of charge and without login at https://www.swiss-po.ch.
Affiliation(s)
- Fanny S Krebs
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Vincent Zoete
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland.
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
- Maxence Trottet
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Timothée Pouchon
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Christophe Bovigny
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Olivier Michielin
- Computer-Aided Molecular Engineering, Department of Oncology, Ludwig Institute for Cancer Research Lausanne Branch, University of Lausanne, Lausanne, Switzerland.
- Molecular Modelling Group, Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland.
- Department of Oncology, Ludwig Institute for Cancer Research, University Hospital of Lausanne, Lausanne, Switzerland.
38
Torres MDT, Cao J, Franco OL, Lu TK, de la Fuente-Nunez C. Synthetic Biology and Computer-Based Frameworks for Antimicrobial Peptide Discovery. ACS Nano 2021; 15:2143-2164. [PMID: 33538585 PMCID: PMC8734659 DOI: 10.1021/acsnano.0c09509] [Citation(s) in RCA: 69] [Impact Index Per Article: 17.3] [Indexed: 05/09/2023]
Abstract
Antibiotic resistance is one of the greatest challenges of our time. This global health problem originated from a paucity of truly effective antibiotic classes and an increased incidence of multi-drug-resistant bacterial isolates in hospitals worldwide. Indeed, it has been recently estimated that 10 million people will die annually from drug-resistant infections by the year 2050. Therefore, the need to develop out-of-the-box strategies to combat antibiotic resistance is urgent. The biological world has provided natural templates, called antimicrobial peptides (AMPs), which exhibit multiple intrinsic medical properties including the targeting of bacteria. AMPs can be used as scaffolds and, via engineering, can be reconfigured for optimized potency and targetability toward drug-resistant pathogens. Here, we review the recent development of tools for the discovery, design, and production of AMPs and propose that the future of peptide drug discovery will involve the convergence of computational and synthetic biology principles.
Affiliation(s)
- Marcelo D T Torres
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
| | - Jicong Cao
- Synthetic Biology Group, MIT Synthetic Biology Center, Department of Biological Engineering and Electrical Engineering and Computer Science, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Octavio L Franco
- Centro de Análises Proteômicas e Bioquímicas, Universidade Católica de Brasília, Brasília, DF 70790160, Brazil
- S-inova Biotech, Universidade Católica Dom Bosco, Campo Grande, MS 79117010, Brazil
| | - Timothy K Lu
- Synthetic Biology Group, MIT Synthetic Biology Center, Department of Biological Engineering and Electrical Engineering and Computer Science, Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
|
39
|
Blum M, Chang HY, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A, Finn RD. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res 2021; 49:D344-D354. [PMID: 33156333 PMCID: PMC7778928 DOI: 10.1093/nar/gkaa977] [Citation(s) in RCA: 1351] [Impact Index Per Article: 337.8] [Received: 09/14/2020] [Revised: 10/08/2020] [Accepted: 10/23/2020] [Indexed: 01/22/2023]
Abstract
The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
Affiliation(s)
- Matthias Blum
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Hsin-Yu Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Sara Chuguransky
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Tiago Grego
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Swaathi Kandasaamy
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Alex Mitchell
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Gift Nuka
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Matloob Qureshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Shriya Raj
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Lorna Richardson
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Lowri Williams
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Peer Bork
- European Molecular Biology Laboratory, Structural and Computational Biology Unit, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Alan Bridge
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
| | - Julian Gough
- Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK
| | - Daniel H Haft
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
| | - Ivica Letunic
- Biobyte Solutions GmbH, Bothestr 142, 69126 Heidelberg, Germany
| | - Aron Marchler-Bauer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
| | - Huaiyu Mi
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Darren A Natale
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Marco Necci
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
| | - Christine A Orengo
- Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
| | - Arun P Pandurangan
- Medical Research Council Laboratory of Molecular Biology, Cambridge Biomedical Campus, Francis Crick Ave, Trumpington, Cambridge CB2 0QH, UK
| | - Catherine Rivoire
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
| | - Christian J A Sigrist
- Swiss-Prot Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel Servet, CH-1211, Geneva 4, Switzerland
| | - Ian Sillitoe
- Department of Structural and Molecular Biology, University College London, Gower St, Bloomsbury, London WC1E 6BT, UK
| | - Narmada Thanki
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda MD 20894 USA
| | - Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, via U. Bassi 58/b, 35131 Padua, Italy
| | - Cathy H Wu
- Protein Information Resource, Georgetown University Medical Center, Washington, DC 20007, USA
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
|
40
|
Valentin G, Abdel T, Gaëtan D, Jean-François D, Matthieu C, Mathieu R. GreenPhylDB v5: a comparative pangenomic database for plant genomes. Nucleic Acids Res 2021; 49:D1464-D1471. [PMID: 33237299 PMCID: PMC7779052 DOI: 10.1093/nar/gkaa1068] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 09/04/2020] [Revised: 10/19/2020] [Accepted: 10/21/2020] [Indexed: 12/28/2022]
Abstract
Comparative genomics is the analysis of genomic relationships among different species and serves as a significant base for evolutionary and functional genomic studies. GreenPhylDB (https://www.greenphyl.org) is a database designed to facilitate the exploration of gene families and homologous relationships among plant genomes, including staple crops critically important for global food security. GreenPhylDB has been available since 2007, following the release of the Arabidopsis thaliana and Oryza sativa genomes, and has undergone multiple releases. With the number of plant genomes currently available, it has become challenging to select a single reference for comparative genomics studies, but there is still a lack of databases taking advantage of several genomes per species for orthology detection. GreenPhylDB v5 introduces the concept of comparative pangenomics by harnessing multiple genome sequences per species. We created 19 pangenes and processed them with other species still relying on one genome. In total, 46 plant species were considered to build gene families and predict their homologous relationships through phylogenetic-based analyses. In addition, since the previous publication, we rejuvenated the website and included a new set of original tools, including protein-domain combination, tree topology searches, and a section for users to store their own results in order to support community curation efforts.
Affiliation(s)
- Guignon Valentin
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier France
| | - Toure Abdel
- Syngenta Seeds SAS, 31790 Saint-Sauveur France
| | - Droc Gaëtan
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier France
- AGAP, Univ de Montpellier, CIRAD, INRAE, Montpellier SupAgro, F-34398 Montpellier, France
- CIRAD, UMR AGAP, F-34398 Montpellier, France
| | - Dufayard Jean-François
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier France
- AGAP, Univ de Montpellier, CIRAD, INRAE, Montpellier SupAgro, F-34398 Montpellier, France
- CIRAD, UMR AGAP, F-34398 Montpellier, France
| | - Rouard Mathieu
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier, France
- French Institute of Bioinformatics (IFB)—South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier France
|
41
|
Procter JB, Carstairs GM, Soares B, Mourão K, Ofoegbu TC, Barton D, Lui L, Menard A, Sherstnev N, Roldan-Martinez D, Duce S, Martin DMA, Barton GJ. Alignment of Biological Sequences with Jalview. Methods Mol Biol 2021; 2231:203-224. [PMID: 33289895 PMCID: PMC7116599 DOI: 10.1007/978-1-0716-1036-7_13] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Indexed: 12/18/2022]
Abstract
In this chapter, we introduce core functionality of the Jalview interactive platform for the creation, analysis, and publication of multiple sequence alignments. A workflow is described based on Jalview's core functions: from data import to figure generation, including import of alignment reliability scores from T-Coffee and use of Jalview from the command line. The accompanying notes provide background information on the underlying methods and discuss additional options for working with Jalview to perform multiple sequence alignment, functional site analysis, and publication of alignments on the web.
Affiliation(s)
- Ben Soares
- University of Dundee, Dundee, Scotland, UK
| | - Kira Mourão
- University of Dundee, Dundee, Scotland, UK
- Synpromics Ltd., Edinburgh, Scotland, UK
| | - Daniel Barton
- University of Dundee, Dundee, Scotland, UK
- Institute of Physics, Chinese Academy of Sciences, Beijing, China
| | - Lauren Lui
- University of Dundee, Dundee, Scotland, UK
- UC Santa Cruz, Santa Cruz, CA, USA
| | - Natasha Sherstnev
- University of Dundee, Dundee, Scotland, UK
- U. Paris Sud, Orsay, France
|
42
|
On the Identity of Species of Oreobates (Anura: Craugastoridae) from Central South America, with the Description of a New Species from Bolivia. J Herpetol 2020. [DOI: 10.1670/20-001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/08/2022]
|
43
|
Iqbal S, Hoksza D, Pérez-Palma E, May P, Jespersen JB, Ahmed SS, Rifat ZT, Heyne HO, Rahman MS, Cottrell JR, Wagner FF, Daly MJ, Campbell AJ, Lal D. MISCAST: MIssense variant to protein StruCture Analysis web SuiTe. Nucleic Acids Res 2020; 48:W132-W139. [PMID: 32402084 PMCID: PMC7319582 DOI: 10.1093/nar/gkaa361] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/05/2020] [Revised: 04/17/2020] [Accepted: 05/11/2020] [Indexed: 12/19/2022]
Abstract
Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like 'Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?', or 'Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?' are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features have been aggregated in MISCAST from multiple databases, and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.
Affiliation(s)
- Sumaiya Iqbal
- Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - David Hoksza
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Department of Software Engineering, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
| | - Eduardo Pérez-Palma
- Genomic Medicine Institute, Lerner Research Institute Cleveland Clinic, Cleveland, OH 44195, USA
| | - Patrick May
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
| | - Jakob B Jespersen
- Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark
| | - Shehab S Ahmed
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, ECE Building, West Palashi, Dhaka-1205, Bangladesh
| | - Zaara T Rifat
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, ECE Building, West Palashi, Dhaka-1205, Bangladesh
| | - Henrike O Heyne
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00100 Helsinki, Finland
| | - M Sohel Rahman
- Computer Science and Engineering, Bangladesh University of Engineering and Technology, ECE Building, West Palashi, Dhaka-1205, Bangladesh
| | - Jeffrey R Cottrell
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Florence F Wagner
- Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Mark J Daly
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, 00100 Helsinki, Finland
| | - Arthur J Campbell
- Center for Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Dennis Lal
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Genomic Medicine Institute, Lerner Research Institute Cleveland Clinic, Cleveland, OH 44195, USA
- Cologne Center for Genomics, University of Cologne, Cologne, Germany
- Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH 44195, USA
|
44
|
Coulton A, Edwards KJ. AutoCloner: automatic homologue-specific primer design for full-gene cloning in polyploids. BMC Bioinformatics 2020; 21:311. [PMID: 32677889 PMCID: PMC7364506 DOI: 10.1186/s12859-020-03601-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2020] [Accepted: 06/11/2020] [Indexed: 12/02/2022]
Abstract
Background: Polyploid organisms such as wheat complicate even the simplest of procedures in molecular biology. Whilst knowledge of genomic sequences in crops is increasing rapidly, the scientific community is still a long way from producing a full pan-genome for every species. Polymerase chain reaction and Sanger sequencing therefore remain widely used as methods for characterizing gene sequences in many varieties of crops. High sequence similarity between genomes in polyploids means that if primers are not homeologue-specific via the incorporation of a SNP at the 3’ tail, sequences other than the target sequence will also be amplified. Current consensus for gene cloning in wheat is to manually perform many steps in a long bioinformatics pipeline. Results: Here we present AutoCloner (www.autocloner.com), a fully automated pipeline for crop gene cloning that includes a free-to-use web interface for users. AutoCloner takes a sequence of interest from the user and performs a basic local alignment search tool (BLAST) search against the genome assembly for their particular polyploid crop. Homologous sequences are then compiled with the input sequence into a multiple sequence alignment which is mined for single-nucleotide polymorphisms (SNPs). Various combinations of potential primers that cover the entire gene of interest are then created and evaluated by Primer3; the set of primers with the highest score, as well as all possible primers at every SNP location, are then returned to the user for polymerase chain reaction (PCR). We have successfully used AutoCloner to clone various genes of interest in the Apogee wheat variety, which has no current genome sequence. In addition, we have successfully run the pipeline on ~80,000 high-confidence gene models from a wheat genome assembly.
Conclusion: AutoCloner is the first tool to fully automate primer design for gene cloning in polyploids, where previously the consensus within the wheat community was to perform this process manually. The web interface for AutoCloner provides a simple and effective polyploid primer-design method for gene cloning, with no need for researchers to download software or input any details other than their sequence of interest.
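The SNP-mining step described in the abstract above can be illustrated with a minimal sketch. This is not AutoCloner's code: the function name `homeologue_specific_snps`, the toy sequences, and the rule "the target base must differ from every homeologue at that column" are assumptions chosen only to show the idea of scanning a multiple sequence alignment for positions usable as homeologue-specific primer anchors.

```python
from typing import List


def homeologue_specific_snps(target: str, homeologues: List[str]) -> List[int]:
    """Return 0-based alignment columns where the target differs from every homeologue.

    All sequences are assumed to come from the same multiple sequence
    alignment, so they must have equal length; '-' marks a gap.
    """
    if any(len(seq) != len(target) for seq in homeologues):
        raise ValueError("aligned sequences must all have the same length")

    positions = []
    for col, base in enumerate(target):
        if base == "-":
            continue  # a gap in the target cannot anchor a primer
        others = [seq[col] for seq in homeologues]
        # Keep the column only if no homeologue shares the target base
        # and none of them has a gap there.
        if all(o != base and o != "-" for o in others):
            positions.append(col)
    return positions


# Hypothetical aligned sequences (illustrative only).
target = "ATGCCGTA"
homeologues = ["ATGACGTA", "ATGACGTT"]
snps = homeologue_specific_snps(target, homeologues)
print(snps)
```

In a real pipeline the returned columns would then be passed to a primer-design tool; here they are simply printed.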
Affiliation(s)
- Alexander Coulton
- Biological Sciences Department, The University of Bristol, 24 Tyndall Avenue, Bristol, BS8 1TQ, UK.
| | - Keith J Edwards
- Biological Sciences Department, The University of Bristol, 24 Tyndall Avenue, Bristol, BS8 1TQ, UK
|
45
|
Bouyssié D, Lesne J, Locard-Paulet M, Albigot R, Burlet-Schiltz O, Marcoux J. HDX-Viewer: interactive 3D visualization of hydrogen-deuterium exchange data. Bioinformatics 2020; 35:5331-5333. [PMID: 31287496 PMCID: PMC6954641 DOI: 10.1093/bioinformatics/btz550] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/14/2019] [Revised: 07/04/2019] [Accepted: 07/08/2019] [Indexed: 11/26/2022]
Abstract
Summary: With the advent of fully automated sample preparation robots for Hydrogen–Deuterium eXchange coupled to Mass Spectrometry (HDX-MS), this method has become paramount for ligand binding or epitope mapping screening, both in academic research and biopharmaceutical industries. However, bridging the gap between commercial HDX-MS software (for raw data interpretation) and molecular viewers (to map experiment results onto a 3D structure for biological interpretation) remains laborious and requires simple but sometimes limiting coding skills. We solved this bottleneck by developing HDX-Viewer, an open-source web-based application that facilitates and quickens HDX-MS data analysis. This user-friendly application automatically incorporates HDX-MS data from a custom template or commercial HDX-MS software in PDB files, and uploads them to an online 3D molecular viewer, thereby facilitating their visualization and biological interpretation. Availability and implementation: The HDX-Viewer web application is released under the CeCILL (http://www.cecill.info) and GNU LGPL licenses and can be found at https://masstools.ipbs.fr/hdx-viewer. The source code is available at https://github.com/david-bouyssie/hdx-viewer.
Affiliation(s)
- David Bouyssié
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Jean Lesne
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Marie Locard-Paulet
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Renaud Albigot
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Odile Burlet-Schiltz
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
| | - Julien Marcoux
- Institut de Pharmacologie et de Biologie Structurale, Université de Toulouse, CNRS, UPS, Toulouse, France
|
46
|
Palaniappan K, Chen IMA, Chu K, Ratner A, Seshadri R, Kyrpides NC, Ivanova NN, Mouncey NJ. IMG-ABC v.5.0: an update to the IMG/Atlas of Biosynthetic Gene Clusters Knowledgebase. Nucleic Acids Res 2020; 48:D422-D430. [PMID: 31665416 DOI: 10.1093/nar/gkz932] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 09/13/2019] [Revised: 10/02/2019] [Accepted: 10/09/2019] [Indexed: 01/14/2023]
Abstract
Microbial secondary metabolism is a reservoir of bioactive compounds of immense biotechnological and biomedical potential. The biosynthetic machinery responsible for the production of these secondary metabolites (SMs) (also called natural products) is often encoded by collocated groups of genes called biosynthetic gene clusters (BGCs). High-throughput genome sequencing of both isolates and metagenomic samples combined with the development of specialized computational workflows is enabling systematic identification of BGCs and the discovery of novel SMs. In order to advance exploration of microbial secondary metabolism and its diversity, we developed the largest publicly available database of predicted BGCs combined with experimentally verified BGCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc-public). Here we describe the first major content update of the IMG-ABC knowledgebase, since its initial release in 2015, refreshing the BGC prediction pipeline with the latest version of antiSMASH (v5) as well as presenting the data in the context of underlying environmental metadata sourced from GOLD (https://gold.jgi.doe.gov/). This update has greatly improved the quality and expanded the types of predicted BGCs compared to the previous version.
Affiliation(s)
- Krishnaveni Palaniappan
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - I-Min A Chen
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Ken Chu
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Anna Ratner
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Rekha Seshadri
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Nikos C Kyrpides
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Natalia N Ivanova
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Nigel J Mouncey
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
47
|
Lemoine F, Correia D, Lefort V, Doppelt-Azeroual O, Mareuil F, Cohen-Boulakia S, Gascuel O. NGPhylogeny.fr: new generation phylogenetic services for non-specialists. Nucleic Acids Res 2020; 47:W260-W265. [PMID: 31028399 PMCID: PMC6602494 DOI: 10.1093/nar/gkz303] [Citation(s) in RCA: 465] [Impact Index Per Article: 93.0] [Received: 02/08/2019] [Revised: 04/05/2019] [Accepted: 04/17/2019] [Indexed: 11/14/2022]
Abstract
Phylogeny.fr, created in 2008, has been designed to facilitate the execution of phylogenetic workflows, and is nowadays widely used. However, since its development, user needs have evolved, new tools and workflows have been published, and the number of jobs has increased dramatically, thus promoting new practices, which motivated its refactoring. We developed NGPhylogeny.fr to be more flexible in terms of tools and workflows, easily installable, and more scalable. It integrates numerous tools in their latest version (e.g. TNT, FastME, MrBayes, etc.) as well as new ones designed in the last ten years (e.g. PhyML, SMS, FastTree, trimAl, BOOSTER, etc.). These tools cover a large range of usage (sequence searching, multiple sequence alignment, model selection, tree inference and tree drawing) and a large panel of standard methods (distance, parsimony, maximum likelihood and Bayesian). They are integrated in workflows, which have been already configured ('One click'), can be customized ('Advanced'), or are built from scratch ('A la carte'). Workflows are managed and run by an underlying Galaxy workflow system, which makes workflows more scalable in terms of number of jobs and size of data. NGPhylogeny.fr is deployable on any server or personal computer, and is freely accessible at https://ngphylogeny.fr.
Affiliation(s)
- Frédéric Lemoine
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- Hub Bioinformatique et Biostatistique, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
| | - Damien Correia
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- Méthodes et Algorithmes pour la Bioinformatique, LIRMM UMR 5506, Université de Montpellier & CNRS, Montpellier, France
- Laboratoire de Recherche en Informatique, Université Paris-Sud, CNRS UMR 8623, Université Paris-Saclay, Orsay, France
| | - Vincent Lefort
- Méthodes et Algorithmes pour la Bioinformatique, LIRMM UMR 5506, Université de Montpellier & CNRS, Montpellier, France
| | - Olivia Doppelt-Azeroual
- Hub Bioinformatique et Biostatistique, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
| | - Fabien Mareuil
- Hub Bioinformatique et Biostatistique, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
| | - Sarah Cohen-Boulakia
- Laboratoire de Recherche en Informatique, Université Paris-Sud, CNRS UMR 8623, Université Paris-Saclay, Orsay, France
| | - Olivier Gascuel
- Unité Bioinformatique Evolutive, C3BI USR 3756, Institut Pasteur & CNRS, Paris, France
- Méthodes et Algorithmes pour la Bioinformatique, LIRMM UMR 5506, Université de Montpellier & CNRS, Montpellier, France
| |
Collapse
|
48
|
Kunzmann P, Mayer BE, Hamacher K. Substitution matrix based color schemes for sequence alignment visualization. BMC Bioinformatics 2020; 21:209. [PMID: 32448181 PMCID: PMC7245768 DOI: 10.1186/s12859-020-3526-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/22/2020] [Accepted: 04/30/2020] [Indexed: 11/20/2022] Open
Abstract
Background: Visualization of multiple sequence alignments often includes colored symbols, usually characters encoding amino acids, according to some (physical) properties, such as hydrophobicity or charge. Typically, color schemes are created manually, so that equal or similar colors are assigned to amino acids that share similar properties. However, this assessment is subjective and may not represent the similarity of symbols very well. Results: In this article we propose a different approach for color scheme creation: we leverage the similarity information of a substitution matrix to derive an appropriate color scheme. Similar colors are assigned to high scoring pairs of symbols, distant colors are assigned to low scoring pairs. In order to find these optimal points in color space a simulated annealing algorithm is employed. Conclusions: Using the substitution matrix as basis for a color scheme is consistent with the alignment, which itself is based on the very substitution matrix. This approach allows fully automatic generation of new color schemes, even for special purposes which have not yet been covered, including schemes for structural alphabets or schemes that are adapted for people with color vision deficiency.
Affiliation(s)
- Patrick Kunzmann
  - Department of Computational Biology and Simulation, TU Darmstadt, Schnittspahnstraße 2, Darmstadt, 64287, Germany
- Benjamin E Mayer
  - Department of Computational Biology and Simulation, TU Darmstadt, Schnittspahnstraße 2, Darmstadt, 64287, Germany
- Kay Hamacher
  - Department of Computational Biology and Simulation, TU Darmstadt, Schnittspahnstraße 2, Darmstadt, 64287, Germany
|
49
|
Zhu Z, Guan Z, Liu G, Wang Y, Zhang Z. SGID: a comprehensive and interactive database of the silkworm. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5677404. [PMID: 31836898 PMCID: PMC6911161 DOI: 10.1093/database/baz134] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 09/26/2019] [Revised: 10/27/2019] [Accepted: 11/01/2019] [Indexed: 11/12/2022]
Abstract
Although the domestic silkworm (Bombyx mori) is an important model and economic animal, a comprehensive database for this organism is lacking. Here, we developed the silkworm genome informatics database (SGID). It aims to bring together all silkworm-related biological data and provide an interactive platform for gene inquiry and analysis. The functional annotation in SGID is thorough and covers 98% of the silkworm genes. The annotation details include function description, Gene Ontology, Kyoto Encyclopedia of Genes and Genomes pathway, subcellular location, transmembrane topology, protein secondary/tertiary structure, homologous group and transcription factor. SGID provides genome-scale visualization of population genetics test results based on high-depth resequencing data of 158 silkworm samples. It also provides interactive analysis tools for transcriptomic and epigenomic data from 79 NCBI BioProjects. SGID will be extremely useful to silkworm research in the future.
Affiliation(s)
- Zhenglin Zhu
  - School of Life Sciences, Chongqing University, No.55 Daxuecheng South Rd., Shapingba, Chongqing, 401331, China
- Zhufen Guan
  - School of Life Sciences, Chongqing University, No.55 Daxuecheng South Rd., Shapingba, Chongqing, 401331, China
- Gexin Liu
  - School of Life Sciences, Chongqing University, No.55 Daxuecheng South Rd., Shapingba, Chongqing, 401331, China
- Yawang Wang
  - School of Life Sciences, Chongqing University, No.55 Daxuecheng South Rd., Shapingba, Chongqing, 401331, China; Khoury College of Computer Sciences, Northeastern University, 401 Terry Ave N, Seattle, WA, 98109, USA
- Ze Zhang
  - School of Life Sciences, Chongqing University, No.55 Daxuecheng South Rd., Shapingba, Chongqing, 401331, China
|
50
|
Dagan-Wiener A, Di Pizio A, Nissim I, Bahia MS, Dubovski N, Margulis E, Niv MY. BitterDB: taste ligands and receptors database in 2019. Nucleic Acids Res 2020; 47:D1179-D1185. [PMID: 30357384 PMCID: PMC6323989 DOI: 10.1093/nar/gky974] [Citation(s) in RCA: 149] [Impact Index Per Article: 29.8] [Received: 09/20/2018] [Accepted: 10/09/2018] [Indexed: 01/22/2023] Open
Abstract
BitterDB (http://bitterdb.agri.huji.ac.il) was introduced in 2012 as a central resource for information on bitter-tasting molecules and their receptors. The information in BitterDB is frequently used for choosing suitable ligands for experimental studies, for developing bitterness predictors, for analysis of receptor promiscuity and more. Here, we describe a major upgrade of the database, including a significant increase in content as well as new features. BitterDB now holds over 1000 bitter molecules, up from the initial 550. When available, quantitative sensory data on bitterness intensity as well as toxicity information were added. For 270 molecules, at least one associated bitter taste receptor (T2R) is reported. The overall number of ligand-T2R associations is now close to 800. BitterDB was extended to several species: in addition to human, it now holds information on mouse, cat and chicken T2Rs, and the compounds that activate them. BitterDB now provides a unique platform for structure-based studies with high-quality homology models, known ligands, and, for the human receptors, also data from mutagenesis experiments, information on frequently occurring single nucleotide polymorphisms and links to expression levels in different tissues.
Affiliation(s)
- Ayana Dagan-Wiener
  - The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot, Israel; The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem 91904, Israel
- Antonella Di Pizio
  - The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot, Israel; The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem 91904, Israel
- Ido Nissim
  - The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot, Israel; The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem 91904, Israel
- Malkeet S Bahia
  - The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot, Israel; The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem 91904, Israel
- Nitzan Dubovski
  - The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot, Israel; The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem 91904, Israel
- Eitan Margulis
  - The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot, Israel; The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem 91904, Israel
- Masha Y Niv
  - The Institute of Biochemistry, Food and Nutrition, The Robert H Smith Faculty of Agriculture, Food and Environment, The Hebrew University, 76100 Rehovot, Israel; The Fritz Haber Center for Molecular Dynamics, The Hebrew University, Jerusalem 91904, Israel
|