1
Jebastin T, Syed Abuthakir M, Santhoshi I, Gnanaraj M, Gatasheh MK, Ahamed A, Sharmila V. Unveiling the mysteries: Functional insights into hypothetical proteins from Bacteroides fragilis 638R. Heliyon 2024; 10:e31713. [PMID: 38832264 PMCID: PMC11145332 DOI: 10.1016/j.heliyon.2024.e31713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 05/21/2024] [Accepted: 05/21/2024] [Indexed: 06/05/2024] Open
Abstract
Humans benefit from a vast community of microorganisms in their gastrointestinal tract, known as the gut microbiota, numbering in the tens of trillions. An imbalance in the gut microbiota, known as dysbiosis, can lead to changes in the metabolite profile, elevating the levels of toxins like Bacteroides fragilis toxin (BFT), colibactin, and cytolethal distending toxin. These toxins are implicated in the process of oncogenesis. However, a significant portion of the Bacteroides fragilis genome consists of functionally uncharacterized and hypothetical proteins. This study delves into the functional characterization of hypothetical proteins (HPs) encoded by the Bacteroides fragilis genome, employing a systematic in silico approach. A total of 379 HPs were subjected to a BlastP homology search against the NCBI non-redundant protein sequence database, resulting in 162 HPs devoid of identity to known proteins. CDD-Blast identified 106 HPs with functional domains, which were then annotated using Pfam, InterPro, SUPERFAMILY, ScanProsite, SMART, and CATH. Physicochemical properties, such as molecular weight, isoelectric point, and stability indices, were assessed for 60 HPs whose functional domains were identified by at least three of the aforementioned bioinformatic tools. Subsequently, subcellular localization was analyzed, and gene ontology analysis revealed diverse biological processes, cellular components, and molecular functions. Remarkably, E1WPR3 was identified as a virulent and essential gene among the HPs. This study presents a comprehensive exploration of B. fragilis HPs, shedding light on their potential roles and contributing to a deeper understanding of this organism's functional landscape.
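The selection rule in this abstract (keeping only HPs whose functional domain was reported by at least three of the six annotation tools) can be sketched as a simple consensus filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: the accessions, the `example_hits` dictionary, and the function name `consensus_filter` are hypothetical, and real tool output would first need to be parsed into this form.

```python
from typing import Dict, List, Set

# The six annotation tools named in the abstract.
TOOLS = {"Pfam", "InterPro", "SUPERFAMILY", "ScanProsite", "SMART", "CATH"}


def consensus_filter(hits: Dict[str, Set[str]], min_tools: int = 3) -> List[str]:
    """Return accessions whose domain was reported by at least `min_tools` known tools."""
    return sorted(
        acc for acc, tools in hits.items() if len(tools & TOOLS) >= min_tools
    )


# Hypothetical input: accession -> set of tools that reported a functional domain.
example_hits = {
    "HP_A": {"Pfam", "InterPro", "SMART"},
    "HP_B": {"Pfam"},
    "HP_C": {"CATH", "SUPERFAMILY", "InterPro", "Pfam"},
    "HP_D": {"SMART", "UnknownTool", "Pfam"},  # only two recognized tools
}

selected = consensus_filter(example_hits)
print(selected)
```

Intersecting with `TOOLS` before counting means that hits from unrecognized sources do not inflate the consensus count.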
Affiliation(s)
- Thomas Jebastin
  - Computer Aided Drug Designing Lab, Department of Bioinformatics, Bishop Heber College (Autonomous), Tiruchirappalli, 620017, Tamil Nadu, India
- M.H. Syed Abuthakir
  - Department of Bioinformatics, Bharathiar University, Coimbatore, 641046, Tamil Nadu, India
  - Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600, UKM Bangi, Selangor, Malaysia
- Ilangovan Santhoshi
  - Computer Aided Drug Designing Lab, Department of Bioinformatics, Bishop Heber College (Autonomous), Tiruchirappalli, 620017, Tamil Nadu, India
- Muniraj Gnanaraj
  - Department of Biotechnology, School of Life Sciences, St Joseph's University, 36 Lalbagh Road, Bengaluru, 560027, Karnataka, India
- Mansour K. Gatasheh
  - Department of Biochemistry, College of Science, King Saud University, P.O. Box 2455, Riyadh, 11451, Saudi Arabia
- Anis Ahamed
  - Department of Botany and Microbiology, College of Science, King Saud University, Saudi Arabia
- Velusamy Sharmila
  - Department of Biotechnology, Nehru Arts and Science College (NASC), Thirumalayampalayam, Coimbatore, 641 105, Tamil Nadu, India
2
Zhang X, Liu M, Li Z, Zhuo L, Fu X, Zou Q. Fusion of multi-source relationships and topology to infer lncRNA-protein interactions. Mol Ther Nucleic Acids 2024; 35:102187. [PMID: 38706631 PMCID: PMC11066462 DOI: 10.1016/j.omtn.2024.102187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/03/2024] [Indexed: 05/07/2024]
Abstract
Long non-coding RNAs (lncRNAs) are important factors involved in biological regulatory networks. Accurately predicting lncRNA-protein interactions (LPIs) is vital for clarifying lncRNA functions and pathogenic mechanisms. Existing deep learning models have yet to yield satisfactory results in LPI prediction. Recently, graph autoencoders (GAEs) have seen rapid development, excelling in tasks like link prediction and node classification. We employed GAE technology for LPI prediction, devising the FMSRT-LPI model based on path masking and degree regression strategies and thereby achieving satisfactory outcomes. This represents the first known integration of path masking and degree regression strategies into the GAE framework for potential LPI inference. The effectiveness of our FMSRT-LPI model primarily relies on four key aspects. First, within the GAE framework, our model integrates multi-source relationships of lncRNAs and proteins with the topological data of the lncRNA-protein network (LPN). Second, the implemented masking strategy efficiently identifies the LPN's key paths, reconstructs the network, and reduces the impact of redundant or incorrect data. Third, the integrated degree decoder balances degree and structural information, enhancing node representation. Fourth, the PolyLoss function we introduced is more appropriate for LPI prediction tasks. The results on multiple public datasets further demonstrate our model's potential in LPI prediction.
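The abstract names a PolyLoss function but does not say which variant or coefficient is used. As a rough sketch only, the simplest form from the PolyLoss literature (Poly-1) adds a term `epsilon * (1 - p_t)` to the cross-entropy, where `p_t` is the predicted probability of the true class. The function name `poly1_bce`, the default `epsilon=1.0`, and the toy inputs below are assumptions for illustration, not details of FMSRT-LPI.

```python
import math
from typing import Sequence


def poly1_bce(probs: Sequence[float], labels: Sequence[int], epsilon: float = 1.0) -> float:
    """Mean Poly-1 loss for binary link prediction.

    probs  : predicted probability that each candidate lncRNA-protein edge exists
    labels : 1 for a known interaction, 0 otherwise
    epsilon: weight of the extra first-order term (assumed value, not from the paper)
    """
    clip = 1e-12  # avoids log(0) for extreme predictions
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
        p_t = min(max(p_t, clip), 1.0)
        cross_entropy = -math.log(p_t)
        total += cross_entropy + epsilon * (1.0 - p_t)
    return total / len(probs)


# Toy example: two known interactions and two non-interactions.
loss = poly1_bce([0.9, 0.6, 0.2, 0.4], [1, 1, 0, 0])
print(round(loss, 4))
```

With `epsilon=0.0` the function reduces to ordinary binary cross-entropy, which makes the effect of the added term easy to isolate.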
Affiliation(s)
- Xinyu Zhang
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Mingzhe Liu
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Zhen Li
  - Institute of Computational Science and Technology, Guangzhou University, Guangzhou 510000, China
- Linlin Zhuo
  - School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou 325027, China
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha 410012, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
3
Elisée E, Ducrot L, Méheust R, Bastard K, Fossey-Jouenne A, Grogan G, Pelletier E, Petit JL, Stam M, de Berardinis V, Zaparucha A, Vallenet D, Vergne-Vaxelaire C. A refined picture of the native amine dehydrogenase family revealed by extensive biodiversity screening. Nat Commun 2024; 15:4933. [PMID: 38858403 PMCID: PMC11164908 DOI: 10.1038/s41467-024-49009-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 05/20/2024] [Indexed: 06/12/2024] Open
Abstract
Native amine dehydrogenases offer sustainable access to chiral amines, so the search for scaffolds capable of converting more diverse carbonyl compounds is required to reach the full potential of this alternative to conventional synthetic reductive aminations. Here we report a multidisciplinary strategy combining bioinformatics, chemoinformatics and biocatalysis to extensively screen billions of sequences in silico and to efficiently find native amine dehydrogenase features using computational approaches. In this way, we achieve a comprehensive overview of the initial native amine dehydrogenase family, extending it from 2,011 to 17,959 sequences, and identify native amine dehydrogenases with non-reported substrate spectra, including hindered carbonyls and ethyl ketones, and accepting methylamine and cyclopropylamine as amine donors. We also present preliminary model-based structural information to inform the design of potential (R)-selective amine dehydrogenases, as native amine dehydrogenases are mostly (S)-selective. This integrated strategy paves the way for expanding the resource of other enzyme families and for highlighting enzymes with original features.
Affiliation(s)
- Eddy Elisée
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Laurine Ducrot
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Raphaël Méheust
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Karine Bastard
  - School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, 2006, Australia
- Aurélie Fossey-Jouenne
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Gideon Grogan
  - York Structural Biology Laboratory, Department of Chemistry, University of York, Heslington, York, YO10 5DD, UK
- Eric Pelletier
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Jean-Louis Petit
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Mark Stam
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Véronique de Berardinis
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Anne Zaparucha
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- David Vallenet
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France.
- Carine Vergne-Vaxelaire
  - Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France.
4
Joglekar A, Hu W, Zhang B, Narykov O, Diekhans M, Marrocco J, Balacco J, Ndhlovu LC, Milner TA, Fedrigo O, Jarvis ED, Sheynkman G, Korkin D, Ross ME, Tilgner HU. Single-cell long-read sequencing-based mapping reveals specialized splicing patterns in developing and adult mouse and human brain. Nat Neurosci 2024; 27:1051-1063. [PMID: 38594596 PMCID: PMC11156538 DOI: 10.1038/s41593-024-01616-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2023] [Accepted: 03/07/2024] [Indexed: 04/11/2024]
Abstract
RNA isoforms influence cell identity and function. However, a comprehensive brain isoform map was lacking. We analyze single-cell RNA isoforms across brain regions, cell subtypes, developmental time points and species. For 72% of genes, full-length isoform expression varies along one or more axes. Splicing, transcription start and polyadenylation sites vary strongly between cell types, influence protein architecture and associate with disease-linked variation. Additionally, neurotransmitter transport and synapse turnover genes harbor cell-type variability across anatomical regions. Regulation of cell-type-specific splicing is pronounced in the postnatal day 21-to-postnatal day 28 adolescent transition. Developmental isoform regulation is stronger than regional regulation for the same cell type. Cell-type-specific isoform regulation in mice is mostly maintained in the human hippocampus, allowing extrapolation to the human brain. Conversely, the human brain harbors additional cell-type specificity, suggesting gain-of-function isoforms. Together, this detailed single-cell atlas of full-length isoform regulation across development, anatomical regions and species reveals an unappreciated degree of isoform variability across multiple axes.
Affiliation(s)
- Anoushka Joglekar
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Wen Hu
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Bei Zhang
  - Spatial Genomics, Inc., Pasadena, CA, USA
- Oleksandr Narykov
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Mark Diekhans
  - UC Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jordan Marrocco
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Department of Biology, Touro University, New York, NY, USA
  - Laboratory of Neuroendocrinology, The Rockefeller University, New York, NY, USA
- Jennifer Balacco
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Lishomwa C Ndhlovu
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Department of Medicine, Division of Infectious Diseases, Weill Cornell Medicine, New York, NY, USA
- Teresa A Milner
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Gloria Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
  - UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA, USA
- Dmitry Korkin
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA, USA
- M Elizabeth Ross
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Hagen U Tilgner
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA.
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA.
5
Xu X, Yin K, Wu R. Systematic Investigation of the Trafficking of Glycoproteins on the Cell Surface. Mol Cell Proteomics 2024; 23:100761. [PMID: 38593903 PMCID: PMC11087972 DOI: 10.1016/j.mcpro.2024.100761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 03/30/2024] [Accepted: 04/03/2024] [Indexed: 04/11/2024] Open
Abstract
Glycoproteins located on the cell surface play a pivotal role in nearly every extracellular activity. N-glycosylation is one of the most common and important protein modifications in eukaryotic cells, and it often regulates protein folding and trafficking. Glycosylation of cell-surface proteins undergoes meticulous regulation by various enzymes in the endoplasmic reticulum (ER) and the Golgi, ensuring their proper folding and trafficking to the cell surface. However, the impacts of protein N-glycosylation, N-glycan maturity, and protein folding status on the trafficking of cell-surface glycoproteins remain to be explored. In this work, we comprehensively and site-specifically studied the trafficking of cell-surface glycoproteins in human cells. Integrating metabolic labeling, bioorthogonal chemistry, and multiplexed proteomics, we investigated 706 N-glycosylation sites on 396 cell-surface glycoproteins in monocytes, either by inhibiting protein N-glycosylation, disturbing N-glycan maturation, or perturbing protein folding in the ER. The current results reveal the distinct impacts of these three perturbations on the trafficking of surface glycoproteins. The inhibition of protein N-glycosylation dramatically suppresses the trafficking of many cell-surface glycoproteins. N-glycan immaturity has more substantial effects on proteins with high N-glycosylation site densities, while the perturbation of protein folding in the ER exerts a more pronounced impact on larger surface glycoproteins. Furthermore, for N-glycosylated proteins, their trafficking to the cell surface is related to the secondary structures and adjacent amino acid residues of glycosylation sites. Systematic analysis of surface glycoprotein trafficking advances our understanding of the mechanisms underlying protein secretion and surface presentation.
Affiliation(s)
- Xing Xu
  - School of Chemistry and Biochemistry and the Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia, USA
- Kejun Yin
  - School of Chemistry and Biochemistry and the Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia, USA
- Ronghu Wu
  - School of Chemistry and Biochemistry and the Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia, USA.
6
Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
The principle of continuity demands the existence of prior molecular states and common ancestors responsible for extant macromolecular structure. Here, we focus on the emergence and evolution of loop prototypes - the elemental architects of protein domain structure. Phylogenomic reconstruction spanning superkingdoms and viruses generated an evolutionary chronology of prototypes with six distinct evolutionary phases defining a most parsimonious evolutionary progression of cellular life. Each phase was marked by strategic prototype accumulation shaping the structures and functions of common ancestors. The last universal common ancestor (LUCA) of cells and viruses and the last universal cellular ancestor (LUCellA) defined stem lines that were structurally and functionally complex. The evolutionary saga highlighted transformative forces. LUCA lacked biosynthetic ribosomal machinery, while the pivotal LUCellA lacked essential DNA biosynthesis and modern transcription. Early proteins therefore relied on RNA for genetic information storage but appeared initially decoupled from it, hinting at transformative shifts of genetic processing. Urancestral loop types suggest advanced folding designs were present at an early evolutionary stage. An exploration of loop geometric properties revealed gradual replacement of prototypes with α-helix and β-strand bracing structures over time, paving the way for the dominance of other loop types. AlphaFold2-generated atomic models of prototype accretion described patterns of fold emergence. Our findings favor a 'processual' model of evolving stem lines aligned with Woese's vision of a communal world. This model prompts discussing the 'problem of ancestors' and the challenges that lie ahead for research in taxonomy, evolution and complexity.
Affiliation(s)
- Kelsey Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
  - Callout Biotech, Albuquerque, NM, 87112, USA
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
7
Peters DL, Gaudreault F, Chen W. Functional domains of Acinetobacter bacteriophage tail fibers. Front Microbiol 2024; 15:1230997. [PMID: 38690360 PMCID: PMC11058221 DOI: 10.3389/fmicb.2024.1230997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 03/08/2024] [Indexed: 05/02/2024] Open
Abstract
A rapid increase in antimicrobial resistant bacterial infections around the world is causing a global health crisis. The Gram-negative bacterium Acinetobacter baumannii is categorized as a Priority 1 pathogen for research and development of new antimicrobials by the World Health Organization due to its numerous intrinsic antibiotic resistance mechanisms and ability to quickly acquire new resistance determinants. Specialized phage enzymes, called depolymerases, degrade the bacterial capsule polysaccharide layer and show therapeutic potential by sensitizing the bacterium to phages, select antibiotics, and serum killing. The functional domains responsible for the capsule degradation activity are often found in the tail fibers of select A. baumannii phages. To further explore the functional domains associated with depolymerase activity, tail-associated proteins of 71 sequenced and fully characterized phages were identified from published literature and analyzed for functional domains using InterProScan. Multisequence alignments and phylogenetic analyses were conducted on the domain groups and assessed in the context of noted halo formation or depolymerase characterization. Proteins derived from phages noted to have halo formation or a functional depolymerase, but no functional domain hits, were modeled with AlphaFold2 Multimer, and compared to other protein models using the DALI server. The domains associated with depolymerase function were pectin lyase-like (SSF51126), tailspike binding (cd20481), (Trans)glycosidases (SSF51445), and potentially SGNH hydrolases. These findings expand our knowledge on phage depolymerases, enabling researchers to better exploit these enzymes for therapeutic use in combating the antimicrobial resistance crisis.
Affiliation(s)
- Danielle L. Peters
  - Human Health Therapeutics (HHT) Research Center, National Research Council Canada, Ottawa, ON, Canada
- Wangxue Chen
  - Human Health Therapeutics (HHT) Research Center, National Research Council Canada, Ottawa, ON, Canada
  - Department of Biology, Brock University, St. Catharines, ON, Canada
8
Álvarez-Campos P, García-Castro H, Emili E, Pérez-Posada A, Del Olmo I, Peron S, Salamanca-Díaz DA, Mason V, Metzger B, Bely AE, Kenny NJ, Özpolat BD, Solana J. Annelid adult cell type diversity and their pluripotent cellular origins. Nat Commun 2024; 15:3194. [PMID: 38609365 PMCID: PMC11014941 DOI: 10.1038/s41467-024-47401-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Accepted: 03/27/2024] [Indexed: 04/14/2024] Open
Abstract
Many annelids can regenerate missing body parts or reproduce asexually, generating all cell types in adult stages. However, the putative adult stem cell populations involved in these processes, and the diversity of cell types generated by them, are still unknown. To address this, we recover 75,218 single cell transcriptomes of the highly regenerative and asexually-reproducing annelid Pristina leidyi. Our results uncover a rich cell type diversity including annelid specific types as well as novel types. Moreover, we characterise transcription factors and gene networks that are expressed specifically in these populations. Finally, we uncover a broadly abundant cluster of putative stem cells with a pluripotent signature. This population expresses well-known stem cell markers such as vasa, piwi and nanos homologues, but also shows heterogeneous expression of differentiated cell markers and their transcription factors. We find conserved expression of pluripotency regulators, including multiple chromatin remodelling and epigenetic factors, in piwi+ cells. Finally, lineage reconstruction analyses reveal computational differentiation trajectories from piwi+ cells to diverse adult types. Our data reveal the cell type diversity of adult annelids by single cell transcriptomics and suggest that a piwi+ cell population with a pluripotent stem cell signature is associated with adult cell type differentiation.
Affiliation(s)
- Patricia Álvarez-Campos
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK.
  - Centro de Investigación en Biodiversidad y Cambio Global (CIBC-UAM) & Departamento de Biología (Zoología), Facultad de Ciencias, Universidad Autónoma de Madrid, Madrid, Spain.
- Helena García-Castro
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
  - Living Systems Institute, University of Exeter, Exeter, UK
- Elena Emili
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
- Alberto Pérez-Posada
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
  - Living Systems Institute, University of Exeter, Exeter, UK
- Irene Del Olmo
  - Centro de Investigación en Biodiversidad y Cambio Global (CIBC-UAM) & Departamento de Biología (Zoología), Facultad de Ciencias, Universidad Autónoma de Madrid, Madrid, Spain
- Sophie Peron
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
  - Living Systems Institute, University of Exeter, Exeter, UK
- David A Salamanca-Díaz
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
  - Living Systems Institute, University of Exeter, Exeter, UK
- Vincent Mason
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
- Bria Metzger
  - Eugene Bell Center for Regenerative Biology and Tissue Engineering, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA, 05432, USA
  - Department of Biology, Washington University in St. Louis. 1 Brookings Dr. Saint Louis, Saint Louis, MO, 63130, USA
- Alexandra E Bely
  - Department of Biology, University of Maryland, College Park, MD, 20742, USA
- Nathan J Kenny
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
  - Department of Biochemistry, University of Otago, P.O. Box 56, Dunedin, Aotearoa, New Zealand
- B Duygu Özpolat
  - Eugene Bell Center for Regenerative Biology and Tissue Engineering, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA, 05432, USA.
  - Department of Biology, Washington University in St. Louis. 1 Brookings Dr. Saint Louis, Saint Louis, MO, 63130, USA.
- Jordi Solana
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK.
  - Living Systems Institute, University of Exeter, Exeter, UK.
9
Frey B, Aiesi M, Rast BM, Rüthi J, Julmi J, Stierli B, Qi W, Brunner I. Searching for new plastic-degrading enzymes from the plastisphere of alpine soils using a metagenomic mining approach. PLoS One 2024; 19:e0300503. [PMID: 38578779 PMCID: PMC10997104 DOI: 10.1371/journal.pone.0300503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 02/28/2024] [Indexed: 04/07/2024] Open
Abstract
Plastic materials, including microplastics, accumulate in all types of ecosystems, even in remote and cold environments such as the European Alps. This pollution poses a risk for the environment and humans and needs to be addressed. Using shotgun DNA metagenomics of soils collected in the eastern Swiss Alps at about 3,000 m a.s.l., we identified genes and their proteins that potentially can degrade plastics. We screened the metagenomes of the plastisphere and the bulk soil with a differential abundance analysis, conducted similarity-based screening with specific databases dedicated to putative plastic-degrading genes, and selected those genes with a high probability of signal peptides for extracellular export and a high confidence for functional domains. This procedure resulted in a final list of nine candidate genes. The lengths of the predicted proteins were between 425 and 845 amino acids, and the predicted genera producing these proteins belonged mainly to Caballeronia and Bradyrhizobium. We applied functional validation, using heterologous expression followed by enzymatic assays of the supernatant. Five of the nine proteins tested showed significantly increased activities when we used an esterase assay, and one of these five proteins from candidate genes, a hydrolase-type esterase, clearly had the highest activity, by more than double. We performed the fluorescence assays for plastic degradation of the plastic types BI-OPL and ecovio® only with proteins from the five candidate genes that were positively active in the esterase assay, but like the negative controls, these did not show any significantly increased activity. In contrast, the activity of the positive control, which contained a PLA-degrading gene insert known from the literature, was more than 20 times higher than that of the negative controls. These findings suggest that in silico screening followed by functional validation is suitable for finding new plastic-degrading enzymes. Although we only found one new esterase enzyme, our approach has the potential to be applied to any type of soil and to plastics in various ecosystems to search rapidly and efficiently for new plastic-degrading enzymes.
Affiliation(s)
- Beat Frey
- Forest Soils and Biogeochemistry, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
| | - Margherita Aiesi
- Forest Soils and Biogeochemistry, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
- Facoltà di Scienze Agrarie e Alimentari, Università degli Studi di Milano, Milano, Italy
| | - Basil M. Rast
- Forest Soils and Biogeochemistry, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
| | - Joel Rüthi
- Forest Soils and Biogeochemistry, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
| | - Jérôme Julmi
- Forest Soils and Biogeochemistry, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
| | - Beat Stierli
- Forest Soils and Biogeochemistry, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
| | - Weihong Qi
- Functional Genomics Center Zürich, ETH Zürich and University of Zürich, Zürich, Switzerland
- Swiss Institute of Bioinformatics SIB, Geneva, Switzerland
| | - Ivano Brunner
- Forest Soils and Biogeochemistry, Swiss Federal Institute for Forest, Snow and Landscape Research WSL, Birmensdorf, Switzerland
| |
|
10
|
Shen C, Mao D, Tang J, Liao Z, Chen S. Prediction of LncRNA-Protein Interactions Based on Kernel Combinations and Graph Convolutional Networks. IEEE J Biomed Health Inform 2024; 28:1937-1948. [PMID: 37327093 DOI: 10.1109/jbhi.2023.3286917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
The complexes of long non-coding RNAs bound to proteins can be involved in regulating life activities at various stages of organisms. However, in the face of the growing number of lncRNAs and proteins, verifying LncRNA-Protein Interactions (LPI) based on traditional biological experiments is time-consuming and laborious. Therefore, with the improvement of computing power, computational LPI prediction has gained new opportunities for development. Building on state-of-the-art work, this article proposes a framework called LncRNA-Protein Interactions based on Kernel Combinations and Graph Convolutional Networks (LPI-KCGCN). We first construct kernel matrices for both lncRNAs and proteins from sequence features, sequence similarity features, expression features, and gene ontology. We then reconstruct the existing kernel matrices as the input for the next step. Combined with known LPIs, the reconstructed similarity matrices, which can be used as features of the topology map of the LPI network, are exploited to extract potential representations in the lncRNA and protein spaces using a two-layer Graph Convolutional Network. The predicted matrix is finally obtained by training the network to produce scoring matrices w.r.t. lncRNAs and proteins. Different LPI-KCGCN variants are ensembled to derive the final prediction results and are tested on balanced and unbalanced datasets. Five-fold cross-validation shows that the optimal feature combination achieves an AUC of 0.9714 and an AUPR of 0.9216 on a dataset with 15.5% positive samples. On another highly unbalanced dataset with only 5% positive samples, LPI-KCGCN also outperformed state-of-the-art methods, achieving an AUC of 0.9907 and an AUPR of 0.9267.
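As a rough illustration of the two-layer graph convolution mentioned in the abstract above, the following minimal sketch propagates node features over a toy interaction graph. It assumes the widely used symmetric-normalization propagation rule, which the abstract does not confirm for LPI-KCGCN; the function names, toy graph, features, and weights are all hypothetical and are not taken from the paper.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the adjacency with self-loops, symmetrically normalized."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def relu(m):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in m]

def gcn_forward(adj, features, w0, w1):
    """Two-layer forward pass: A_norm * relu(A_norm * X * W0) * W1."""
    a_norm = normalize_adjacency(adj)
    hidden = relu(matmul(matmul(a_norm, features), w0))
    return matmul(matmul(a_norm, hidden), w1)

# Toy graph (assumption): nodes 0 and 1 stand for lncRNAs, node 2 for a protein.
adj = [[0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0],
       [1.0, 1.0, 0.0]]
features = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]  # one-hot placeholder features
w0 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # arbitrary, untrained weights
w1 = [[1.0], [-1.0]]

scores = gcn_forward(adj, features, w0, w1)  # one score per node
```

In a trained model the weights would be learned and the inputs would be the reconstructed similarity matrices rather than one-hot placeholders; the sketch only shows how each layer mixes a node's features with those of its neighbours.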
|
11
|
Rondón JJ, Pisarenco VA, Ramón Pardos-Blas J, Sánchez-Gracia A, Zardoya R, Rozas J. Comparative genomic analysis of chemosensory-related gene families in gastropods. Mol Phylogenet Evol 2024; 192:107986. [PMID: 38142794 DOI: 10.1016/j.ympev.2023.107986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/24/2023] [Accepted: 12/07/2023] [Indexed: 12/26/2023]
Abstract
Chemoreception is critical for the survival and reproduction of animals. Except for a reduced group of insects and chelicerates, the molecular identity of chemosensory proteins is poorly understood in invertebrates. Gastropoda is the extant mollusk class with the greatest species richness, including marine, freshwater, and terrestrial lineages, and likely, highly diverse chemoreception systems. Here, we performed a comprehensive comparative genome analysis taking advantage of the chromosome-level information of two Gastropoda species, one of which belongs to a lineage that underwent a whole genome duplication event. We identified thousands of previously uncharacterized chemosensory-related genes, the majority of them encoding G protein-coupled receptors (GPCR), mostly organized into clusters distributed across all chromosomes. We also detected gene families encoding degenerin epithelial sodium channels (DEG-ENaC), ionotropic receptors (IR), sensory neuron membrane proteins (SNMP), Niemann-Pick type C2 (NPC2) proteins, and lipocalins, although with a lower number of members. Our phylogenetic analysis of the GPCR gene family across protostomes revealed: (i) remarkable gene family expansions in Gastropoda; (ii) clades including members from all protostomes; and (iii) species-specific clades with a substantial number of receptors. For the first time, we provide new and valuable knowledge into the evolution of the chemosensory gene families in invertebrates other than arthropods.
Affiliation(s)
- Johnma José Rondón
- Fundación Instituto Leloir, Instituto de Investigaciones Bioquímicas de Buenos Aires (IIBBA-CONICET), Buenos Aires, Argentina; Instituto de Ecología, Genética y Evolución de Buenos Aires (IEGEBA-CONICET) Buenos Aires, Argentina
| | - Vadim A Pisarenco
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona (UB), Barcelona, Spain; Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona, Spain
| | - José Ramón Pardos-Blas
- Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales (MNCN-CSIC), Madrid, Spain
| | - Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona (UB), Barcelona, Spain; Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona, Spain
| | - Rafael Zardoya
- Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales (MNCN-CSIC), Madrid, Spain.
| | - Julio Rozas
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona (UB), Barcelona, Spain; Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona (UB), Barcelona, Spain.
| |
|
12
|
Botkin JR, Farmer AD, Young ND, Curtin SJ. Genome assembly of Medicago truncatula accession SA27063 provides insight into spring black stem and leaf spot disease resistance. BMC Genomics 2024; 25:204. [PMID: 38395768 PMCID: PMC10885650 DOI: 10.1186/s12864-024-10112-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2024] [Accepted: 02/10/2024] [Indexed: 02/25/2024] Open
Abstract
Medicago truncatula, a model legume and alfalfa relative, has served as an essential resource for advancing our understanding of legume physiology, functional genetics, and crop improvement traits. The necrotrophic fungus Ascochyta medicaginicola is the causal agent of spring black stem (SBS) and leaf spot, a devastating foliar disease of alfalfa affecting stand survival, yield, and forage quality. Host resistance to SBS disease is poorly understood, and control methods rely on cultural practices. Resistance has been observed in M. truncatula accession SA27063 (HM078) with two recessively inherited quantitative-trait loci (QTL), rnpm1 and rnpm2, previously reported. To shed light on host resistance, we carried out a de novo genome assembly of HM078. The genome, referred to as MtHM078 v1.0, is comprised of 23 contigs totaling 481.19 Mbp. Notably, this assembly contains a substantial amount of novel centromere-related repeat sequences due to deep long-read sequencing. Genome annotation resulted in 98.4% of BUSCO fabales proteins being complete. The assembly enabled sequence-level analysis of rnpm1 and rnpm2 for gene content, synteny, and structural variation between SBS-resistant accession SA27063 (HM078) and SBS-susceptible accession A17 (HM101). Fourteen candidate genes were identified, and some have been implicated in resistance to necrotrophic fungi. Especially interesting candidates include loss-of-function events in HM078 because they fit the inverse gene-for-gene model, where resistance is recessively inherited. In rnpm1, these include a loss-of-function in a disease resistance gene due to a premature stop codon, and a 10.85 kbp retrotransposon-like insertion disrupting a ubiquitin conjugating E2. In rnpm2, we identified a frameshift mutation causing a loss-of-function in a glycosidase, as well as a missense and frameshift mutation altering an F-box family protein.
This study generated a high-quality genome of HM078 and identified promising candidates that, once validated, could be further studied in alfalfa to enhance disease resistance.
Affiliation(s)
- Jacob R Botkin
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, 55108, USA
| | - Andrew D Farmer
- National Center for Genome Resources, Santa Fe, NM, 87505, USA
| | - Nevin D Young
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, 55108, USA
| | - Shaun J Curtin
- United States Department of Agriculture, Plant Science Research Unit, St Paul, MN, 55108, USA.
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, 55108, USA.
- Center for Plant Precision Genomics, University of Minnesota, St. Paul, MN, 55108, USA.
- Center for Genome Engineering, University of Minnesota, St. Paul, MN, 55108, USA.
| |
|
13
|
Schaeffer RD, Zhang J, Medvedev KE, Kinch LN, Cong Q, Grishin NV. ECOD domain classification of 48 whole proteomes from AlphaFold Structure Database using DPAM2. PLoS Comput Biol 2024; 20:e1011586. [PMID: 38416793 PMCID: PMC10927120 DOI: 10.1371/journal.pcbi.1011586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 03/11/2024] [Accepted: 02/20/2024] [Indexed: 03/01/2024] Open
Abstract
Protein structure prediction has now been deployed widely across several different large protein sets. Large-scale domain annotation of these predictions can aid in the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD) from experimental structures as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we can provide positive classification (either of domains or other identifiable non-domain regions) for 90% of residues in all proteomes. We classified 746,349 domains from 536,808 proteins comprised of over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeated proteins, both globular and small repeats. We enumerate those highly populated domains that are shared in both eukaryotes and bacteria, such as the Rossmann domains, TIM barrels, and P-loop domains. Additionally, we compare the sampling of homologous groups from this whole proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implication of these results for protein target selection for future classification strategies for very large protein sets.
Affiliation(s)
- R. Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Jing Zhang
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Kirill E. Medvedev
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Lisa N. Kinch
- Department of Molecular Biology, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Qian Cong
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Eugene McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| | - Nick V. Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
| |
|
14
|
Viner C, Ishak CA, Johnson J, Walker NJ, Shi H, Sjöberg-Herrera MK, Shen SY, Lardo SM, Adams DJ, Ferguson-Smith AC, De Carvalho DD, Hainer SJ, Bailey TL, Hoffman MM. Modeling methyl-sensitive transcription factor motifs with an expanded epigenetic alphabet. Genome Biol 2024; 25:11. [PMID: 38191487 PMCID: PMC10773111 DOI: 10.1186/s13059-023-03070-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 09/21/2023] [Indexed: 01/10/2024] Open
Abstract
BACKGROUND Transcription factors bind DNA in specific sequence contexts. In addition to distinguishing one nucleobase from another, some transcription factors can distinguish between unmodified and modified bases. Current models of transcription factor binding tend not to take DNA modifications into account, while the recent few that do often have limitations. This makes a comprehensive and accurate profiling of transcription factor affinities difficult. RESULTS Here, we develop methods to identify transcription factor binding sites in modified DNA. Our models expand the standard A/C/G/T DNA alphabet to include cytosine modifications. We develop Cytomod to create modified genomic sequences and we also enhance the MEME Suite, adding the capacity to handle custom alphabets. We adapt the well-established position weight matrix (PWM) model of transcription factor binding affinity to this expanded DNA alphabet. Using these methods, we identify modification-sensitive transcription factor binding motifs. We confirm established binding preferences, such as the preference of ZFP57 and C/EBPβ for methylated motifs and the preference of c-Myc for unmethylated E-box motifs. CONCLUSIONS Using known binding preferences to tune model parameters, we discover novel modified motifs for a wide array of transcription factors. Finally, we validate our binding preference predictions for OCT4 using cleavage under targets and release using nuclease (CUT&RUN) experiments across conventional, methylation-, and hydroxymethylation-enriched sequences. Our approach readily extends to other DNA modifications. As more genome-wide single-base resolution modification data becomes available, we expect that our method will yield insights into altered transcription factor binding affinities across many different modifications.
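The idea of extending a position weight matrix to an expanded alphabet can be sketched in a few lines. The example below is only an illustration under stated assumptions: the extra symbol `m` (standing for 5-methylcytosine), the uniform background, the pseudocount, and the four toy binding sites are all invented for demonstration and do not reproduce Cytomod or the MEME Suite.

```python
import math

# Hypothetical expanded alphabet: the four standard bases plus 'm' for 5-methylcytosine.
ALPHABET = "ACGTm"

def build_pwm(sites, pseudocount=1.0):
    """Build a log2-odds PWM from equal-length aligned sites, using a uniform background."""
    background = 1.0 / len(ALPHABET)
    pwm = []
    for pos in range(len(sites[0])):
        counts = {symbol: pseudocount for symbol in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({s: math.log2((counts[s] / total) / background) for s in ALPHABET})
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds for a window as long as the PWM."""
    return sum(column[symbol] for column, symbol in zip(pwm, window))

def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    return max((score(pwm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))

# Toy aligned sites (invented): the third position is usually methylated.
sites = ["TGmA", "TGmA", "TGmG", "TGCA"]
pwm = build_pwm(sites)

methylated = score(pwm, "TGmA")
unmethylated = score(pwm, "TGCA")
hit_score, hit_start = best_hit(pwm, "AATGmACC")
```

Because `m` is treated as its own column entry rather than being collapsed into `C`, the same site scores differently depending on its modification state, which is the property a modification-sensitive motif model needs.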
Affiliation(s)
- Coby Viner
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Charles A Ishak
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - James Johnson
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, Australia
| | - Nicolas J Walker
- Department of Genetics, University of Cambridge, Cambridge, England
| | - Hui Shi
- Department of Genetics, University of Cambridge, Cambridge, England
| | - Marcela K Sjöberg-Herrera
- Wellcome Sanger Institute, Cambridge, England
- Faculty of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
| | - Shu Yi Shen
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
| | - Santana M Lardo
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
| | - Daniel D De Carvalho
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
| | - Sarah J Hainer
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
| | - Timothy L Bailey
- Department of Pharmacology, University of Nevada, Reno, Reno, NV, USA
| | - Michael M Hoffman
- Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada.
- Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada.
| |
|
15
|
Dobson L, Gerdán C, Tusnády S, Szekeres L, Kuffa K, Langó T, Zeke A, Tusnády GE. UniTmp: unified resources for transmembrane proteins. Nucleic Acids Res 2024; 52:D572-D578. [PMID: 37870462 PMCID: PMC10767979 DOI: 10.1093/nar/gkad897] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/03/2023] [Accepted: 10/04/2023] [Indexed: 10/24/2023] Open
Abstract
The UNIfied database of TransMembrane Proteins (UniTmp) is a comprehensive and freely accessible resource of transmembrane protein structural information at different levels, from localization of protein segments, through the topology of the protein to the membrane-embedded 3D structure. We not only annotated tens of thousands of new structures and experiments, but we also developed a new system that can serve these resources in parallel. UniTmp is a unified platform that merges TOPDB (Topology Data Bank of Transmembrane Proteins), TOPDOM (database of conservatively located domains and motifs in proteins), PDBTM (Protein Data Bank of Transmembrane Proteins) and HTP (Human Transmembrane Proteome) databases and provides interoperability between the incorporated resources and an easy way to keep them regularly updated. The current update contains 9235 membrane-embedded structures, 9088 sequences with 536 035 topology-annotated segments and 8692 conservatively localized protein domains or motifs as well as 5466 annotated human transmembrane proteins. The UniTmp database can be accessed at https://www.unitmp.org.
Affiliation(s)
- László Dobson
- Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Magyar Tudósok körútja 2, H-1117, Hungary
- Department of Bioinformatics, Semmelweis University, Budapest, Tűzoltó u. 7, H-1094, Hungary
| | - Csongor Gerdán
- Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Magyar Tudósok körútja 2, H-1117, Hungary
| | - Simon Tusnády
- Department of Bioinformatics, Semmelweis University, Budapest, Tűzoltó u. 7, H-1094, Hungary
| | - Levente Szekeres
- Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Magyar Tudósok körútja 2, H-1117, Hungary
| | - Katalin Kuffa
- Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Magyar Tudósok körútja 2, H-1117, Hungary
- Doctoral School of Biology, Institute of Biology, ELTE Eötvös Loránd University, Budapest, Pázmány P. stny. 1/C, H-1117, Hungary
| | - Tamás Langó
- Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Magyar Tudósok körútja 2, H-1117, Hungary
| | - András Zeke
- Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Magyar Tudósok körútja 2, H-1117, Hungary
| | - Gábor E Tusnády
- Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Magyar Tudósok körútja 2, H-1117, Hungary
- Department of Bioinformatics, Semmelweis University, Budapest, Tűzoltó u. 7, H-1094, Hungary
| |
|
16
|
Ali A, Unar A, Muhammad Z, Dil S, Zhang B, Sadaf H, Khan M, Ali M, Khan R, Shah KMB, Ma A, Jiang X, Zhang Y, Zhang H, Shi Q. A novel NPHP4 homozygous missense variant identified in infertile brothers with multiple morphological abnormalities of the sperm flagella. J Assist Reprod Genet 2024; 41:109-120. [PMID: 37831349 PMCID: PMC10789708 DOI: 10.1007/s10815-023-02966-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Accepted: 10/03/2023] [Indexed: 10/14/2023] Open
Abstract
PURPOSE Asthenozoospermia is an important cause of male infertility, and the most serious type is characterized by multiple morphological abnormalities of the sperm flagella (MMAF). However, the precise etiology of MMAF remains unknown. In the current study, we recruited a consanguineous Pakistani family with two infertile brothers suffering from primary infertility due to MMAF without obvious signs of primary ciliary dyskinesia (PCD). METHODS We performed whole-exome sequencing on DNA from the patients, their parents, and a fertile brother and identified the homozygous missense variant c.1490C>G (p.P497R) in NPHP4 as the candidate mutation for male infertility in this family. RESULTS Sanger sequencing confirmed that this mutation recessively co-segregated with MMAF in this family. In silico analysis revealed that the mutation site is conserved across different species, and the identified mutation also causes abnormalities in the structure and hydrophobic interactions of the NPHP4 protein. Several bioinformatics tools predict that the NPHP4 p.P497R mutation is pathogenic. Furthermore, Papanicolaou staining and scanning electron microscopy of sperm revealed that affected individuals displayed a typical MMAF phenotype with a high percentage of coiled, bent, short, absent, and/or irregular flagella. Transmission electron microscopy images of the patients' spermatozoa revealed significant anomalies in the sperm flagella with the absence of a central pair of microtubules (9 + 0) in every section scored. CONCLUSIONS Taken together, these results show that the homozygous missense mutation in NPHP4 is associated with MMAF.
Affiliation(s)
- Asim Ali
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China.
- Department of Biotechnology, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, Pakistan.
| | - Ahsanullah Unar
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Zubair Muhammad
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Sobia Dil
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Beibei Zhang
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Humaira Sadaf
- Department of Obstetrics and Gynecology, Ayub Medical Hospital Complex, Abbottabad, Pakistan
| | - Manan Khan
- Department of Biotechnology and Genetic Engineering, Hazara University, Mansehra, Pakistan
| | - Muhammad Ali
- Department of Biotechnology, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, Pakistan
| | - Ranjha Khan
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Kakakhel Mian Basit Shah
- Department of Biotechnology, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, Pakistan
| | - Ao Ma
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Xiaohua Jiang
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Yuanwei Zhang
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Huan Zhang
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China
| | - Qinghua Shi
- Division of Reproduction and Genetics, The First Affiliated Hospital of University of Science and Technology of China, School of Basic Medical Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, 230027, China.
| |
|
17
|
Romei M, Carpentier M, Chomilier J, Lecointre G. Origins and Functional Significance of Eukaryotic Protein Folds. J Mol Evol 2023; 91:854-864. [PMID: 38060007 DOI: 10.1007/s00239-023-10136-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 10/03/2023] [Indexed: 12/08/2023]
Abstract
Folds are the architecture and topology of a protein domain. Categories of folds are very few compared to the astronomical number of sequences. Eukaryotes have more protein folds than Archaea and Bacteria. These folds are of two types: shared with Archaea and/or Bacteria on one hand and specific to eukaryotic clades on the other hand. The first kind of folds is inherited from the first endosymbiosis and confirms the mixed origin of eukaryotes. In a dataset of 1073 folds whose presence or absence has been evidenced among 210 species equally distributed in the three super-kingdoms, we have identified 28 eukaryotic folds unambiguously inherited from Bacteria and 40 eukaryotic folds unambiguously inherited from Archaea. Compared to previous studies, the proportion of informational functions is higher than expected for folds originating from Bacteria and as high as expected for folds inherited from Archaea. The second type of folds is specifically eukaryotic and associated with an increase of new folds within eukaryotes distributed in particular clades. Reconstructed ancestral states coupled with dating of each node on the tree of life provided fold appearance rates. The rate is on average twice as high within Eukaryota as within Bacteria or Archaea. The highest rates are found in the origins of eukaryotes, holozoans, metazoans, metazoans stricto sensu, and vertebrates: the roots of these clades correspond to bursts of fold evolution. We could correlate the functions of some of the fold synapomorphies within eukaryotes with significant evolutionary events. Among them, we find evidence for the rise of multicellularity, adaptive immune system, or virus folds which could be linked to an ecological shift made by tetrapods.
Affiliation(s)
- Martin Romei
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
- IMPMC (UMR 7590), BiBiP, Sorbonne Université, CNRS, MNHN, Paris, France
| | - Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France.
| | - Jacques Chomilier
- IMPMC (UMR 7590), BiBiP, Sorbonne Université, CNRS, MNHN, Paris, France
| | - Guillaume Lecointre
- Institut Systématique Evolution Biodiversité (ISYEB UMR 7205), Sorbonne Université, MNHN, CNRS, EPHE, UA, Paris, France
| |
|
18
|
Si D, Sun J, Guo L, Yang F, Tian X, He S, Li J. Hypothetical Proteins of Mycoplasma synoviae Reannotation and Expression Changes Identified via RNA-Sequencing. Microorganisms 2023; 11:2716. [PMID: 38004728 PMCID: PMC10673309 DOI: 10.3390/microorganisms11112716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 10/25/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023] Open
Abstract
Mycoplasma synoviae infection rates in chickens are increasing worldwide. Genomic studies have considerably improved our understanding of M. synoviae biology and virulence. However, approximately 20% of the predicted proteins have unknown functions. In particular, the M. synoviae ATCC 25204 genome has 663 coding DNA sequences, of which 155 are considered to encode hypothetical proteins (HPs). Several of these genes may encode unknown virulence factors. This study aims to reannotate all 155 proteins in M. synoviae ATCC 25204 to predict new potential virulence factors using currently available databases and bioinformatics tools. In total, 125 proteins were reannotated, including enzymes (39%), lipoproteins (10%), DNA-binding proteins (6%), phase-variable hemagglutinin (19%), and other protein types (26%). Among the 155 proteins, 28 proteins associated with virulence were detected, five of which were reannotated. Furthermore, HP expression was compared before and after the M. synoviae infection of cells to identify potential virulence-related proteins. The expression of 14 HP genes was upregulated, including that of five virulence-related genes. Our study improved the functional annotation of M. synoviae ATCC 25204 from 76% to 95% and enabled the discovery of potential virulence factors in the genome. Moreover, 14 proteins that may be involved in M. synoviae infection were identified, providing candidate proteins and facilitating the exploration of the infection mechanism of M. synoviae.
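The reported improvement in annotation coverage can be checked against the counts given in the abstract. The short sketch below assumes that coverage is simply the number of functionally annotated coding sequences divided by the total number of coding sequences; the abstract does not state the formula, and the variable and function names are illustrative only.

```python
# Counts quoted in the abstract above.
total_cds = 663        # coding DNA sequences in the genome
hypothetical = 155     # sequences annotated as hypothetical proteins
reannotated = 125      # hypothetical proteins that received a new annotation

def coverage_percent(annotated, total):
    """Share of coding sequences with a functional annotation, in percent (assumed definition)."""
    return 100.0 * annotated / total

before = coverage_percent(total_cds - hypothetical, total_cds)
after = coverage_percent(total_cds - hypothetical + reannotated, total_cds)

print(f"before: {before:.1f}%  after: {after:.1f}%")
```

Under this assumed definition the values come out at roughly 76.6% and 95.5%, which is in line with the 76% and 95% quoted in the abstract.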
Affiliation(s)
- Shenghu He
- College of Animal Science and Technology, Clinical Veterinary Laboratory, Ningxia University, Yinchuan 750021, China; (D.S.); (J.S.); (L.G.); (F.Y.); (X.T.)
| | - Jidong Li
- College of Animal Science and Technology, Clinical Veterinary Laboratory, Ningxia University, Yinchuan 750021, China; (D.S.); (J.S.); (L.G.); (F.Y.); (X.T.)
| |
Collapse
|
19
|
Aziz MF, Mughal F, Caetano-Anollés G. Tracing the birth of structural domains from loops during protein evolution. Sci Rep 2023; 13:14688. [PMID: 37673948 PMCID: PMC10482863 DOI: 10.1038/s41598-023-41556-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Accepted: 08/28/2023] [Indexed: 09/08/2023] Open
Abstract
The structures and functions of proteins are embedded into the loop scaffolds of structural domains. Their origin and evolution remain mysterious. Here, we use a novel graph-theoretical approach to describe how modular and non-modular loop prototypes combine to form folded structures in protein domain evolution. Phylogenomic data-driven chronologies reoriented a bipartite network of loops and domains (and its projections) into 'waterfalls' depicting an evolving 'elementary functionome' (EF). Two primordial waves of functional innovation involving founder 'P-loop' and 'winged-helix' domains were accompanied by an ongoing emergence and reuse of structural and functional novelty. Metabolic pathways expanded before translation functionalities. A dual hourglass recruitment pattern transferred scale-free properties from loop to domain components of the EF network in generative cycles of hierarchical modularity. Modeling the evolutionary emergence of the oldest P-loop and winged-helix domains with AlphaFold2 uncovered rapid convergence towards folded structure, suggesting that a folding vocabulary exists in loops for protein fold repurposing and design.
Affiliation(s)
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, 61801, USA
  - C.R. Woese Institute for Genomic Biology, University of Illinois, Urbana, IL, 61801, USA
|
20
|
Mayo-Pérez S, Gama-Martínez Y, Dávila S, Rivera N, Hernández-Lucas I. LysR-type transcriptional regulators: state of the art. Crit Rev Microbiol 2023:1-33. [PMID: 37635411 DOI: 10.1080/1040841x.2023.2247477] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/23/2023] [Revised: 08/03/2023] [Accepted: 08/08/2023] [Indexed: 08/29/2023]
Abstract
The LysR-type transcriptional regulators (LTTRs) are DNA-binding proteins present in bacteria, archaea, and algae. Knowledge has accumulated about their distribution, abundance, evolution, structural organization, transcriptional regulation, fundamental roles in the free-living state, pathogenesis, and bacteria-plant interactions. This review focuses on these aspects and provides a current picture of LTTR biology.
Affiliation(s)
- S Mayo-Pérez
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Y Gama-Martínez
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- S Dávila
  - Centro de Investigación en Dinámica Celular, Universidad Autónoma del Estado de Morelos, Cuernavaca, Mexico
- N Rivera
  - IPN: CICATA, Unidad Morelos del Instituto Politécnico Nacional, Atlacholoaya, Mexico
- I Hernández-Lucas
  - Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
|
21
|
Riley R, Bowers RM, Camargo AP, Campbell A, Egan R, Eloe-Fadrosh EA, Foster B, Hofmeyr S, Huntemann M, Kellom M, Kimbrel JA, Oliker L, Yelick K, Pett-Ridge J, Salamov A, Varghese NJ, Clum A. Terabase-Scale Coassembly of a Tropical Soil Microbiome. Microbiol Spectr 2023; 11:e0020023. [PMID: 37310219 PMCID: PMC10434106 DOI: 10.1128/spectrum.00200-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 05/24/2023] [Indexed: 06/14/2023] Open
Abstract
Petabases of environmental metagenomic data are publicly available, presenting an opportunity to characterize complex environments and discover novel lineages of life. Metagenome coassembly, in which many metagenomic samples from an environment are simultaneously analyzed to infer the underlying genomes' sequences, is an essential tool for achieving this goal. We applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 terabases (Tbp) of metagenome data from a tropical soil in the Luquillo Experimental Forest (LEF), Puerto Rico. The resulting coassembly yielded 39 high-quality (>90% complete, <5% contaminated, with predicted 23S, 16S, and 5S rRNA genes and ≥18 tRNAs) metagenome-assembled genomes (MAGs), including two from the candidate phylum Eremiobacterota. Another 268 medium-quality (≥50% complete, <10% contaminated) MAGs were extracted, including the candidate phyla Dependentiae, Dormibacterota, and Methylomirabilota. In total, 307 medium- or higher-quality MAGs were assigned to 23 phyla, compared to 294 MAGs assigned to nine phyla in the same samples individually assembled. The low-quality (<50% complete, <10% contaminated) MAGs from the coassembly revealed a 49% complete rare biosphere microbe from the candidate phylum FCPU426 among other low-abundance microbes, an 81% complete fungal genome from the phylum Ascomycota, and 30 partial eukaryotic MAGs with ≥10% completeness, possibly representing protist lineages. A total of 22,254 viruses, many of them low abundance, were identified. Estimation of metagenome coverage and diversity indicates that we may have characterized ≥87.5% of the sequence diversity in this humid tropical soil and indicates the value of future terabase-scale sequencing and coassembly of complex environments.
IMPORTANCE
Petabases of reads are being produced by environmental metagenome sequencing. An essential step in analyzing these data is metagenome assembly, the computational reconstruction of genome sequences from microbial communities. "Coassembly" of metagenomic sequence data, in which multiple samples are assembled together, enables more complete detection of microbial genomes in an environment than "multiassembly," in which samples are assembled individually. To demonstrate the potential for coassembling terabases of metagenome data to drive biological discovery, we applied MetaHipMer2, a distributed metagenome assembler that runs on supercomputing clusters, to coassemble 3.4 Tbp of reads from a humid tropical soil environment. The resulting coassembly, its functional annotation, and analysis are presented here. The coassembly yielded more, and phylogenetically more diverse, microbial, eukaryotic, and viral genomes than the multiassembly of the same data. Our resource may facilitate the discovery of novel microbial biology in tropical soils and demonstrates the value of terabase-scale metagenome sequencing.
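The MAG quality tiers quoted in this abstract can be written as a small decision rule. The following is a minimal sketch using only the thresholds stated above; the function name, argument names, and the "unclassified" fallback for bins with 10% or more contamination are assumptions made for illustration, not part of the authors' pipeline.

```python
def classify_mag(completeness: float, contamination: float,
                 has_rrna_set: bool = False, trna_count: int = 0) -> str:
    """Assign a quality tier to a metagenome-assembled genome.

    completeness and contamination are percentages (0-100).
    has_rrna_set is True when 23S, 16S, and 5S rRNA genes were all predicted.
    """
    # High quality: >90% complete, <5% contaminated, full rRNA set, >=18 tRNAs.
    if (completeness > 90 and contamination < 5
            and has_rrna_set and trna_count >= 18):
        return "high"
    # Medium and low quality both require <10% contamination.
    if contamination < 10:
        return "medium" if completeness >= 50 else "low"
    # Hypothetical fallback: the abstract does not name a tier for these bins.
    return "unclassified"


print(classify_mag(95, 2, has_rrna_set=True, trna_count=20))  # high
print(classify_mag(95, 2))                                    # medium
print(classify_mag(49, 3))                                    # low
```

Note that a nearly complete bin without the rRNA and tRNA evidence falls back to the medium tier under this rule, since it still meets the medium thresholds.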
Affiliation(s)
- Robert Riley
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Robert M. Bowers
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Antonio Pedro Camargo
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Ashley Campbell
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, USA
- Rob Egan
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Brian Foster
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Steven Hofmeyr
  - Applied Math and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Marcel Huntemann
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Matthew Kellom
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Jeffrey A. Kimbrel
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, USA
- Leonid Oliker
  - Applied Math and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Katherine Yelick
  - Applied Math and Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California, USA
- Jennifer Pett-Ridge
  - Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California, USA
  - Life & Environmental Sciences Department, University of California Merced, Merced, California, USA
- Asaf Salamov
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Neha J. Varghese
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Alicia Clum
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
|
22
|
Bao C, Lu C, Lin J, Gough J, Fang H. The dcGO Domain-Centric Ontology Database in 2023: New Website and Extended Annotations for Protein Structural Domains. J Mol Biol 2023; 435:168093. [PMID: 37061086 PMCID: PMC7614987 DOI: 10.1016/j.jmb.2023.168093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Revised: 03/24/2023] [Accepted: 04/06/2023] [Indexed: 04/17/2023]
Abstract
Protein structural domains have been less studied than full-length proteins in terms of ontology annotations. The dcGO database has filled this gap by providing mappings from protein domains to ontologies. The dcGO update in 2023 extends annotations for protein domains of multiple definitions (SCOP, Pfam, and InterPro) with commonly used ontologies that are categorised into functions, phenotypes, diseases, drugs, pathways, regulators, and hallmarks. This update adds new dimensions to the utility of both ontology and protein domain resources. A newly designed website at http://www.protdomainonto.pro/dcGO offers a more centralised and user-friendly way to access the dcGO database, with enhanced faceted search returning term- and domain-specific information pages. Users can navigate both ontology terms and annotated domains through improved ontology hierarchy browsing. A newly added facility enables domain-based ontology enrichment analysis.
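Domain-based ontology enrichment analysis is commonly framed as an over-representation test. The sketch below uses a one-sided hypergeometric test as a generic illustration; it is an assumption that this matches the statistic used by dcGO, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value for over-representation.

    population: all domains considered (background)
    annotated:  background domains annotated with the ontology term
    sample:     domains in the query list
    hits:       query domains annotated with the term
    Returns P(X >= hits) when drawing `sample` domains without replacement.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 10 background domains, 5 carry the term, and all 5 query
# domains carry it. Only one of the 252 possible draws does that.
print(enrichment_p_value(10, 5, 5, 5))
```

A real analysis would also correct the resulting p-values for multiple testing across ontology terms.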
Affiliation(s)
- Chaohui Bao
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Chang Lu
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK; MRC London Institute of Medical Sciences, Imperial College London, London W12 0HS, UK
- James Lin
  - High Performance Computing Center, Shanghai Jiao Tong University, Shanghai 200240, China
- Julian Gough
  - MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge Biomedical Campus, Cambridge CB2 0QH, UK
- Hai Fang
  - Shanghai Institute of Hematology, State Key Laboratory of Medical Genomics, National Research Center for Translational Medicine at Shanghai, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
|
23
|
Masum MHU, Rajia S, Bristi UP, Akter MS, Amin MR, Shishir TA, Ferdous J, Ahmed F, Rahaman MM, Saha O. In Silico Functional Characterization of a Hypothetical Protein From Pasteurella Multocida Reveals a Novel S-Adenosylmethionine-Dependent Methyltransferase Activity. Bioinform Biol Insights 2023; 17:11779322231184024. [PMID: 37424709 PMCID: PMC10328030 DOI: 10.1177/11779322231184024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Accepted: 06/06/2023] [Indexed: 07/11/2023] Open
Abstract
Genomes may now be sequenced in a matter of weeks, leading to an influx in GenBank of "hypothetical" proteins (HPs) whose activities remain a mystery. The information included inside these genes has quickly grown in prominence. We therefore chose to examine closely the structure and function of an HP (AFF25514.1; 246 residues) from Pasteurella multocida (PM) subsp. multocida str. HN06. Possible insights into bacterial adaptation to new environments and metabolic changes might be gained by studying the functions of this protein. The PM HN06 2293 gene encodes an alkaline cytoplasmic protein with a molecular weight of 28352.60 Da, an isoelectric point (pI) of 9.18, and an overall average hydropathicity of around -0.565. One of its functional domains, tRNA (adenine (37)-N6)-methyltransferase TrmO, is an S-adenosylmethionine (SAM)-dependent methyltransferase (MTase), suggesting that it belongs to the Class VIII SAM-dependent MTase family. The tertiary structures represented by HHpred and I-TASSER models were found to be flawless. We predicted the model's active site using the Computed Atlas of Surface Topography of Proteins (CASTp) and FTSite servers, and then visualized it in three dimensions (3D) using PyMOL and BIOVIA Discovery Studio. Molecular docking (MD) results indicate that the HP interacts with SAM and S-adenosylhomocysteine (SAH), 2 crucial metabolites in the tRNA methylation process, with binding affinities of 7.4 and 7.5 kcal/mol, respectively. Molecular dynamic simulations (MDS) of the docked complex, which included only modest structural adjustments, corroborated the strong binding affinity of SAM and SAH to the HP. The findings of multiple sequence alignment (MSA), MD, and molecular dynamic modeling therefore provide evidence for the HP's possible role as an SAM-dependent MTase. These in silico data suggest that the investigated HP might be a useful adjunct in the investigation of Pasteurella infections and the development of drugs to treat zoonotic pasteurellosis.
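The "overall average hydropathicity" reported in this abstract is usually the GRAVY score: the mean Kyte-Doolittle hydropathy value over all residues, where negative values indicate a hydrophilic protein. A minimal sketch follows; the example peptide is made up for illustration and is not the AFF25514.1 sequence, and it is an assumption that the authors used this exact scale.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - KYTE_DOOLITTLE.keys()
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


# Hypothetical peptide, used only to show the call.
print(round(gravy("MKDEKRSTLA"), 3))
```

A score such as the reported -0.565 would therefore point to a protein that is hydrophilic on average, which is consistent with the cytoplasmic localization stated in the abstract.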
Affiliation(s)
- Md. Habib Ullah Masum
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Sultana Rajia
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Uditi Paul Bristi
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Mir Salma Akter
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Mohammad Ruhul Amin
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Tushar Ahmed Shishir
  - Department of Mathematics and Natural Sciences, BRAC University, Dhaka, Bangladesh
- Jannatul Ferdous
  - Department of Medicine, Abdul Malek Ukil Medical College, Noakhali, Bangladesh
- Firoz Ahmed
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
- Otun Saha
  - Department of Microbiology, Noakhali Science and Technology University, Noakhali, Bangladesh
|
24
|
Wang Y, Zhao D, Zhang W, Wang S, Wu Y, Wang S, Yang Y, Guo B. Four PQQ-Dependent Alcohol Dehydrogenases Responsible for the Oxidative Detoxification of Deoxynivalenol in a Novel Bacterium Ketogulonicigenium vulgare D3_3 Originated from the Feces of Tenebrio molitor Larvae. Toxins (Basel) 2023; 15:367. [PMID: 37368668 PMCID: PMC10301637 DOI: 10.3390/toxins15060367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 05/25/2023] [Accepted: 05/26/2023] [Indexed: 06/29/2023] Open
Abstract
Deoxynivalenol (DON) is frequently detected in cereals and cereal-based products and has a negative impact on human and animal health. In this study, an unprecedented DON-degrading bacterial isolate D3_3 was isolated from a sample of Tenebrio molitor larva feces. A 16S rRNA-based phylogenetic analysis and genome-based average nucleotide identity comparison clearly revealed that strain D3_3 belonged to the species Ketogulonicigenium vulgare. This isolate D3_3 could efficiently degrade 50 mg/L of DON under a broad range of conditions, such as pHs of 7.0-9.0 and temperatures of 18-30 °C, as well as during aerobic or anaerobic cultivation. 3-keto-DON was identified as the sole and finished DON metabolite using mass spectrometry. In vitro toxicity tests revealed that 3-keto-DON had lower cytotoxicity to human gastric epithelial cells and higher phytotoxicity to Lemna minor than its parent mycotoxin DON. Additionally, four genes encoding pyrroloquinoline quinone (PQQ)-dependent alcohol dehydrogenases in the genome of isolate D3_3 were identified as being responsible for the DON oxidation reaction. Overall, as a highly potent DON-degrading microbe, a member of the genus Ketogulonicigenium is reported for the first time in this study. The discovery of this DON-degrading isolate D3_3 and its four dehydrogenases will allow microbial strains and enzyme resources to become available for the future development of DON-detoxifying agents for food and animal feed.
Affiliation(s)
- Yang Wang
  - Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; (Y.W.)
- Donglei Zhao
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Wei Zhang
  - Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; (Y.W.)
- Songshan Wang
  - Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; (Y.W.)
- Yu Wu
  - Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; (Y.W.)
- Songxue Wang
  - Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; (Y.W.)
- Yongtan Yang
  - Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; (Y.W.)
- Baoyuan Guo
  - Academy of National Food and Strategic Reserves Administration, Beijing 100037, China; (Y.W.)
|
25
|
Álvarez-Campos P, García-Castro H, Emili E, Pérez-Posada A, Salamanca-Díaz DA, Mason V, Metzger B, Bely AE, Kenny N, Özpolat BD, Solana J. Annelid adult cell type diversity and their pluripotent cellular origins. bioRxiv 2023:2023.04.25.537979. [PMID: 37163014 PMCID: PMC10168269 DOI: 10.1101/2023.04.25.537979] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 05/11/2023]
Abstract
Annelids are a broadly distributed, highly diverse, economically and environmentally important group of animals. Most species can regenerate missing body parts, and many are able to reproduce asexually. Therefore, many annelids can generate all adult cell types in adult stages. However, the putative adult stem cell populations involved in these processes, as well as the diversity of adult cell types generated by them, are still unknown. Here, we recover 75,218 single cell transcriptomes of Pristina leidyi, a highly regenerative and asexually-reproducing freshwater annelid. We characterise all major annelid adult cell types, and validate many of our observations by HCR in situ hybridisation. Our results uncover complex patterns of regionally expressed genes in the annelid gut, as well as neuronal, muscle and epidermal specific genes. We also characterise annelid-specific cell types such as the chaetal sacs and globin+ cells, and novel cell types of enigmatic affinity, including a vigilin+ cell type, a lumbrokinase+ cell type, and a diverse set of metabolic cells. Moreover, we characterise transcription factors and gene networks that are expressed specifically in these populations. Finally, we uncover a broadly abundant cluster of putative stem cells with a pluripotent signature. This population expresses well-known stem cell markers such as vasa, piwi and nanos homologues, but also shows heterogeneous expression of differentiated cell markers and their transcription factors. In these piwi+ cells, we also find conserved expression of pluripotency regulators, including multiple chromatin remodelling and epigenetic factors. Finally, lineage reconstruction analyses reveal the existence of differentiation trajectories from piwi+ cells to diverse adult types. Our data reveal the cell type diversity of adult annelids for the first time and serve as a resource for studying annelid cell types and their evolution. 
On the other hand, our characterisation of a piwi+ cell population with a pluripotent stem cell signature will serve as a platform for the study of annelid stem cells and their role in regeneration.
Affiliation(s)
- Patricia Álvarez-Campos
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
  - Centro de Investigación en Biodiversidad y Cambio Global (CIBC-UAM) & Departamento de Biología (Zoología), Facultad de Ciencias, Universidad Autónoma de Madrid, Madrid, Spain
- Helena García-Castro
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
- Elena Emili
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
- Alberto Pérez-Posada
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
- Vincent Mason
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
- Bria Metzger
  - Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA, USA, 05432
  - Department of Biology, Washington University in St. Louis. 1 Brookings Dr. Saint Louis, MO, USA, 63130
- Nathan Kenny
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
  - Department of Biochemistry, University of Otago, P.O. Box 56, Dunedin, Aotearoa New Zealand
- B Duygu Özpolat
  - Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA, USA, 05432
  - Department of Biology, Washington University in St. Louis. 1 Brookings Dr. Saint Louis, MO, USA, 63130
- Jordi Solana
  - Department of Biological and Medical Sciences, Oxford Brookes University, Oxford, UK
|
26
|
Rahman A, Sarker MT, Islam MA, Hossain MU, Hasan M, Susmi TF. Targeting Essential Hypothetical Proteins of Pseudomonas aeruginosa PAO1 for Mining of Novel Therapeutics: An In Silico Approach. Biomed Res Int 2023; 2023:1787485. [PMID: 37090194 PMCID: PMC10119676 DOI: 10.1155/2023/1787485] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/26/2022] [Revised: 01/24/2023] [Accepted: 02/06/2023] [Indexed: 04/25/2023]
Abstract
As an omnipresent opportunistic bacterium, Pseudomonas aeruginosa PAO1 is responsible for acute and chronic infection in immunocompromised individuals. Currently, this bacterium is on the WHO's red list of pathogens for which new antibiotics are urgently required. Finding essential genes and essential hypothetical proteins (EHPs) can be crucial in identifying novel druggable targets and therapeutics. This study aimed to characterize these EHPs and to analyze their subcellular and physicochemical properties, PPI network, nonhomology to human proteins, virulence factor and novel drug target prediction, and finally the structure of the identified targets, employing around 42 robust bioinformatics tools/databases, the output of which was evaluated using ROC analysis. The study discovered 18 EHPs from 336 essential genes, with domain and functional annotation revealing that 50% of these proteins belong to the enzyme category. The majority are cytoplasmic and cytoplasmic membrane proteins, with half being stable proteins that were subjected to PPI network analysis. The network contains 261 nodes and 269 edges for 9 proteins of interest, with 11 hubs containing at least three nodes each. Finally, a pipeline builder predicted 7 proteins as novel drug targets, 5 proteins nonhomologous to the human proteome, human antitargets, and human gut flora, and 3 virulent proteins. Among these, homology modeling of NP_249450 and NP_251676 was done, and the Ramachandran plot analysis revealed that more than 94% of the residues were in the preferred region. By analyzing functional attributes and virulence characteristics, the findings of this study may facilitate the development of innovative antibacterial drug targets and drugs against Pseudomonas aeruginosa PAO1.
Affiliation(s)
- Atikur Rahman
  - Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore 7408, Bangladesh
- Md. Takim Sarker
  - Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore 7408, Bangladesh
- Md Ashiqul Islam
  - Department of Chemistry and Biochemistry, University of Windsor, Canada
- Mohammad Uzzal Hossain
  - Bioinformatics Division, National Institute of Biotechnology, Ganakbari, Ashulia, Savar, Dhaka 1349, Bangladesh
- Mahmudul Hasan
  - Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet 3100, Bangladesh
- Tasmina Ferdous Susmi
  - Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore 7408, Bangladesh
|
27
|
Joglekar A, Hu W, Zhang B, Narykov O, Diekhans M, Balacco J, Ndhlovu LC, Milner TA, Fedrigo O, Jarvis ED, Sheynkman G, Korkin D, Ross ME, Tilgner HU. Single-cell long-read mRNA isoform regulation is pervasive across mammalian brain regions, cell types, and development. bioRxiv 2023:2023.04.02.535281. [PMID: 37066387 PMCID: PMC10103983 DOI: 10.1101/2023.04.02.535281] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 04/22/2023]
Abstract
RNA isoforms influence cell identity and function. Until recently, technological limitations prevented a genome-wide appraisal of isoform influence on cell identity in various parts of the brain. Using enhanced long-read single-cell isoform sequencing, we comprehensively analyze RNA isoforms in multiple mouse brain regions, cell subtypes, and developmental timepoints from postnatal day 14 (P14) to adult (P56). For 75% of genes, full-length isoform expression varies along one or more axes of phenotypic origin, underscoring the pervasiveness of isoform regulation across multiple scales. As expected, splicing varies strongly between cell types. However, certain gene classes including neurotransmitter release and reuptake as well as synapse turnover, harbor significant variability in the same cell type across anatomical regions, suggesting differences in network activity may influence cell-type identity. Glial brain-region specificity in isoform expression includes strong poly(A)-site regulation, whereas neurons have stronger TSS regulation. Furthermore, developmental patterns of cell-type specific splicing are especially pronounced in the murine adolescent transition from P21 to P28. The same cell type traced across development shows more isoform variability than across adult anatomical regions, indicating a coordinated modulation of functional programs dictating neural development. As most cell-type specific exons in P56 mouse hippocampus behave similarly in newly generated data from human hippocampi, these principles may be extrapolated to human brain. However, human brains have evolved additional cell-type specificity in splicing, suggesting gain-of-function isoforms. Taken together, we present a detailed single-cell atlas of full-length brain isoform regulation across development and anatomical regions, providing a previously unappreciated degree of isoform variability across multiple scales of the brain.
Affiliation(s)
- Anoushka Joglekar
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Wen Hu
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Oleksandr Narykov
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Lishomwa C Ndhlovu
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Department of Medicine, Division of Infectious Diseases, Weill Cornell Medicine, New York, NY, USA
- Teresa A Milner
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
- Olivier Fedrigo
  - Vertebrate Genome Lab, the Rockefeller University, New York, NY
- Erich D Jarvis
  - Vertebrate Genome Lab, the Rockefeller University, New York, NY
  - Laboratory of Neurogenetics of Language, the Rockefeller University, New York, NY
  - Howard Hughes Medical Institute, Chevy Chase, MD
- Gloria Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, Virginia, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
  - UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, Virginia, USA
- Dmitry Korkin
  - Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
  - Computer Science Department, Worcester Polytechnic Institute, Worcester, MA, USA
  - Data Science Program, Worcester Polytechnic Institute, Worcester, MA, USA
- M Elizabeth Ross
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Hagen U Tilgner
  - Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
|
28
|
Makafe GG, Cole L, Roberts A, Muncil S, Patwardhan A, Bernacki D, Chojnacki M, Weinrick B, Sheinerman F. A novel chemogenomic discovery platform identifies bioactive hits with rapid bactericidal activity against Mycobacteroides Abscessus. Tuberculosis (Edinb) 2023; 139:102317. [PMID: 36736037 DOI: 10.1016/j.tube.2023.102317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 01/16/2023] [Accepted: 01/21/2023] [Indexed: 01/26/2023]
Abstract
Mycobacteroides abscessus (M. ab) infections are innately resistant to most currently available antibiotics and present a growing, poorly addressed medical need. The existing treatment regimens are lengthy and produce inadequate outcomes for many patients. Importantly, most clinically used drugs and drug candidates against M. ab are either bacteriostatic, or only weakly bactericidal. New strategies exploring a broader chemical space are urgently needed, as innovative agents in development are scarce and hit rates in large unbiased screens against the mycobacterium have been discouragingly low. Here we present a computational chemogenomics-driven approach to discovery of novel antibacterials that effectively reveals drug-like compounds active against M. ab, paired with small sets of predicted molecular targets for the compounds. Several of the bioactive hits identified exhibited rapid bactericidal, including sterilizing, activity against the mycobacterium, indicating that there are currently unexploited chemically tractable molecular mechanisms for rapid sterilization of M. ab. Interestingly, starvation, which typically induces drug tolerance, sensitized M. ab to some of the compounds, resulting in potencies similar to those of drugs in clinical use. The presented drug discovery platform has potential to identify highly differentiated prototype anti-infective molecules and thereby contribute to development of regimens for shorter treatment and improved outcomes for non-tuberculous mycobacterial infections.
Affiliation(s)
| | - Laura Cole
- Trudeau Institute, 154 Algonquin Ave, Saranac Lake, NY, 12983, USA
| | - Alan Roberts
- Trudeau Institute, 154 Algonquin Ave, Saranac Lake, NY, 12983, USA
| | - Shania Muncil
- Trudeau Institute, 154 Algonquin Ave, Saranac Lake, NY, 12983, USA
| | | | - Derek Bernacki
- Trudeau Institute, 154 Algonquin Ave, Saranac Lake, NY, 12983, USA
| | | | - Brian Weinrick
- Trudeau Institute, 154 Algonquin Ave, Saranac Lake, NY, 12983, USA.
| | - Felix Sheinerman
- Trudeau Institute, 154 Algonquin Ave, Saranac Lake, NY, 12983, USA.
|
29
|
Genomic Survey of Flavin Monooxygenases in Wild and Cultivated Rice Provides Insight into Evolution and Functional Diversities. Int J Mol Sci 2023; 24:ijms24044190. [PMID: 36835601 PMCID: PMC9960948 DOI: 10.3390/ijms24044190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 01/08/2023] [Accepted: 01/12/2023] [Indexed: 02/22/2023] Open
Abstract
The flavin monooxygenase (FMO) enzyme was discovered in mammalian liver cells that convert a carcinogenic compound, N-N'-dimethylaniline, into a non-carcinogenic compound, N-oxide. Since then, many FMOs have been reported in animal systems for their primary role in the detoxification of xenobiotic compounds. In plants, this family has diverged to perform varied functions like pathogen defense, auxin biosynthesis, and S-oxygenation of compounds. Only a few members of this family, primarily those involved in auxin biosynthesis, have been functionally characterized in plant species. Thus, the present study aims to identify all the members of the FMO family in 10 different wild and cultivated Oryza species. Genome-wide analysis of the FMO family in different Oryza species reveals that each species has multiple FMO members in its genome and that this family is conserved throughout evolution. Taking clues from its role in pathogen defense and its possible function in ROS scavenging, we have also assessed the involvement of this family in abiotic stresses. A detailed in silico expression analysis of the FMO family in Oryza sativa subsp. japonica revealed that only a subset of genes responds to different abiotic stresses. This is supported by the experimental validation of a few selected genes using qRT-PCR in stress-sensitive Oryza sativa subsp. indica and stress-sensitive wild rice Oryza nivara. The identification and comprehensive in silico analysis of FMO genes from different Oryza species carried out in this study will serve as the foundation for further structural and functional studies of FMO genes in rice as well as other crop types.
|
30
|
Liu J, Maxwell M, Cuddihy T, Crawford T, Bassetti M, Hyde C, Peigneur S, Tytgat J, Undheim EAB, Mobli M. ScrepYard: An online resource for disulfide-stabilized tandem repeat peptides. Protein Sci 2023; 32:e4566. [PMID: 36644825 PMCID: PMC9885460 DOI: 10.1002/pro.4566] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/08/2022] [Revised: 01/05/2023] [Accepted: 01/12/2023] [Indexed: 01/17/2023]
Abstract
Receptor avidity through multivalency is a highly sought-after property of ligands. While readily available in nature in the form of bivalent antibodies, this property remains challenging to engineer in synthetic molecules. The discovery of several bivalent venom peptides containing two homologous and independently folded domains (in a tandem repeat arrangement) has provided a unique opportunity to better understand the underpinning design of multivalency in multimeric biomolecules, as well as how naturally occurring multivalent ligands can be identified. In previous work, we classified these molecules as a larger class termed secreted cysteine-rich repeat-proteins (SCREPs). Here, we present an online resource, ScrepYard, designed to assist researchers in identification of SCREP sequences of interest and to aid in characterizing this emerging class of biomolecules. Analysis of sequences within the ScrepYard reveals that two-domain tandem repeats constitute the most abundant SCREP domain architecture, while the interdomain "linker" regions connecting the functional domains are found to be abundant in amino acids with short or polar sidechains and contain an unusually high abundance of proline residues. Finally, we demonstrate the utility of ScrepYard as a virtual screening tool for discovery of putatively multivalent peptides, by using it as a resource to identify a previously uncharacterized serine protease inhibitor and confirm its predicted activity using an enzyme assay.
Affiliation(s)
- Junyu Liu
- Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
| | - Michael Maxwell
- Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
| | - Thom Cuddihy
- Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia; Centre for Clinical Research, The University of Queensland, St. Lucia, Queensland, Australia
| | - Theo Crawford
- Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
| | - Madeline Bassetti
- Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia
| | - Cameron Hyde
- Queensland Cyber Infrastructure Foundation Ltd., The University of Queensland, St. Lucia, Queensland, Australia; University of the Sunshine Coast, Maroochydore, Queensland, Australia
| | - Steve Peigneur
- Toxicology and Pharmacology, University of Leuven (KU Leuven), Leuven, Belgium
| | - Jan Tytgat
- Toxicology and Pharmacology, University of Leuven (KU Leuven), Leuven, Belgium
| | - Eivind A. B. Undheim
- Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia; Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, Oslo, Norway
| | - Mehdi Mobli
- Centre for Advanced Imaging, The University of Queensland, St. Lucia, Queensland, Australia
|
31
|
Addressing the pervasive scarcity of structural annotation in eukaryotic algae. Sci Rep 2023; 13:1687. [PMID: 36717613 PMCID: PMC9886943 DOI: 10.1038/s41598-023-27881-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/13/2022] [Accepted: 01/09/2023] [Indexed: 02/01/2023] Open
Abstract
Despite a continuous increase in algal genome sequencing, structural annotations of most algal genome assemblies remain unavailable. This pervasive scarcity of genome annotation has restricted rigorous investigation of these genomic resources and may have precipitated misleading biological interpretations. However, the annotation process for eukaryotic algal species is often challenging as genomic resources and transcriptomic evidence are not always available. To address this challenge, we benchmark the cutting-edge gene prediction methods that can be generalized for a broad range of non-model eukaryotes. Using the most accurate methods selected based on high-quality algal genomes, we predict structural annotations for 135 unannotated algal genomes. Using previously available genomic data pooled together with new data obtained in this study, we identified the core orthologous genes and the multi-gene phylogeny of eukaryotic algae, including of previously unexplored algal species. This study not only provides a benchmark for the use of structural annotation methods on a variety of non-model eukaryotes, but also compensates for missing data in the current spectrum of algal genomic resources. These results bring us one step closer to the full potential of eukaryotic algal genomics.
|
32
|
Billaud M, Petit MA, Lossouarn J. The Clostridium-infecting filamentous phage CAK1 genome analysis allows to define a new potential clade of Tubulavirales. FEMS Microbiol Lett 2023; 370:fnad099. [PMID: 37791400 DOI: 10.1093/femsle/fnad099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 09/21/2023] [Accepted: 10/02/2023] [Indexed: 10/05/2023] Open
Abstract
What we know about Tubulavirales, i.e. filamentous phages, essentially comes from Gram-negative-infecting Inoviridae. However, metagenomics recently suggests filamentous phages are much more widespread and diverse. Here, we report the complete sequence and functional annotation of CAK1, a 6.6 kb filamentous phage that was shown to chronically infect Clostridium beijerinckii 30 years ago and only represents the second filamentous phage cultivated on a Gram-positive bacterium. CAK1 has a typical filamentous phage modular genome with no homologs in databases and we were interested to compare it with a pig gut filamentous phage metagenomics dataset that we previously assembled and for which many filamentous phages were predicted to infect Clostridium species by bioinformatics means. CAK1 is distantly related to nine of these sequences, two of which have been predicted as Clostridium-associated. In itself, this small cluster of CAK1-connected sequences sheds light on the diversity of filamentous phages that putatively infect Clostridium species, and probably many other Gram-positive genera.
Affiliation(s)
- Maud Billaud
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
| | - Marie-Agnès Petit
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
| | - Julien Lossouarn
- Université Paris-Saclay, INRAE, AgroParisTech, Micalis Institute, 78350 Jouy-en-Josas, France
|
33
|
Hagadorn MA, Hunter FK, DeLory T, Johnson MM, Pitts-Singer TL, Kapheim KM. Maternal body condition and season influence RNA deposition in the oocytes of alfalfa leafcutting bees (Megachile rotundata). Front Genet 2023; 13:1064332. [PMID: 36685934 PMCID: PMC9845908 DOI: 10.3389/fgene.2022.1064332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 11/28/2022] [Indexed: 01/06/2023] Open
Abstract
Maternal effects are an important source of phenotypic variance, whereby females influence offspring developmental trajectory beyond direct genetic contributions, often in response to changing environmental conditions. However, relatively little is known about the mechanisms by which maternal experience is translated into molecular signals that shape offspring development. One such signal may be maternal RNA transcripts (mRNAs and miRNAs) deposited into maturing oocytes. These regulate the earliest stages of development of all animals, but are understudied in most insects. Here we investigated the effects of female internal (body condition) and external (time of season) environmental conditions on maternal RNA in the maturing oocytes and 24-h-old eggs (24-h eggs) of alfalfa leafcutting bees. Using gene expression and WGCNA analysis, we found that females adjust the quantity of mRNAs related to protein phosphorylation, transcriptional regulation, and nuclease activity deposited into maturing oocytes in response to both poor body condition and shorter day lengths that accompany the late season. However, the magnitude of these changes was higher for time of season. Females also adjusted miRNA deposition in response to seasonal changes, but not body condition. We did not observe significant changes in maternal RNAs in response to either body condition or time of season in 24-h eggs, which were past the maternal-to-zygotic transition. Our results suggest that females adjust the RNA transcripts they provide for offspring to regulate development in response to both internal and external environmental cues. Variation in maternal RNAs may, therefore, be important for regulating offspring phenotype in response to environmental change.
Affiliation(s)
- Mallory A. Hagadorn
- Department of Biology, Utah State University, Logan, UT, United States
| | - Frances K. Hunter
- Department of Biology, Utah State University, Logan, UT, United States
| | - Tim DeLory
- Department of Biology, Utah State University, Logan, UT, United States
| | - Makenna M. Johnson
- Department of Biology, Utah State University, Logan, UT, United States; United States Department of Agriculture, Agricultural Research Service, Pollinating Insects Research Unit, Logan, UT, United States
| | - Theresa L. Pitts-Singer
- United States Department of Agriculture, Agricultural Research Service, Pollinating Insects Research Unit, Logan, UT, United States
| | - Karen M. Kapheim
- Department of Biology, Utah State University, Logan, UT, United States. Correspondence: Karen M. Kapheim
|
34
|
Nambiar A, Liu S, Heflin M, Forsyth JM, Maslov S, Hopkins M, Ritz A. Transformer Neural Networks for Protein Family and Interaction Prediction Tasks. J Comput Biol 2023; 30:95-111. [PMID: 35950958 DOI: 10.1089/cmb.2022.0132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/11/2023] Open
Abstract
The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the-art approaches for protein family classification while being much more general than other architectures. Further, our method outperforms other approaches for protein interaction prediction for two out of three different scenarios that we generated. These results offer a promising framework for fine-tuning the pre-trained sequence representations for other protein prediction tasks.
Affiliation(s)
- Ananthan Nambiar
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Simon Liu
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Maeve Heflin
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - John Malcolm Forsyth
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Sergei Maslov
- Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
| | - Mark Hopkins
- Department of Computer Science, Reed College, Portland, Oregon, USA
| | - Anna Ritz
- Department of Biology, Reed College, Portland, Oregon, USA
|
35
|
Genome-wide subcellular protein map for the flagellate parasite Trypanosoma brucei. Nat Microbiol 2023; 8:533-547. [PMID: 36804636 PMCID: PMC9981465 DOI: 10.1038/s41564-022-01295-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 39.0] [Received: 05/16/2022] [Accepted: 11/21/2022] [Indexed: 02/22/2023]
Abstract
Trypanosoma brucei is a model trypanosomatid, an important group of human, animal and plant unicellular parasites. Understanding their complex cell architecture and life cycle is challenging because, as with most eukaryotic microbes, ~50% of genome-encoded proteins have completely unknown functions. Here, using fluorescence microscopy and cell lines expressing endogenously tagged proteins, we mapped the subcellular localization of 89% of the T. brucei proteome, a resource we call TrypTag. We provide clues to function and define lineage-specific organelle adaptations for parasitism, mapping the ultraconserved cellular architecture of eukaryotes, including the first comprehensive 'cartographic' analysis of the eukaryotic flagellum, which is vital for morphogenesis and pathology. To demonstrate the power of this resource, we identify novel organelle subdomains and changes in molecular composition through the cell cycle. TrypTag is a transformative resource, important for hypothesis generation for both eukaryotic evolutionary molecular cell biology and fundamental parasite cell biology.
|
36
|
Patel VK, Das A, Kumari R, Kajla S. In silico Analysis of Diverse Endo-β-1,4-glucanases Reveals Their Molecular Evolution. J EVOL BIOCHEM PHYS+ 2023. [DOI: 10.1134/s0022093023010088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/16/2023]
|
37
|
Ben Boubaker R, Tiss A, Henrion D, Chabbert M. Homology Modeling in the Twilight Zone: Improved Accuracy by Sequence Space Analysis. Methods Mol Biol 2023; 2627:1-23. [PMID: 36959439 DOI: 10.1007/978-1-0716-2974-1_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The analysis of the relationship between sequence and structure similarities during the evolution of a protein family has revealed a limit of sequence divergence for which structural conservation can be confidently assumed and homology modeling is reliable. Below this limit, the twilight zone corresponds to sequence divergence for which homology modeling becomes increasingly difficult and requires specific methods. Either with conventional threading methods or with recent deep learning methods, such as AlphaFold, the challenge relies on the identification of a template that shares not only a common ancestor (homology) but also a conserved structure with the query. As both homology and structural conservation are transitive properties, mining of sequence databases followed by multidimensional scaling (MDS) of the query sequence space can reveal intermediary sequences to infer homology and structural conservation between the query and the template. Here, as a case study, we studied the plethodontid receptivity factor isoform 1 (PRF1) from Plethodon jordani, a member of a pheromone protein family present only in lungless salamanders and weakly related to cytokines of the IL6 family. A variety of conventional threading methods led to the cytokine CNTF as a template. Sequence mining, followed by phylogenetic and MDS analysis, provided missing links between PRF1 and CNTF and allowed reliable homology modeling. In addition, we compared automated models obtained from web servers to a customized model to show how modeling can be improved by expert information.
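The transitivity argument in this abstract can be illustrated with a small sketch. It is not the multidimensional scaling analysis the chapter describes; it only shows, under stated assumptions, how a chain of intermediary sequences can connect a query to a template when no direct link is detectable. The function name `homology_chain`, the intermediate sequence labels, and the list of pairwise links are all hypothetical; only the names PRF1 and CNTF come from the abstract.

```python
from collections import deque


def homology_chain(links, query, template):
    """Return the shortest chain of sequences linking query to template, or None."""
    # Build an undirected adjacency map from the pairwise links.
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    # Breadth-first search, remembering each sequence's predecessor.
    previous = {query: None}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if node == template:
            # Walk back through the predecessors to rebuild the chain.
            chain = []
            while node is not None:
                chain.append(node)
                node = previous[node]
            return chain[::-1]
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical pairs assumed to show detectable pairwise similarity.
# PRF1 and CNTF are never linked directly, only through intermediates.
links = [
    ("PRF1", "intermediate_A"),
    ("intermediate_A", "intermediate_B"),
    ("intermediate_B", "CNTF"),
    ("unrelated_X", "unrelated_Y"),
]

chain = homology_chain(links, "PRF1", "CNTF")
print(chain)
```

In practice, each link would have to be supported by a sequence comparison, and the chain only suggests homology; structural conservation along the chain is a separate question that the abstract addresses with phylogenetic and MDS analysis.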
Affiliation(s)
- Rym Ben Boubaker
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
| | - Asma Tiss
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
| | - Daniel Henrion
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France
| | - Marie Chabbert
- UMR CNRS 6015 - INSERM 1083, Laboratoire MITOVASC, Université d'Angers, Angers, France.
|
38
|
Kaur H, Singh V, Kalia M, Mohan B, Taneja N. Identification and functional annotation of hypothetical proteins of uropathogenic Escherichia coli strain CFT073 towards designing antimicrobial drug targets. J Biomol Struct Dyn 2022; 40:14084-14095. [PMID: 34751095 DOI: 10.1080/07391102.2021.2000499] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022]
Abstract
Urinary tract infections are a serious health concern worldwide, especially in developing countries. Escherichia coli strain CFT073 is a highly virulent pathogenic bacterial strain. The CFT073 proteome contains 4897 proteins, out of which 992 have been classified as hypothetical proteins. Identification and characterization of hypothetical proteins can aid in the selection of targets for drug design. In this study, we studied the hypothetical proteins from the UPEC strain CFT073 using various computational tools. By NCBI-CDD, 376 protein sequences showed conserved domains. Based on the functional motifs in their primary sequences, we classified these 376 hypothetical proteins into 7 functional categories. Further, the KEGG database was used to find the roles of these hypothetical proteins in several pathways. Protein interaction network analysis of hypothetical proteins identified 53 proteins as highly interacting metabolic proteins. Virulence factor analysis of the proteins identified 8 proteins as virulent. We conducted a non-homology search for the identified proteins of UPEC in the available human proteome. We observed that 35 proteins are non-homologous to humans and hence could be selected as drug design targets. Qualitative characterization of the selected 35 non-homologous hypothetical proteins, including essentiality analysis and evaluation of druggability by similarity search against the DrugBank database, was performed. Out of these 35 proteins, three-dimensional structures of six proteins (NP_752562.1, NP_756345.1, NP_754893.1, NP_756600.2, NP_755264.1 and NP_752994.1) could be successfully modelled. These new annotations can help to better understand disease mechanisms at the molecular level, as well as provide new targets for drug development against the UPEC strain CFT073. Communicated by Ramaswamy H. Sarma.
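The counts reported in this abstract form a filtering funnel, and a short calculation makes the attrition at each step explicit. The sketch below uses only the numbers stated above; treating the stages as successive subsets of one another is an assumption, since the abstract does not say exactly which set the human non-homology search was run on. The stage labels and the `retention` helper are illustrative names.

```python
# Counts taken from the abstract; stage labels are paraphrased.
stages = [
    ("proteome", 4897),
    ("hypothetical proteins", 992),
    ("with conserved domains (NCBI-CDD)", 376),
    ("non-homologous to human", 35),
    ("3D structure modelled", 6),
]


def retention(stages):
    """Percentage of each stage retained relative to the stage before it."""
    rows = []
    for (_, previous_count), (name, count) in zip(stages, stages[1:]):
        rows.append((name, count, round(100 * count / previous_count, 1)))
    return rows


for name, count, percent in retention(stages):
    print(f"{name}: {count} ({percent}% of previous stage)")
```

Under that assumption, about a fifth of the proteome is hypothetical, and fewer than one in ten domain-bearing hypothetical proteins pass the human non-homology filter.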
Affiliation(s)
- Harpreet Kaur
- Department of Medical Microbiology, Postgraduate Institute of Medical Education and Research, Chandigarh, India
| | - Vikram Singh
- Center of Computational Biology and Bioinformatics, Central University of Himachal Pradesh, Dharamshala, India
| | - Manmohit Kalia
- Department of Biology, State University of New York, Binghamton, NY, USA
| | - Balvinder Mohan
- Department of Medical Microbiology, Postgraduate Institute of Medical Education and Research, Chandigarh, India
| | - Neelam Taneja
- Department of Medical Microbiology, Postgraduate Institute of Medical Education and Research, Chandigarh, India
|
39
|
Miller J, Zimin AV, Gordus A. Chromosome-level genome and the identification of sex chromosomes in Uloborus diversus. Gigascience 2022; 12:giad002. [PMID: 36762707 PMCID: PMC9912274 DOI: 10.1093/gigascience/giad002] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Revised: 11/18/2022] [Accepted: 01/03/2023] [Indexed: 02/11/2023] Open
Abstract
The orb web is a remarkable example of animal architecture that is observed in families of spiders that diverged over 200 million years ago. While several genomes exist for araneid orb-weavers, none exist for other orb-weaving families, hampering efforts to investigate the genetic basis of this complex behavior. Here we present a chromosome-level genome assembly for the cribellate orb-weaving spider Uloborus diversus. The assembly reinforces evidence of an ancient arachnid genome duplication and identifies complete open reading frames for every class of spidroin gene, which encode the proteins that are the key structural components of spider silks. We identify the 2 X chromosomes for U. diversus and candidate sex-determining loci. This chromosome-level assembly will be a valuable resource for evolutionary research into the origins of orb-weaving, spidroin evolution, chromosomal rearrangement, and chromosomal sex determination in spiders.
Affiliation(s)
- Jeremiah Miller
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Aleksey V Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Andrew Gordus
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Solomon H. Snyder Department of Neuroscience, Johns Hopkins University, Baltimore, MD 21218, USA
|
40
|
Genomic basis of the giga-chromosomes and giga-genome of tree peony Paeonia ostii. Nat Commun 2022; 13:7328. [PMID: 36443323 PMCID: PMC9705720 DOI: 10.1038/s41467-022-35063-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/21/2022] [Accepted: 11/17/2022] [Indexed: 11/29/2022] Open
Abstract
Tree peony (Paeonia ostii) is an economically important ornamental plant native to China. It is also notable for its seed oil, which is abundant in unsaturated fatty acids such as α-linolenic acid (ALA). Here, we report chromosome-level genome assembly (12.28 Gb) of P. ostii. In contrast to monocots with giant genomes, tree peony does not appear to have undergone lineage-specific whole-genome duplication. Instead, explosive LTR expansion in the intergenic regions within a short period (~ two million years) may have contributed to the formation of its giga-genome. In addition, expansion of five types of histone encoding genes may have helped maintain the giga-chromosomes. Further, we conduct genome-wide association studies (GWAS) on 448 accessions and show expansion and high expression of several genes in the key nodes of fatty acid biosynthetic pathway, including SAD, FAD2 and FAD3, may function in high level of ALAs synthesis in tree peony seeds. Moreover, by comparing with cultivated tree peony (P. suffruticosa), we show that ectopic expression of class A gene AP1 and reduced expression of class C gene AG may contribute to the formation of petaloid stamens. Genomic resources reported in this study will be valuable for studying chromosome/genome evolution and tree peony breeding.
|
41
|
Stam M, Lelièvre P, Hoebeke M, Corre E, Barbeyron T, Michel G. SulfAtlas, the sulfatase database: state of the art and new developments. Nucleic Acids Res 2022; 51:D647-D653. [PMID: 36318251 PMCID: PMC9825549 DOI: 10.1093/nar/gkac977] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/02/2022] [Revised: 10/14/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022] Open
Abstract
SulfAtlas (https://sulfatlas.sb-roscoff.fr/) is a knowledge-based resource dedicated to a sequence-based classification of sulfatases. Currently four sulfatase families exist (S1-S4) and the largest family (S1, formylglycine-dependent sulfatases) is divided into subfamilies by a phylogenetic approach, each subfamily corresponding to either a single characterized specificity (or few specificities in some cases) or to unknown substrates. Sequences are linked to their biochemical and structural information according to an expert scrutiny of the available literature. Database browsing was initially made possible both through a keyword search engine and a specific sequence similarity (BLAST) server. In this article, we will briefly summarize the experimental progresses in the sulfatase field in the last 6 years. To improve and speed up the (sub)family assignment of sulfatases in (meta)genomic data, we have developed a new, freely-accessible search engine using Hidden Markov model (HMM) for each (sub)family. This new tool (SulfAtlas HMM) is also a key part of the internal pipeline used to regularly update the database. SulfAtlas resource has indeed significantly grown since its creation in 2016, from 4550 sequences to 162 430 sequences in August 2022.
Affiliation(s)
| | | | - Mark Hoebeke
- Sorbonne Université, CNRS, FR2424, ABiMS, Station Biologique de Roscoff, 29680, Roscoff, Bretagne, France
| | - Erwan Corre
- Sorbonne Université, CNRS, FR2424, ABiMS, Station Biologique de Roscoff, 29680, Roscoff, Bretagne, France
| | - Tristan Barbeyron
- Correspondence may also be addressed to Tristan Barbeyron. Tel: +33 298 29 23 30; Fax: +33 298 29 23 24;
| | - Gurvan Michel
- To whom correspondence should be addressed. Tel: +33 298 29 23 30; Fax: +33 298 29 23 24;
|
42
|
Rahman MA, Heme UH, Parvez MAK. In silico functional annotation of hypothetical proteins from the Bacillus paralicheniformis strain Bac84 reveals proteins with biotechnological potentials and adaptational functions to extreme environments. PLoS One 2022; 17:e0276085. [PMID: 36228026 PMCID: PMC9560612 DOI: 10.1371/journal.pone.0276085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Accepted: 09/28/2022] [Indexed: 11/26/2022] Open
Abstract
Members of the Bacillus genus are industrial cell factories due to their capacity to secrete significant quantities of biomolecules with industrial applications. The Bacillus paralicheniformis strain Bac84 was isolated from the Red Sea and it shares a close evolutionary relationship with Bacillus licheniformis. However, a significant number of proteins in its genome are annotated as functionally uncharacterized hypothetical proteins. Investigating these proteins' functions may help us better understand how bacteria survive extreme environmental conditions and to find novel targets for biotechnological applications. Therefore, the purpose of our research was to functionally annotate the hypothetical proteins from the genome of B. paralicheniformis strain Bac84. We employed a structured in-silico approach incorporating numerous bioinformatics tools and databases for functional annotation, physicochemical characterization, subcellular localization, protein-protein interactions, and three-dimensional structure determination. Sequences of 414 hypothetical proteins were evaluated and we were able to successfully attribute a function to 37 hypothetical proteins. Moreover, we performed receiver operating characteristic analysis to assess the performance of various tools used in this present study. We identified 12 proteins having significant adaptational roles to unfavorable environments such as sporulation, formation of biofilm, motility, regulation of transcription, etc. Additionally, 8 proteins were predicted with biotechnological potentials such as coenzyme A biosynthesis, phenylalanine biosynthesis, rare-sugars biosynthesis, antibiotic biosynthesis, bioremediation, and others. Evaluation of the performance of the tools showed an accuracy of 98% which represented the rationality of the tools used. This work shows that this annotation strategy will make the functional characterization of unknown proteins easier and can find the target for further investigation. 
The knowledge of these hypothetical proteins' potential functions aids B. paralicheniformis strain Bac84 in effectively creating a new biotechnological target. In addition, the results may also facilitate a better understanding of the survival mechanisms in harsh environmental conditions.
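The abstract above reports a receiver operating characteristic (ROC) analysis and an accuracy figure for the annotation tools, but gives no detail on how they were computed. As a minimal sketch only, the snippet below shows how accuracy and the area under the ROC curve could be derived from binary labels and tool scores. The labels, scores, and 0.5 threshold are invented toy values, not data from the study.

```python
# Toy illustration: accuracy and ROC AUC for a binary classifier.
# All values below are hypothetical and are NOT taken from the cited study.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = correct annotation) and tool confidence scores.
Y_TRUE = [1, 1, 1, 0, 0, 0]
SCORES = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
Y_PRED = [1 if s >= 0.5 else 0 for s in SCORES]  # assumed threshold of 0.5

if __name__ == "__main__":
    print("accuracy:", accuracy(Y_TRUE, Y_PRED))
    print("AUC:", roc_auc(Y_TRUE, SCORES))
```

The pairwise definition of AUC is used here because it needs no plotting or external libraries; for real data a library routine would normally be preferred.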
Affiliation(s)
- Md. Atikur Rahman
  - Institute of Microbiology, Friedrich Schiller University Jena, Thuringia, Germany
- Uzma Habiba Heme
  - Faculty of Biological Sciences, Friedrich Schiller University Jena, Thuringia, Germany
|
43
|
Banik A, Ahmed SR, Sajib EH, Deb A, Sinha S, Azim KF. Identification of potential inhibitory analogs of metastasis tumor antigens (MTAs) using bioactive compounds: revealing therapeutic option to prevent malignancy. Mol Divers 2022; 26:2473-2502. [PMID: 34743299 DOI: 10.1007/s11030-021-10345-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2021] [Accepted: 10/24/2021] [Indexed: 12/31/2022]
Abstract
A deeper understanding of the metastasis phenomenon and the detection of drug targets could be a potential approach to minimizing cancer mortality. In this study, we attempted to uncover novel therapeutics to prevent metastasis and cancer progression. Initially, we explored the physicochemical, structural and functional properties of three metastasis tumor antigens (MTAs) and evaluated some plant-based bioactive compounds as potent MTA inhibitors. Of the 50 plant metabolites screened, isoflavone, gingerol, citronellal and asiatic acid showed the maximum binding affinity with all three MTA proteins. The ADME analysis detected no undesirable toxicity that could reduce the drug-likeness properties of the top plant metabolites. Moreover, molecular dynamics studies revealed that the complexes were stable and showed minimal fluctuation at the molecular level. We further performed ligand-based virtual screening to identify similar drug molecules using a large collection of 376,342 compounds from DrugBank. The results suggested that several structural analogs (e.g., tramadol, nabumetone, DGLA and hydrocortisone) may act as agonists to block the MTA proteins and inhibit cancer progression at an early stage. The study could be useful for developing effective medications against cancer metastasis in the future. Given the encouraging results, we highly recommend further in vitro and in vivo trials for the experimental validation of the findings.
Affiliation(s)
- Anik Banik
  - Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
  - Department of Plant and Environmental Biotechnology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
- Sheikh Rashel Ahmed
  - Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
  - Department of Plant and Environmental Biotechnology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
- Emran Hossain Sajib
  - Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
- Anamika Deb
  - Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
- Shiuly Sinha
  - Faculty of Biotechnology and Genetic Engineering, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
- Kazi Faizul Azim
  - Department of Microbial Biotechnology, Sylhet Agricultural University, Sylhet, 3100, Bangladesh
  - Faculté de Pharmacie, Université de Tours, 37200, Tours, France
|
44
|
Organizing the bacterial annotation space with amino acid sequence embeddings. BMC Bioinformatics 2022; 23:385. [PMID: 36151519 PMCID: PMC9502642 DOI: 10.1186/s12859-022-04930-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 08/11/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Due to the ever-expanding gap between the number of proteins being discovered and their functional characterization, protein function inference remains a fundamental challenge in computational biology. Currently, known protein annotations are organized in human-curated ontologies, however, all possible protein functions may not be organized accurately. Meanwhile, recent advancements in natural language processing and machine learning have developed models which embed amino acid sequences as vectors in n-dimensional space. So far, these embeddings have primarily been used to classify protein sequences using manually constructed protein classification schemes. RESULTS In this work, we describe the use of amino acid sequence embeddings as a systematic framework for studying protein ontologies. Using a sequence embedding, we show that the bacterial carbohydrate metabolism class within the SEED annotation system contains 48 clusters of embedded sequences despite this class containing 29 functional labels. Furthermore, by embedding Bacillus amino acid sequences with unknown functions, we show that these unknown sequences form clusters that are likely to have similar biological roles. CONCLUSIONS This study demonstrates that amino acid sequence embeddings may be a powerful tool for developing more robust ontologies for annotating protein sequence data. In addition, embeddings may be beneficial for clustering protein sequences with unknown functions and selecting optimal candidate proteins to characterize experimentally.
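The abstract above describes embedding amino acid sequences as vectors and then looking for clusters. As a rough sketch of that general idea only, the snippet below replaces the learned embedding with a plain 20-dimensional amino acid composition vector and groups sequences whose cosine similarity exceeds a threshold. The composition embedding, the 0.8 threshold, and the toy sequences are all assumptions made for illustration; they are not the model or data used in the cited work.

```python
# Toy illustration: embed sequences as vectors, then cluster by similarity.
# The composition "embedding" is a simple stand-in for a learned model.
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed(seq):
    """Map a sequence to its amino acid composition (20 frequencies)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(seqs, threshold=0.8):
    """Single-linkage grouping: sequences are joined when their embeddings
    have cosine similarity >= threshold. Returns lists of indices."""
    vectors = [embed(s) for s in seqs]
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sequences: two hydrophobic-rich, two charged-residue-rich.
TOY_SEQS = ["LLVVAAILLV", "LVVAALLIVA", "KKEEDDRKEE", "EEKKDRDEKK"]

if __name__ == "__main__":
    print(cluster(TOY_SEQS))
```

Composition vectors ignore residue order, so they are far cruder than a learned embedding; the point is only to show the embed-then-cluster workflow in a self-contained form.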
|
45
|
Gagalova KK, Warren RL, Coombe L, Wong J, Nip KM, Yuen MMS, Whitehill JGA, Celedon JM, Ritland C, Taylor GA, Cheng D, Plettner P, Hammond SA, Mohamadi H, Zhao Y, Moore RA, Mungall AJ, Boyle B, Laroche J, Cottrell J, Mackay JJ, Lamothe M, Gérardi S, Isabel N, Pavy N, Jones SJM, Bohlmann J, Bousquet J, Birol I. Spruce giga-genomes: structurally similar yet distinctive with differentially expanding gene families and rapidly evolving genes. Plant J 2022; 111:1469-1485. [PMID: 35789009 DOI: 10.1111/tpj.15889] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/11/2021] [Revised: 06/22/2022] [Accepted: 06/27/2022] [Indexed: 06/15/2023]
Abstract
Spruces (Picea spp.) are coniferous trees widespread in boreal and mountainous forests of the northern hemisphere, with large economic significance and enormous contributions to global carbon sequestration. Spruces harbor very large genomes with high repetitiveness, hampering their comparative analysis. Here, we present and compare the genomes of four different North American spruces: the genome assemblies for Engelmann spruce (Picea engelmannii) and Sitka spruce (Picea sitchensis) together with improved and more contiguous genome assemblies for white spruce (Picea glauca) and for a naturally occurring introgress of these three species known as interior spruce (P. engelmannii × glauca × sitchensis). The genomes were structurally similar, and a large part of scaffolds could be anchored to a genetic map. The composition of the interior spruce genome indicated asymmetric contributions from the three ancestral genomes. Phylogenetic analysis of the nuclear and organelle genomes revealed a topology indicative of ancient reticulation. Different patterns of expansion of gene families among genomes were observed and related with presumed diversifying ecological adaptations. We identified rapidly evolving genes that harbored high rates of non-synonymous polymorphisms relative to synonymous ones, indicative of positive selection and its hitchhiking effects. These gene sets were mostly distinct between the genomes of ecologically contrasted species, and signatures of convergent balancing selection were detected. Stress and stimulus response was identified as the most frequent function assigned to expanding gene families and rapidly evolving genes. These two aspects of genomic evolution were complementary in their contribution to divergent evolution of presumed adaptive nature. 
These more contiguous spruce giga-genome sequences should strengthen our understanding of conifer genome structure and evolution, as their comparison offers clues into the genetic basis of adaptation and ecology of conifers at the genomic level. They will also provide tools to better monitor natural genetic diversity and improve the management of conifer forests. The genomes of four closely related North American spruces indicate that their high similarity at the morphological level is paralleled by the high conservation of their physical genome structure. Yet, the evidence of divergent evolution is apparent in their rapidly evolving genomes, supported by differential expansion of key gene families and large sets of genes under positive selection, largely in relation to stimulus and environmental stress response.
Affiliation(s)
- Kristina K Gagalova
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- René L Warren
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Lauren Coombe
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Johnathan Wong
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Ka Ming Nip
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Macaire Man Saint Yuen
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Justin G A Whitehill
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Jose M Celedon
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Carol Ritland
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Greg A Taylor
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Dean Cheng
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Patrick Plettner
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- S Austin Hammond
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
  - Next-Generation Sequencing Facility, University of Saskatchewan, Saskatoon, SK, S7N 5E5, Canada
- Hamid Mohamadi
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Yongjun Zhao
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Richard A Moore
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Andrew J Mungall
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Brian Boyle
  - Institute for Systems and Integrative Biology, Université Laval, Québec, QC, G1V 0A6, Canada
- Jérôme Laroche
  - Institute for Systems and Integrative Biology, Université Laval, Québec, QC, G1V 0A6, Canada
- Joan Cottrell
  - Forest Research, U.K. Forestry Commission, Northern Research Station, Roslin, EH25 9SY, Midlothian, UK
- John J Mackay
  - Department of Plant Sciences, University of Oxford, Oxford, OX1 3RB, UK
- Manuel Lamothe
  - Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre, Québec, QC, G1V 4C7, Canada
- Sébastien Gérardi
  - Institute for Systems and Integrative Biology, Université Laval, Québec, QC, G1V 0A6, Canada
  - Canada Research Chair in Forest Genomics, Forest Research Centre, Université Laval, Québec, QC, G1V 0A6, Canada
- Nathalie Isabel
  - Natural Resources Canada, Canadian Forest Service, Laurentian Forestry Centre, Québec, QC, G1V 4C7, Canada
  - Canada Research Chair in Forest Genomics, Forest Research Centre, Université Laval, Québec, QC, G1V 0A6, Canada
- Nathalie Pavy
  - Institute for Systems and Integrative Biology, Université Laval, Québec, QC, G1V 0A6, Canada
  - Canada Research Chair in Forest Genomics, Forest Research Centre, Université Laval, Québec, QC, G1V 0A6, Canada
- Steven J M Jones
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
- Joerg Bohlmann
  - Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Jean Bousquet
  - Institute for Systems and Integrative Biology, Université Laval, Québec, QC, G1V 0A6, Canada
  - Canada Research Chair in Forest Genomics, Forest Research Centre, Université Laval, Québec, QC, G1V 0A6, Canada
- Inanc Birol
  - Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, V5Z 4S6, Canada
|
46
|
Garg P, Vanamamalai VK, Jali I, Sharma S. In silico prediction of the animal susceptibility and virtual screening of natural compounds against SARS-CoV-2: Molecular dynamics simulation based analysis. Front Genet 2022; 13:906955. [PMID: 36110222 PMCID: PMC9468858 DOI: 10.3389/fgene.2022.906955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 08/03/2022] [Indexed: 11/17/2022] Open
Abstract
COVID-19 is an infectious disease caused by the SARS-CoV-2 virus. The virus has six open reading frames (orf1ab, orf3a, orf6, orf7a, orf8, and orf10), a spike protein, a membrane protein, an envelope small membrane protein, and a nucleocapsid protein; of these, orf1ab is the largest ORF and codes for several important non-structural proteins. In this study, an effort was made to evaluate the susceptibility of different animals to SARS-CoV-2 by analyzing the interactions of the Spike and ACE2 proteins of the animals, and to propose a list of potential natural compounds binding to orf1ab of SARS-CoV-2. Here, we analyzed structural interactions between the spike protein of SARS-CoV-2 and the ACE2 receptor of 16 different hosts. A simulation for 50 ns was performed on these complexes. Based on post-simulation analysis, Chelonia mydas was found to have a more stable complex, while Bubalus bubalis, Aquila chrysaetos chrysaetos, Crocodylus porosus, and Loxodonta africana were found to have the least stable complexes, with more fluctuations than all other organisms. Apart from that, we performed domain assignment of orf1ab of SARS-CoV-2 and identified 14 distinct domains. Out of these, Domain 3 (DNA/RNA polymerases) was selected as a target, as it showed no similarities with host proteomes and was validated in silico. Then, the top 10 molecules were selected from the virtual screening of ∼1.8 lakh (about 180,000) molecules from the ZINC database, based on binding energy, and validated for ADME and toxicological properties. Three molecules were selected and analyzed further. The structural analysis showed that these molecules were residing within the pocket of the receptor. Finally, a simulation for 200 ns was performed on complexes with the three selected molecules. Based on post-simulation analysis (RMSD, RMSF, Rg, SASA, and energies), the molecule ZINC000103666966 was found to be the most suitable inhibitory compound against Domain 3.
As this is an in silico prediction, further experimental studies could unravel the potential of the proposed molecule against SARS-CoV-2.
|
47
|
Climate-Endangered Arctic Epishelf Lake Harbors Viral Assemblages with Distinct Genetic Repertoires. Appl Environ Microbiol 2022; 88:e0022822. [PMID: 36005820 PMCID: PMC9469726 DOI: 10.1128/aem.00228-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
Abstract
Milne Fiord, located on the coastal margin of the Last Ice Area (LIA) in the High Arctic (82°N, Canada), harbors an epishelf lake, a rare type of ice-dependent ecosystem in which a layer of freshwater overlies marine water connected to the open ocean. This microbe-dominated ecosystem faces catastrophic change due to the deterioration of its ice environment related to warming temperatures. We produced the first assessment of viral abundance, diversity, and distribution in this vulnerable ecosystem and explored the niches available for viral taxa and the functional genes underlying their distribution. We found that the viral community in the freshwater layer was distinct from, and more diverse than, the community in the underlying seawater and contained a different set of putative auxiliary metabolic genes, including the sulfur starvation-linked gene tauD and the gene coding for patatin-like phospholipase. The halocline community resembled the freshwater more than the marine community, but harbored viral taxa unique to this layer. We observed distinct viral assemblages immediately below the halocline, at a depth that was associated with a peak of prasinophyte algae and the viral family Phycodnaviridae. We also assembled 15 complete circular genomes, including a putative Pelagibacter phage with a marine distribution. It appears that despite its isolated and precarious situation, the varied niches in this epishelf lake support a diverse viral community, highlighting the importance of characterizing underexplored microbiota in the Last Ice Area before these ecosystems undergo irreversible change. IMPORTANCE Viruses are key to understanding polar aquatic ecosystems, which are dominated by microorganisms. However, studies of viral communities are challenging to interpret because the vast majority of viruses are known only from sequence fragments, and their taxonomy, hosts, and genetic repertoires are unknown. 
Our study establishes a basis for comparison that will advance understanding of viral ecology in diverse global environments, particularly in the High Arctic. Rising temperatures in this region mean that researchers have limited time remaining to understand the biodiversity and biogeochemical cycles of ice-dependent environments and the consequences of these rapid, irreversible changes. The case of the Milne Fiord epishelf lake has special urgency because of the rarity of this type of “floating lake” ecosystem and its location in the Last Ice Area, a region of thick sea ice with global importance for conservation efforts.
|
48
|
Dwivedi A, Moirangthem A, Pandey H, Sharma P, Srivastava P, Yadav P, Saxena D, Phadke S, Dabadghao P, Gupta N, Kabra M, Goyal R, Biswas R, Mangaraj S, Bhar D, Chowdhury S, Agarwal A, Mandal K. Von Hippel–Lindau (VHL) disease and VHL-associated tumors in Indian subjects: VHL gene testing in a resource constraint setting. Egypt J Med Hum Genet 2022. [DOI: 10.1186/s43042-022-00338-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Background
Von Hippel–Lindau (VHL) syndrome is a familial cancer syndrome caused by mutations in the VHL gene. It is characterized by the formation of benign and malignant tumors like retinal angioma, cerebellar hemangioblastoma, spinal hemangioblastoma, renal cell carcinoma, pheochromocytoma, pancreatic and renal cysts, and endolymphatic sac tumors. Germline mutations in the VHL gene have also been reported in isolated VHL-associated tumors. The VHL gene is a small gene with 3 coding exons and can be easily tested even in a resource-constrained setting.
Objective
To describe the clinical presentation and estimate the diagnostic yield in VHL and VHL-associated tumors.
Methods
This is a descriptive study in a hospital setting. Here, we describe the clinical and molecular data of 69 patients with suspected VHL or having VHL-associated tumors. Sanger sequencing of the coding sequences and conserved splice sites of the VHL gene was done in all patients. Multiplex ligation-dependent probe amplification (MLPA) of the VHL gene to detect large deletions/duplications was performed for 18 patients with no pathogenic sequence variations.
Results
Among tumor types at presentation, pheochromocytoma was seen in 49% (34/69), hemangioblastoma in 30% (21/69), and renal cell carcinoma in 7% (5/69). The rest had other tumors like paraganglioma, endolymphatic sac papillary tumors, cerebellar astrocytoma and pancreatic cyst. Seven patients (10%) had more than one tumor at the time of diagnosis. Pathogenic variations in the VHL gene were identified in 31 probands by Sanger sequencing; 18 were missense, 2 nonsense and 2 small indels. A heterozygous deletion of exon 3 was detected by MLPA in one of the 18 patients for whom MLPA was done. Overall, the molecular yield was 46% of cases (32/69). Family history was present in 7 mutation-positive cases (22%). Overall, 11 families (16%) opted for pre-symptomatic mutation testing in the family.
Conclusions
Mutation testing is indicated in VHL and VHL-associated tumors. The testing facility is easy and can be adopted easily in developing countries like India. The yield is good, and with fairly high incidence of familial cases, molecular testing can help in pre-symptomatic testing and surveillance.
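The percentages quoted in the Results of this abstract follow directly from the stated counts. As a small worked check, the sketch below recomputes them with simple rounding to the nearest whole percent. The counts are copied from the abstract; the dictionary keys and the choice of 32 mutation-positive cases as the denominator for family history are our reading of the text, not something the abstract states explicitly.

```python
# Recompute the percentages reported in the abstract from its stated counts.

def percent(numerator, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


COHORT = 69             # patients in the study
MUTATION_POSITIVE = 32  # 31 found by Sanger sequencing + 1 found by MLPA

# (count, denominator) pairs taken from the abstract; labels are ours.
REPORTED = {
    "pheochromocytoma": (34, COHORT),
    "hemangioblastoma": (21, COHORT),
    "renal cell carcinoma": (5, COHORT),
    "more than one tumor": (7, COHORT),
    "molecular yield": (MUTATION_POSITIVE, COHORT),
    "family history (of mutation-positive)": (7, MUTATION_POSITIVE),
    "pre-symptomatic testing": (11, COHORT),
}

PERCENTAGES = {label: percent(n, d) for label, (n, d) in REPORTED.items()}

if __name__ == "__main__":
    for label, value in PERCENTAGES.items():
        print(f"{label}: {value}%")
```

Each value rounds to the figure given in the abstract, which supports reading the 22% family-history figure as a share of the 32 mutation-positive cases rather than of the full cohort.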
|
49
|
Ferrer-Bonsoms JA, Gimeno M, Olaverri D, Sacristan P, Lobato C, Castilla C, Carazo F, Rubio A. EventPointer 3.0: flexible and accurate splicing analysis that includes studying the differential usage of protein-domains. NAR Genom Bioinform 2022; 4:lqac067. [PMID: 36128425 PMCID: PMC9477077 DOI: 10.1093/nargab/lqac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2022] [Revised: 07/29/2022] [Accepted: 09/07/2022] [Indexed: 12/05/2022] Open
Abstract
Alternative splicing (AS) plays a key role in cancer: all its hallmarks have been associated with different mechanisms of abnormal AS. The improvement of the human transcriptome annotation and the availability of fast and accurate software to estimate isoform concentrations have boosted the analysis of transcriptome profiling from RNA-seq. The statistical analysis of AS is a challenging problem not yet fully solved. We have included in EventPointer (EP), a Bioconductor package, a novel statistical method that can use the bootstrap of the pseudoaligners. We compared it with other state-of-the-art algorithms to analyze AS. Its performance is outstanding for shallow sequencing conditions. The statistical framework is very flexible since it is based on design and contrast matrices. EP now includes a convenient tool to find the primers to validate the discoveries using PCR. We also added a statistical module to study alterations in protein domains related to AS. Applying it to 9514 patients from TCGA and TARGET in 19 different tumor types resulted in two conclusions: (i) aberrant alternative splicing alters the relative presence of protein domains and (ii) the number of enriched domains is strongly correlated with the age of the patients.
Affiliation(s)
- Juan A Ferrer-Bonsoms
  - Biomedical Engineering and Science Department, TECNUN, Universidad de Navarra, San Sebastián, Spain
- Marian Gimeno
  - Biomedical Engineering and Science Department, TECNUN, Universidad de Navarra, San Sebastián, Spain
- Danel Olaverri
  - Biomedical Engineering and Science Department, TECNUN, Universidad de Navarra, San Sebastián, Spain
- Pablo Sacristan
  - Biomedical Engineering and Science Department, TECNUN, Universidad de Navarra, San Sebastián, Spain
- César Lobato
  - Biomedical Engineering and Science Department, TECNUN, Universidad de Navarra, San Sebastián, Spain
- Carlos Castilla
  - Biomedical Engineering and Science Department, TECNUN, Universidad de Navarra, San Sebastián, Spain
- Fernando Carazo
  - Biomedical Engineering and Science Department, TECNUN, Universidad de Navarra, San Sebastián, Spain
- Angel Rubio
  - Biomedical Engineering and Science Department, TECNUN, Universidad de Navarra, San Sebastián, Spain
|
50
|
Bashiri R, Curtis TP, Ofiţeru ID. The limitations of the current protein classification tools in identifying lipolytic features in putative bacterial lipase sequences. J Biotechnol 2022; 351:30-37. [PMID: 35523393 DOI: 10.1016/j.jbiotec.2022.04.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/15/2021] [Revised: 04/26/2022] [Accepted: 04/26/2022] [Indexed: 11/19/2022]
Abstract
Metagenomics sequencing has generated millions of new protein sequences, most of them with unknown functions. A relatively quick first step for function assignment is to use the existing public protein databases and their scanning tools. However, to date these tools are not able to identify all sequence features like conserved motifs or patterns. In this study we evaluated the capability of several public protein databases (e.g., InterPro, PROSITE, ESTHER, Pfam, AlphaFold, etc.) and their scanning tools for identifying lipolytic features in 78 putative cold-adapted bacterial lipase sequences. Novel lipases that can tolerate extreme conditions have great biotechnological importance. We obtained the putative cold-adapted lipolytic sequences from the metagenomic study of an anaerobic psychrophilic microbial community treating domestic wastewater at 4 and 15 °C. Both newer and conventional protein classifiers failed to find lipolytic features for most of the putative lipases. InterProScan predicted lipase family membership for only 18 of the putative lipase sequences. For more than half of them (41 out of 78) InterProScan could not predict any protein family membership, let alone find lipolytic features in them. However, when the Lipase Engineering Database and AlphaFold were used, half of those sequences were classified. Conventional databases like PROSITE could find lipolytic patterns for 9 of the putative lipolytic sequences, of which only one was identified by InterProScan as a lipase. Moreover, different scanning tools made different and inconsistent predictions for a given putative lipase sequence. Even InterProScan, which integrates predictions from 13 protein member databases, did not have a consensus prediction for a given lipase sequence. Our study shows that there is a lack of information in public protein databases about bacterial lipase sequences and this limits their lipolytic feature prediction and biotechnological application.
The integration of AlphaFold within the InterPro can improve the lipase identification and classification significantly.
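The abstract above refers to scanning sequences for conserved lipolytic motifs or patterns without naming them. As a minimal sketch of what such a pattern scan looks like, the snippet below searches for the G-x-S-x-G pentapeptide, a motif commonly found around the catalytic serine of lipases and esterases. Using this particular motif is our assumption, the two sequences are invented, and a regex hit on its own is not evidence of lipolytic activity.

```python
# Toy illustration: scan protein sequences for the G-x-S-x-G pentapeptide.
# The motif choice and the sequences are assumptions for illustration only.
import re

# Lookahead so that overlapping occurrences are all reported.
GXSXG = re.compile(r"(?=(G.S.G))")


def find_gxsxg(seq):
    """Return (0-based start, matched pentapeptide) for every G-x-S-x-G hit."""
    return [(m.start(1), m.group(1)) for m in GXSXG.finditer(seq.upper())]


# Hypothetical sequences: one containing the motif, one without it.
TOY_SEQS = {
    "toy_with_motif": "MKTAYGHSMGGLVAR",
    "toy_without_motif": "MKTAYAHAMAALVAR",
}

if __name__ == "__main__":
    for name, seq in TOY_SEQS.items():
        print(name, find_gxsxg(seq))
```

Dedicated scanners use curated, more specific patterns and profiles than this five-residue expression, which is one reason different tools can disagree on the same sequence.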
Affiliation(s)
- Reihaneh Bashiri
  - School of Engineering, Newcastle University, Newcastle-upon-Tyne NE1 7RU, UK
- Thomas P Curtis
  - School of Engineering, Newcastle University, Newcastle-upon-Tyne NE1 7RU, UK
- Irina D Ofiţeru
  - School of Engineering, Newcastle University, Newcastle-upon-Tyne NE1 7RU, UK
|