1
Yan Z, Shi X, Cai Y, Sun W, He P, Wu L, Zhang J, Guo X, Wang B, Yu F, Liu W. Chromosome-level genome assemblies of Verpa bohemica and Verpa conica. Sci Data 2025; 12:880. [PMID: 40425600 PMCID: PMC12117090 DOI: 10.1038/s41597-025-05224-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2025] [Accepted: 05/16/2025] [Indexed: 05/29/2025] Open
Abstract
Verpa, commonly known as "early morel" or "false morel", plays an important ecological role and offers considerable economic and medicinal potential. Despite their significance, research on Verpa species, particularly V. bohemica and V. conica, remains limited. In this study, we assembled high-quality sub-chromosomal genomes of six Verpa strains using Nanopore and Illumina sequencing, with average sizes of 44.38 Mb for V. bohemica and 45.40 Mb for V. conica. Specifically, the assemblies of V. bohemica strain 21108 and V. conica strain 21120 were anchored to 26 and 25 chromosomes, respectively, using Hi-C technology. The consensus quality value (QV) of both the V. bohemica and V. conica assemblies exceeded 40. In addition, an average of 11,024 and 11,052 protein-coding genes were identified for V. bohemica and V. conica, respectively, with BUSCO completeness scores ranging from 98.71% to 99.24%. Overall, these genomes provide valuable genomic resources for research on the evolution and ecological roles of Verpa.
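As a reading aid for the QV threshold quoted in the abstract, the following minimal sketch converts between a quality value and a per-base error probability. It assumes the reported consensus QV follows the usual Phred scale (QV = -10 log10 of the error probability); the abstract does not state how QV was computed, and the function names are illustrative only.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


# Under the Phred assumption, QV 40 means about one consensus error per 10,000 bases.
print(qv_to_error_rate(40))
```

Under that assumption, "QV exceeded 40" reads as fewer than roughly one erroneous base per 10 kb of assembled sequence.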
Affiliation(s)
- Zhuyue Yan
  - The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Xiaofei Shi
  - The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
  - Key Laboratory of Chemistry in Ethnic Medicinal Resources, School of Ethnic Medicine, Yunnan Minzu University Kunming, Kunming, 650500, China
- Yingli Cai
  - Key Laboratory of Chemistry in Ethnic Medicinal Resources, School of Ethnic Medicine, Yunnan Minzu University Kunming, Kunming, 650500, China
- Wenhua Sun
  - College of Food and Biological Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
- Peixin He
  - College of Food and Biological Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
- Liyuan Wu
  - The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Jin Zhang
  - The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
- Xing Guo
  - Yichun Branch of Heilongjiang Academy of Forestry Sciences, Yichun, 153000, China
- Bo Wang
  - Gansu Province Xiaolong mountains forestry protect center's Dangchuan forest farm, Tianshui, 741020, China
- Fuqiang Yu
  - The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China.
- Wei Liu
  - The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China.
  - Key Laboratory of Chemistry in Ethnic Medicinal Resources, School of Ethnic Medicine, Yunnan Minzu University Kunming, Kunming, 650500, China.
2
Zheng W, Wuyun Q, Li Y, Liu Q, Zhou X, Peng C, Zhu Y, Freddolino L, Zhang Y. Deep-learning-based single-domain and multidomain protein structure prediction with D-I-TASSER. Nat Biotechnol 2025:10.1038/s41587-025-02654-4. [PMID: 40410405 DOI: 10.1038/s41587-025-02654-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Accepted: 03/26/2025] [Indexed: 05/25/2025]
Abstract
The dominant success of deep learning techniques in protein structure prediction has challenged the necessity and usefulness of traditional force field-based folding simulations. We proposed a hybrid approach, deep-learning-based iterative threading assembly refinement (D-I-TASSER), which constructs atomic-level protein structural models by integrating multisource deep learning potentials with iterative threading fragment assembly simulations. D-I-TASSER introduces a domain splitting and assembly protocol for the automated modeling of large multidomain protein structures. Benchmark tests and the most recent Critical Assessment of Protein Structure Prediction (CASP15) experiments demonstrate that D-I-TASSER outperforms AlphaFold2 and AlphaFold3 on both single-domain and multidomain proteins. Large-scale folding experiments further show that D-I-TASSER could fold 81% of protein domains and 73% of full-chain sequences in the human proteome, with results highly complementary to recently released models by AlphaFold2. These results highlight a new avenue to integrate deep learning with classical physics-based folding simulations for high-accuracy protein structure and function predictions that are usable in genome-wide applications.
Affiliation(s)
- Wei Zheng
  - NITFID, School of Statistics and Data Science, AAIS, LPMC and KLMDASR, Nankai University, Tianjin, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Qiqige Wuyun
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
- Yang Li
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Quancheng Liu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Xiaogen Zhou
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Chunxiang Peng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yiheng Zhu
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Lydia Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
- Yang Zhang
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore.
  - Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore.
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
3
López-Pérez J, Cortés P, Campoy S, Erill I, Llagostera M. Deciphering the Causes of IbfA-Mediated Abortive Infection in the P22-like Phage UAB_Phi20. Int J Mol Sci 2025; 26:4918. [PMID: 40430055 PMCID: PMC12111858 DOI: 10.3390/ijms26104918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2025] [Revised: 05/07/2025] [Accepted: 05/15/2025] [Indexed: 05/29/2025] Open
Abstract
The study of bacterial defense mechanisms against phages is becoming increasingly relevant due to their impact on the effectiveness of phage therapy. Employing a multifaceted approach that combines bioinformatics, molecular microbiology, TEM microscopy, and conventional microbiology techniques, here, we identify the ibfA gene as a novel defense factor targeting the virulent phage UAB_Phi20, acquired by Salmonella Typhimurium through lateral transfer on the IncI1α conjugative plasmid pUA1135 after oral phage therapy in broilers. IbfA, a two-domain protein containing ATPase and TOPRIM domains, significantly reduces UAB_Phi20 productivity, as indicated by decreased EOP, ECOI, and a diminished burst size, potentially reducing cellular viability without causing observable lysis. Our results indicate that IbfA enhances the transcription of early genes, including the antirepressor ant, which inhibits the C2 repressor of the lytic cycle. This may cause an imbalance in Cro/C2 concentration, leading to the observed reduction in the transcription of late genes encoding structural and cellular lysis proteins, and resulting in the abortion of UAB_Phi20 infection.
Affiliation(s)
- Júlia López-Pérez
  - Molecular Microbiology Group, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain; (J.L.-P.); (S.C.); (M.L.)
- Pilar Cortés
  - Molecular Microbiology Group, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain; (J.L.-P.); (S.C.); (M.L.)
- Susana Campoy
  - Molecular Microbiology Group, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain; (J.L.-P.); (S.C.); (M.L.)
- Ivan Erill
  - Departament d’Enginyeria de la Informació i de les Comunicacions Àrea de Ciències de la Computació i Intel·ligència Artificial, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain;
- Montserrat Llagostera
  - Molecular Microbiology Group, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain; (J.L.-P.); (S.C.); (M.L.)
4
Policarpo M, Salzburger W, Maumus F, Gilbert C. Multiple Horizontal Transfers of Immune Genes Between Distantly Related Teleost Fishes. Mol Biol Evol 2025; 42:msaf107. [PMID: 40378191 PMCID: PMC12107551 DOI: 10.1093/molbev/msaf107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2024] [Revised: 04/29/2025] [Accepted: 05/02/2025] [Indexed: 05/18/2025] Open
Abstract
Horizontal gene transfer (HGT) is less frequent in eukaryotes than in prokaryotes, yet can have strong functional implications and was proposed as a causal factor for major adaptations in several eukaryotic lineages. Most cases of eukaryote HGT reported to date are inter-domain transfers, and few studies have investigated eukaryote-to-eukaryote HGTs. Here, we performed a large-scale survey of HGT among 242 species of ray-finned fishes. We found multiple lines of evidence supporting 19 teleost-to-teleost HGT events that involve 17 different genes in 11 teleost fish orders. The genes involved in these transfers show lower synonymous divergence than expected under vertical transmission, their phylogeny is inconsistent with that of teleost fishes, and they occur at non-syntenic positions in donor and recipient lineages. The distribution of HGT events in the teleost tree is heterogenous, with 8 of the 19 transfers occurring between the same two orders (Osmeriformes and Clupeiformes). Though we favor a scenario involving multiple HGT events, future work should evaluate whether hybridization between species belonging to different teleost orders may generate HGT-like patterns. Besides the previously reported transfer of an antifreeze protein, most transferred genes play roles in immunity or are pore-forming proteins, suggesting that such genes may be more likely than others to confer a strong selective advantage to the recipient species. Overall, our work shows that teleost-to-teleost HGT has occurred on multiple occasions, and it will be worth further quantifying these transfers and evaluating their impact on teleost evolution as more genomes are sequenced.
Affiliation(s)
- Maxime Policarpo
  - Zoological Institute, Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Walter Salzburger
  - Zoological Institute, Department of Environmental Sciences, University of Basel, Basel, Switzerland
- Florian Maumus
  - URGI, INRAE, Université Paris-Saclay, Versailles 78026, France
- Clément Gilbert
  - Université Paris-Saclay, CNRS, IRD, UMR Évolution, Génomes, Comportement et Écologie, Gif-sur-Yvette 91198, France
5
Campoy A, Gomez-Lucia E, Garcia T, Crespo E, Olmeda S, Valcarcel F, Fandiño S, Domenech A. First Description of a Carnivore Protoparvovirus Associated with a Clinical Case in the Iberian Lynx (Lynx pardinus). Animals (Basel) 2025; 15:1026. [PMID: 40218419 PMCID: PMC11988045 DOI: 10.3390/ani15071026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2025] [Revised: 03/22/2025] [Accepted: 03/28/2025] [Indexed: 04/14/2025] Open
Abstract
One of the main threats to the survival of the Iberian lynx is infectious disease. Feline parvoviruses cause often fatal diseases in cats and have been isolated from different species of Felidae and other carnivores. The present study is the first description of a parvoviral sequence isolated from the brain of an Iberian lynx, which died four weeks after being transferred to a quarantine centre from a hunting estate in Castilla-La-Mancha (southern border of the Iberian plateau). Four days prior to death, it had developed anorexia and muscle weakness. The nucleotide sequence, 4589 nt long (GenBank PP781551), was closest to that isolated from a Eurasian badger in Italy but also showed great homology with others from cats and other carnivores isolated in Spain and Italy, including that from a cat sequenced by us to elucidate the origin of the infection, which has not been clarified. The phylogenetic analysis of the capsid protein, VP2, which determines tropism and host range, confirmed that the lynx sequence was closer to feline than to canine parvoviruses, and it was thus typed as Protoparvovirus carnivoran1. More studies, including serology, are needed to understand the pathogenesis of this infection.
Affiliation(s)
- Almudena Campoy
  - Department of Animal Health, Faculty of Veterinary Medicine, Complutense University of Madrid, 28040 Madrid, Spain; (A.C.); (E.G.-L.); (T.G.); (S.O.); (S.F.)
  - Research Group of “Animal Viruses”, Complutense University of Madrid, 28040 Madrid, Spain
- Esperanza Gomez-Lucia
  - Department of Animal Health, Faculty of Veterinary Medicine, Complutense University of Madrid, 28040 Madrid, Spain; (A.C.); (E.G.-L.); (T.G.); (S.O.); (S.F.)
  - Research Group of “Animal Viruses”, Complutense University of Madrid, 28040 Madrid, Spain
- Tania Garcia
  - Department of Animal Health, Faculty of Veterinary Medicine, Complutense University of Madrid, 28040 Madrid, Spain; (A.C.); (E.G.-L.); (T.G.); (S.O.); (S.F.)
- Elena Crespo
  - Wildlife Recovery Centre “El Chaparrillo”, 13071 Ciudad Real, Castilla-La-Mancha, Spain;
- Sonia Olmeda
  - Department of Animal Health, Faculty of Veterinary Medicine, Complutense University of Madrid, 28040 Madrid, Spain; (A.C.); (E.G.-L.); (T.G.); (S.O.); (S.F.)
- Felix Valcarcel
  - Group of Animal Parasitology, Department of Animal Reproduction, INIA-CSIC, 28040 Madrid, Spain;
- Sergio Fandiño
  - Department of Animal Health, Faculty of Veterinary Medicine, Complutense University of Madrid, 28040 Madrid, Spain; (A.C.); (E.G.-L.); (T.G.); (S.O.); (S.F.)
  - Research Group of “Animal Viruses”, Complutense University of Madrid, 28040 Madrid, Spain
- Ana Domenech
  - Department of Animal Health, Faculty of Veterinary Medicine, Complutense University of Madrid, 28040 Madrid, Spain; (A.C.); (E.G.-L.); (T.G.); (S.O.); (S.F.)
  - Research Group of “Animal Viruses”, Complutense University of Madrid, 28040 Madrid, Spain
6
Isbilir B, Yeates A, Alva V, Bharat TAM. Mapping the ultrastructural topology of the corynebacterial cell surface. PLoS Biol 2025; 23:e3003130. [PMID: 40233127 PMCID: PMC12021427 DOI: 10.1371/journal.pbio.3003130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2025] [Revised: 04/24/2025] [Accepted: 03/25/2025] [Indexed: 04/17/2025] Open
Abstract
Corynebacterium glutamicum is a diderm bacterium extensively used in the industrial-scale production of amino acids. Corynebacteria belong to the bacterial family Mycobacteriaceae, which is characterized by a highly unusual cell envelope with an outer membrane consisting of mycolic acids, called mycomembrane. The mycomembrane is further coated by a surface (S-)layer array in C. glutamicum, making this cell envelope highly distinctive. Despite the biotechnological significance of C. glutamicum and biomedical significance of mycomembrane-containing pathogens, ultrastructural and molecular details of its distinctive cell envelope remain poorly characterized. To address this, we investigated the cell envelope of C. glutamicum using electron cryotomography and cryomicroscopy of focused ion beam-milled single and dividing cells. Our cellular imaging allowed us to map the different components of the cell envelope onto the tomographic density. Our data reveal that C. glutamicum has a variable cell envelope, with the S-layer decorating the mycomembrane in a patchy manner. We further isolated and resolved the structure of the S-layer at 3.1 Å-resolution using single particle electron cryomicroscopy. Our structure shows that the S-layer of C. glutamicum is composed of a hexagonal array of the PS2 protein, which interacts directly with the mycomembrane via an anchoring segment containing a coiled-coil motif. Bioinformatic analyses revealed that the PS2 S-layer is sparsely yet exclusively present within the Corynebacterium genus and absent in other genera of the Mycobacteriaceae family, suggesting distinct evolutionary pathways in the development of their cell envelopes. Our structural and cellular data collectively provide a topography of the unusual C. glutamicum cell surface, features of which are shared by many pathogenic and microbiome-associated bacteria, as well as by several industrially significant bacterial species.
Affiliation(s)
- Buse Isbilir
  - Structural Studies Division, MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Anna Yeates
  - Structural Studies Division, MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Vikram Alva
  - Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Tübingen, Germany
- Tanmay A. M. Bharat
  - Structural Studies Division, MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
7
Hermanns T, Kolek S, Uthoff M, de Heiden RA, Mulder MPC, Baumann U, Hofmann K. A family of bacterial Josephin-like deubiquitinases with an irreversible cleavage mode. Mol Cell 2025; 85:1202-1215.e5. [PMID: 40037356 DOI: 10.1016/j.molcel.2025.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 12/05/2024] [Accepted: 02/04/2025] [Indexed: 03/06/2025]
Abstract
Many intracellular bacteria secrete deubiquitinase (DUB) effectors into eukaryotic host cells to keep the bacterial surface or the enclosing vesicle membrane free of ubiquitin marks. This study describes a family of DUBs from several bacterial genera, including Simkania, Parachlamydia, Burkholderia, and Pigmentiphaga, which is structurally related to eukaryotic Josephin-type DUBs but contains members that catalyze a unique destructive substrate deubiquitination. These ubiquitin C-terminal clippases (UCCs) cleave ubiquitin before the C-terminal diGly motif, thereby truncating the modifier and leaving a remnant on the substrate. By comparing the crystal structures of substrate-bound clippases and a closely related conventional DUB, we identified the factors causing this shift and found them to be conserved in other clippases, including one highly specific for M1-linked ubiquitin chains. This enzyme class has great potential to serve as tools for studying the ubiquitin system, particularly aspects involving branched chains.
Affiliation(s)
- Thomas Hermanns
  - Institute for Genetics, University of Cologne, Zülpicher Straße 47a, 50674 Cologne, Germany
- Susanne Kolek
  - Institute for Genetics, University of Cologne, Zülpicher Straße 47a, 50674 Cologne, Germany
- Matthias Uthoff
  - Institute of Biochemistry, University of Cologne, Zülpicher Straße 47, 50674 Cologne, Germany
- Richard A de Heiden
  - Department of Cell and Chemical Biology, Leiden University Medical Center (LUMC), Einthovenweg 20, 2333ZC Leiden, the Netherlands
- Monique P C Mulder
  - Department of Cell and Chemical Biology, Leiden University Medical Center (LUMC), Einthovenweg 20, 2333ZC Leiden, the Netherlands
- Ulrich Baumann
  - Institute of Biochemistry, University of Cologne, Zülpicher Straße 47, 50674 Cologne, Germany
- Kay Hofmann
  - Institute for Genetics, University of Cologne, Zülpicher Straße 47a, 50674 Cologne, Germany.
8
Aceves-Ewing NM, Lanza DG, Marcogliese PC, Lu D, Hsu CW, Gonzalez M, Christiansen AE, Rasmussen TL, Ho AJ, Gaspero A, Seavitt J, Dickinson ME, Yuan B, Shayota BJ, Pachter S, Hu X, Day-Salvatore DL, Mackay L, Kanca O, Wangler MF, Potocki L, Rosenfeld JA, Lewis RA, Chao HT, Lee B, Lee S, Yamamoto S, Bellen HJ, Burrage LC, Heaney JD. Uncovering Phenotypic Expansion in AXIN2-Related Disorders through Precision Animal Modeling. medRxiv 2025:2024.12.05.24318524. [PMID: 39677486 PMCID: PMC11643287 DOI: 10.1101/2024.12.05.24318524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/17/2024]
Abstract
Heterozygous pathogenic variants in AXIN2 are associated with oligodontia-colorectal cancer syndrome (ODCRCS), a disorder characterized by oligodontia, colorectal cancer, and in some cases, sparse hair and eyebrows. We have identified four individuals with one of two de novo, heterozygous variants (NM_004655.4:c.196G>A, p.(Glu66Lys) and c.199G>A, p.(Gly67Arg)) in AXIN2 whose presentations expand the phenotype of AXIN2-related disorders. In addition to ODCRCS features, these individuals have global developmental delay, microcephaly, and limb, ophthalmologic, and renal abnormalities. Structural modeling of these variants suggests that they disrupt AXIN2 binding to tankyrase, which regulates AXIN2 levels through PARsylation and subsequent proteasomal degradation. To test whether these variants produce a phenotype in vivo, we utilized an innovative prime editing N1 screen to phenotype heterozygous (p.E66K) mouse embryos, which were perinatal lethal with short palate and skeletal abnormalities, contrary to published viable Axin2 null mouse models. Modeling of the p.E66K variant in the Drosophila wing revealed gain-of-function activity compared to reference AXIN2. However, the variant showed loss-of-function activity in the fly eye compared to reference AXIN2, suggesting that the mechanism by which p.E66K affects AXIN2 function is cell context-dependent. Together, our studies in humans, mice, and flies demonstrate that specific variants in the tankyrase-binding domain of AXIN2 are pathogenic, leading to phenotypic expansion with context-dependent effects on AXIN2 function and WNT signaling. Moreover, the modeling strategies used to demonstrate variant pathogenicity may be beneficial for the resolution of other de novo heterozygous variants of uncertain significance associated with congenital anomalies in humans.
9
Luo J, Luo Y. Learning maximally spanning representations improves protein function annotation. bioRxiv 2025:2025.02.13.638156. [PMID: 40027840 PMCID: PMC11870436 DOI: 10.1101/2025.02.13.638156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Automated protein function annotation is a fundamental problem in computational biology, crucial for understanding the functional roles of proteins in biological processes, with broad implications in medicine and biotechnology. A persistent challenge in this problem is the imbalanced, long-tail distribution of available function annotations: a small set of well-studied function classes account for most annotated proteins, while many other classes have few annotated proteins, often due to investigative bias, experimental limitations, or intrinsic biases in protein evolution. As a result, existing machine learning models for protein function prediction tend to only optimize the prediction accuracy for well-studied function classes overrepresented in the training data, leading to poor accuracy for understudied functions. In this work, we develop MSRep, a novel deep learning-based protein function annotation framework designed to address this imbalance issue and improve annotation accuracy. MSRep is inspired by an intriguing phenomenon, called neural collapse (NC), commonly observed in high-accuracy deep neural networks used for classification tasks, where hidden representations in the final layer collapse to class-specific mean embeddings, while maintaining maximal inter-class separation. Given that NC consistently emerges across diverse architectures and tasks for high-accuracy models, we hypothesize that inducing NC structure in models trained on imbalanced data can enhance both prediction accuracy and generalizability. To achieve this, MSRep refines a pre-trained protein language model to produce NC-like representations by optimizing an NC-inspired loss function, which ensures that minority functions are equally represented in the embedding space as majority functions, in contrast to conventional classification methods whose embedding spaces are dominated by overrepresented classes. 
In evaluations across four protein function annotation tasks on the prediction of Enzyme Commission numbers, Gene3D codes, Pfam families, and Gene Ontology terms, MSRep demonstrates superior predictive performance for both well- and underrepresented classes, outperforming several state-of-the-art annotation tools. We anticipate that MSRep will enhance the annotation of understudied functions and novel, uncharacterized proteins, advancing future protein function studies and accelerating the discovery of new functional proteins. The source code of MSRep is available at https://github.com/luo-group/MSRep.
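The geometry behind "maximal inter-class separation" can be illustrated with a small sketch. In the neural collapse literature, the collapsed class means are usually described as forming a simplex equiangular tight frame (ETF), in which every pair of class means has cosine similarity -1/(K-1) for K classes. The code below only constructs and checks such a frame; it is an assumption-laden illustration, not the MSRep loss or training procedure, and the function names are made up for this example.

```python
import math


def simplex_etf(num_classes: int) -> list[list[float]]:
    """Return num_classes unit vectors that form a simplex equiangular tight frame."""
    k = num_classes
    scale = math.sqrt(k / (k - 1))
    # Row i is the i-th standard basis vector minus the centroid, rescaled to unit length.
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical example with four function classes.
means = simplex_etf(4)
pairwise = [cosine(means[i], means[j]) for i in range(4) for j in range(i + 1, 4)]
print(pairwise)  # every entry is close to -1/3
```

Because each class receives one vertex regardless of how many training proteins it has, this geometry treats rare and common classes symmetrically, which is the property the abstract emphasizes.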
Affiliation(s)
- Jiaqi Luo
  - School of Computational Science and Engineering, Georgia Institute of Technology
- Yunan Luo
  - School of Computational Science and Engineering, Georgia Institute of Technology
10
Cheng J, Liu J, Neupane P. Accurate Prediction of Protein Complex Stoichiometry by Integrating AlphaFold3 and Template Information. Research Square 2025:rs.3.rs-5855710. [PMID: 39975926 PMCID: PMC11838762 DOI: 10.21203/rs.3.rs-5855710/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Protein structure prediction methods require stoichiometry information (i.e., subunit counts) to predict the quaternary structure of protein complexes. However, this information is often unavailable, making stoichiometry prediction crucial for complexes with unknown stoichiometry. Despite its importance, few computational methods address this challenge. In this study, we present an approach that integrates AlphaFold3 structure predictions with homologous template data to predict stoichiometry. The method generates candidate stoichiometries, builds structural models for them using AlphaFold3, ranks them based on AlphaFold3 scores, and further refines predictions with template-based information when available. In the 16th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP16), our method achieved 71.4% top-1 accuracy and 92.9% top-3 accuracy, outperforming other predictors in terms of overall performance. This demonstrates the complementary strengths of AlphaFold3- and template-based predictions and highlights the method's applicability to uncharacterized protein complexes lacking stoichiometry data.
Affiliation(s)
- Jian Liu
  - University of Missouri - Columbia
11
Runthala A, Satya Sri PS, Nair AS, Puttagunta MK, Sekhar Rao TC, Sreya V, Sowmya GR, Reddy GK. Decoding transaminase motifs: Tracing the unknown patterns for enhancing the accuracy of computational screening methodologies. Gene 2025; 936:149091. [PMID: 39557371 DOI: 10.1016/j.gene.2024.149091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 10/28/2024] [Accepted: 11/11/2024] [Indexed: 11/20/2024]
Abstract
Transaminases, enzymes known for their amino group transfer capabilities, encompass four distinct subfamilies: D-alanine transaminase (DATA), L-selective branched-chain aminotransferase (BCAT), 4-amino-4-deoxychorismate lyase (ADCL), and R-selective aminotransferase (RATA). RATA enzymes are particularly valuable in biocatalysis for synthesizing chiral amines and resolving racemic mixtures, yet their identification in sequence databases is challenging due to the lack of robust motif-based screening methods. We constructed a sequence dataset of transaminases, categorized the sequences into subfamilies, screened the conserved motifs against the experimentally known ones, and explored novel motifs. Phylogenetic clustering of these subfamilies and structural localization of the identified motifs on the AlphaFold-predicted protein models of the representative sequences validate their functional importance. For the ADCL, BCAT, DATA, and RATA datasets, we identified 5, 7, 10, and 2 novel motifs, with 3, 5, 7, and 2 motifs localized on secondary structures, confirming their structural importance. Furthermore, the analysis revealed 1, 3, 2, and 1 unique residue patterns of 293-KxxxR-297; 336-KxxxxY-341, 379-ExxxxNxF-386, and 453-ExFxxGT-459; 187-HxxRL-191 and 284-DxRWxxCDIK-293; and 191-HxxRL-195; integrating these patterns into existing computational tools would improve their accuracy. The conserved residue pattern- or motif-based computational approach for robustly screening the transaminases holds promise for unveiling novel RATA enzymes, facilitating their exploitation in biocatalytic applications.
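Residue patterns written in the style quoted above (for example KxxxR or HxxRL) map directly onto regular expressions, which is one simple way such patterns could be used for screening. The sketch below assumes that a lowercase "x" stands for any single residue; the toy sequence and function names are hypothetical, and this is not the screening pipeline used in the paper.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a pattern such as 'KxxxR' into a regex; lowercase 'x' matches any residue."""
    return "".join("." if ch == "x" else re.escape(ch) for ch in motif)


def find_motif(sequence: str, motif: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) positions of every match, including overlapping ones."""
    # A lookahead keeps matches zero-width so that overlapping hits are all reported.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start() + 1, m.start() + len(motif)) for m in pattern.finditer(sequence)]


# Hypothetical toy sequence, not a real transaminase.
toy = "MAKGTLRHAARLE"
print(find_motif(toy, "KxxxR"))  # [(3, 7)]
print(find_motif(toy, "HxxRL"))  # [(8, 12)]
```

Reporting 1-based start and end positions mirrors the numbering style of the patterns in the abstract, such as 293-KxxxR-297.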
Collapse
Affiliation(s)
- Ashish Runthala
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India; Department of Integrated Research & Development, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India.
| | - Pulla Sai Satya Sri
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India
| | - Aayush Sasikumar Nair
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India
| | - Murali Krishna Puttagunta
- Department of Computer Science & Engineering, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India
| | - T Chandra Sekhar Rao
- Department of Electronics & Communication Engineering, Sri Venkateswara College of Engineering, Tirupati, India
| | - Vajrala Sreya
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India
| | - Ganugapati Reshma Sowmya
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India
| | - G Koteswara Reddy
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Guntur, Andhra Pradesh, India
|
12
|
Mutz P, Camargo AP, Sahakyan H, Neri U, Butkovic A, Wolf YI, Krupovic M, Dolja VV, Koonin EV. The protein structurome of Orthornavirae and its dark matter. mBio 2025; 16:e0320024. [PMID: 39714180 PMCID: PMC11796362 DOI: 10.1128/mbio.03200-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2024] [Accepted: 10/28/2024] [Indexed: 12/24/2024] Open
Abstract
Metatranscriptomics is uncovering more and more diverse families of viruses with RNA genomes comprising the viral kingdom Orthornavirae in the realm Riboviria. Thorough protein annotation and comparison are essential to get insights into the functions of viral proteins and virus evolution. In addition to sequence- and HMM profile-based methods, protein structure comparison adds a powerful tool to uncover protein functions and relationships. We constructed an Orthornavirae "structurome" consisting of already annotated as well as unannotated ("dark matter") proteins and domains encoded in viral genomes. We used protein structure modeling and similarity searches to illuminate the remaining dark matter in hundreds of thousands of orthornavirus genomes. The vast majority of the dark matter domains showed either "generic" folds, such as single α-helices, or no high confidence structure predictions. Nevertheless, a variety of lineage-specific globular domains that were new either to orthornaviruses in general or to particular virus families were identified within the proteomic dark matter of orthornaviruses, including several predicted nucleic acid-binding domains and nucleases. In addition, we identified a case of exaptation of a cellular nucleoside monophosphate kinase as an RNA-binding protein in several virus families. Notwithstanding the continuing discovery of numerous orthornaviruses, it appears that all the protein domains conserved in large groups of viruses have already been identified. The rest of the viral proteome seems to be dominated by poorly structured domains including intrinsically disordered ones that likely mediate specific virus-host interactions.
IMPORTANCE Advanced methods for protein structure prediction, such as AlphaFold2, greatly expand our capability to identify protein domains and infer their likely functions and evolutionary relationships. This is particularly pertinent for proteins encoded by viruses that are known to evolve rapidly and as a result often cannot be adequately characterized by analysis of the protein sequences. We performed an exhaustive structure prediction and comparative analysis for uncharacterized proteins and domains ("dark matter") encoded by viruses with RNA genomes. The results show that the dark matter of the RNA virus proteome consists mostly of disordered and all-α-helical domains that cannot be readily assigned a specific function and that likely mediate various interactions between viral proteins and between viral and host proteins. The great majority of globular proteins and domains of RNA viruses are already known, although we identified several unexpected domains represented in individual viral families.
Affiliation(s)
- Pascal Mutz
- Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Antonio Pedro Camargo
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Harutyun Sahakyan
- Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Uri Neri
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Anamarija Butkovic
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Archaeal Virology Unit, Paris, France
| | - Yuri I. Wolf
- Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Mart Krupovic
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Archaeal Virology Unit, Paris, France
| | - Valerian V. Dolja
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
| | - Eugene V. Koonin
- Division of Intramural Research, Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
13
|
Kilinc M, Jia K, Jernigan RL. Major advances in protein function assignment by remote homolog detection with protein language models - A review. Curr Opin Struct Biol 2025; 90:102984. [PMID: 39864241 DOI: 10.1016/j.sbi.2025.102984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Revised: 12/23/2024] [Accepted: 01/02/2025] [Indexed: 01/28/2025]
Abstract
There is an ever-increasing need for accurate and efficient methods to identify protein homologs. Traditionally, sequence similarity-based methods have dominated protein homolog identification for function identification, but these struggle when the sequence identity between the pairs is low. Recently, transformer architecture-based deep learning methods have achieved breakthrough performance in many fields. One type of model that uses the transformer architecture is the protein language model (pLM). Here, we describe methods that use pLMs for protein homolog identification intended for function identification and describe their strengths and weaknesses. Several important ideas emerge that improve remote homolog detection accuracy considerably, such as filtering the substitution matrix generated from embeddings, selecting specific pLM layers for specific purposes, compressing the embeddings, and dividing proteins into domains before searching for homologs. All of these approaches produce huge numbers of new homologs that can reliably extend the reach of protein relationships for a deeper understanding of evolution and many other problems.
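One of the ideas named in this abstract, a substitution matrix generated from embeddings and then filtered, can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the per-residue embeddings are toy two-dimensional vectors rather than pLM output, cosine similarity stands in for whatever scoring a given method uses, and the simple threshold in `filter_matrix` is a placeholder for the filtering schemes the review discusses.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(emb_a, emb_b):
    """Residue-by-residue score matrix: rows follow protein A, columns follow protein B."""
    return [[cosine(u, v) for v in emb_b] for u in emb_a]


def filter_matrix(matrix, threshold):
    """Placeholder filter: keep scores at or above the threshold, zero out the rest."""
    return [[s if s >= threshold else 0.0 for s in row] for row in matrix]


# Toy per-residue embeddings (hypothetical; real pLM embeddings have many more dimensions).
emb_a = [[1.0, 0.0], [0.0, 1.0]]
emb_b = [[1.0, 0.0], [1.0, 1.0]]

sim = similarity_matrix(emb_a, emb_b)
print(filter_matrix(sim, threshold=0.8))
```

In a full method the filtered matrix would then feed an alignment step in place of a fixed matrix such as BLOSUM; that step is left out of this sketch.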
Affiliation(s)
- Mesih Kilinc
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA; Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA; Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511, USA
| | - Robert L Jernigan
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA; Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA.
|
14
|
Zhu A, Cao L, Do T, Link AJ. Cysimiditides: RiPPs with a Zn-Tetracysteine Motif and Aspartimidylation. Biochemistry 2025; 64:479-489. [PMID: 39763476 DOI: 10.1021/acs.biochem.4c00661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Aspartimidylation is a post-translational modification found in multiple families of ribosomally synthesized and post-translationally modified peptides (RiPPs). We recently reported on the imiditides, a new RiPP family in which aspartimidylation is the class-defining modification. Imiditide biosynthetic gene clusters encode a precursor protein and a methyltransferase that methylates a specific Asp residue, converting it to aspartimide. A subset of imiditides harbor a tetracysteine motif, so we have named these molecules cysimiditides. Here, using genome mining, we show that there are 56 putative cysimiditides predicted in publicly available genome sequences, all within Actinomycetota. We successfully heterologously expressed two examples of cysimiditides and showed that the major products are aspartimidylated and that the tetracysteine motif is necessary for protein stability. Cysimiditides bind a Zn2+ ion, presumably at the tetracysteine motif. Using in vitro reconstitution of the aspartimidylation reaction, we show that Zn2+ is required for the methylation and subsequent aspartimidylation of the precursor protein. An AlphaFold 3 model of the cysimiditide from Thermobifida cellulosilytica shows a hairpin structure anchored by the Zn2+-tetracysteine motif with the aspartimide site in the hairpin loop. An AlphaFold 3 model of this cysimiditide in complex with its cognate methyltransferase suggests that the methyltransferase recognizes the Zn2+-tetracysteine motif to correctly dock the precursor protein. Cysimiditides expand the set of experimentally confirmed RiPPs harboring aspartimides and represent the first RiPP class that has an obligate metal ion.
Affiliation(s)
- Angela Zhu
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| | - Li Cao
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| | - Truc Do
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| | - A James Link
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
- Department of Molecular Biology, Princeton University, Princeton, New Jersey 08544, United States
|
15
|
Liu J, Neupane P, Cheng J. Accurate Prediction of Protein Complex Stoichiometry by Integrating AlphaFold3 and Template Information. bioRxiv 2025:2025.01.12.632663. [PMID: 39868088 PMCID: PMC11761747 DOI: 10.1101/2025.01.12.632663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Protein structure prediction methods require stoichiometry information (i.e., subunit counts) to predict the quaternary structure of protein complexes. However, this information is often unavailable, making stoichiometry prediction crucial for complexes with unknown stoichiometry. Despite its importance, few computational methods address this challenge. In this study, we present an approach that integrates AlphaFold3 structure predictions with homologous template data to predict stoichiometry. The method generates candidate stoichiometries, builds structural models for them using AlphaFold3, ranks them based on AlphaFold3 scores, and further refines the predictions with template-based information when available. In the 16th community-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP16), our method achieved 71.4% top-1 accuracy and 92.9% top-3 accuracy, outperforming other predictors in terms of overall performance. This demonstrates the complementary strengths of AlphaFold3- and template-based predictions and highlights the method's applicability to uncharacterized protein complexes lacking stoichiometry data.
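The generate, score, and rank loop described in this abstract, together with the top-k accuracy used to report results, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_score` is a hypothetical stand-in for a confidence score derived from AlphaFold3 models (no structure prediction is run here), the copy-number limit is arbitrary, and the template-based refinement step is not modeled.

```python
from itertools import product


def candidate_stoichiometries(subunits, max_copies):
    """Yield every copy-number assignment, e.g. {'A': 2, 'B': 1} for A2B1."""
    for counts in product(range(1, max_copies + 1), repeat=len(subunits)):
        yield dict(zip(subunits, counts))


def label(stoich):
    """Compact label such as 'A2B1'."""
    return "".join(f"{name}{n}" for name, n in stoich.items())


def rank_candidates(subunits, max_copies, score_fn):
    """Sort candidates from highest to lowest score."""
    candidates = list(candidate_stoichiometries(subunits, max_copies))
    return sorted(candidates, key=score_fn, reverse=True)


def top_k_accuracy(ranked_lists, truths, k):
    """Fraction of targets whose true label appears among the first k predictions."""
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_lists, truths))
    return hits / len(truths)


def toy_score(stoich):
    """Hypothetical score that peaks at A2B1; a real score would come from model confidence."""
    return -(abs(stoich["A"] - 2) + abs(stoich["B"] - 1))


ranked = rank_candidates(["A", "B"], max_copies=3, score_fn=toy_score)
print(label(ranked[0]))

# Toy evaluation over two made-up targets.
predictions = [["A2", "A1", "A3"], ["A1B1", "A2B2", "A2B1"]]
truths = ["A2", "A2B1"]
print(top_k_accuracy(predictions, truths, k=1), top_k_accuracy(predictions, truths, k=3))
```

Exhaustive enumeration grows quickly with the number of distinct subunits, so a practical pipeline would presumably restrict the candidate list before building models.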
Affiliation(s)
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, NextGen Precision Health, University of Missouri, Columbia, MO 65211, USA
|
16
|
Guan C, Fernandes FC, Franco OL, de la Fuente-Nunez C. Leveraging large language models for peptide antibiotic design. Cell Rep Phys Sci 2025; 6:102359. [PMID: 39949833 PMCID: PMC11823563 DOI: 10.1016/j.xcrp.2024.102359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2025]
Abstract
Large language models (LLMs) have significantly impacted various domains of our society, including recent applications in complex fields such as biology and chemistry. These models, built on sophisticated neural network architectures and trained on extensive datasets, are powerful tools for designing, optimizing, and generating molecules. This review explores the role of LLMs in discovering and designing antibiotics, focusing on peptide molecules. We highlight advancements in drug design and outline the challenges of applying LLMs in these areas.
Affiliation(s)
- Changge Guan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors contributed equally
| | - Fabiano C. Fernandes
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- Departamento de Ciência da Computação, Instituto Federal de Brasília, Campus Taguatinga, Brasília, Brazil
- These authors contributed equally
| | - Octavio L. Franco
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- S-Inova Biotech, Programa de Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Brazil
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
|
17
|
Johnson S, Weigele P, Fomenkov A, Ge A, Vincze A, Eaglesham J, Roberts R, Sun Z. Domainator, a flexible software suite for domain-based annotation and neighborhood analysis, identifies proteins involved in antiviral systems. Nucleic Acids Res 2025; 53:gkae1175. [PMID: 39657740 PMCID: PMC11754643 DOI: 10.1093/nar/gkae1175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2024] [Revised: 11/07/2024] [Accepted: 11/15/2024] [Indexed: 12/12/2024] Open
Abstract
The availability of large databases of biological sequences presents an opportunity for in-depth exploration of gene diversity and function. Bacterial defense systems are a rich source of diverse but difficult-to-annotate genes with biotechnological applications. In this work, we present Domainator, a flexible and modular software suite for domain-based gene neighborhood and protein search, extraction and clustering. We demonstrate the utility of Domainator through three examples related to bacterial defense systems. First, we cluster CRISPR-associated Rossmann fold (CARF) containing proteins with difficult-to-annotate effector domains, classifying most of them as likely transcriptional regulators and a subset as likely RNases. Second, we extract and cluster P4-like phage satellite defense hotspots, identify an abundant variant of Lamassu defense systems and demonstrate its in vivo activity against several T-even phages. Third, we integrate a protein language model into Domainator and use it to identify restriction endonucleases with low similarity to known reference sequences, validating the activity of one example in vitro. Domainator is made available as an open-source package with detailed documentation and usage examples.
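The notion of a domain-based gene neighborhood in this abstract can be made concrete with a small, generic Python sketch. It does not use or imitate Domainator's actual interface: the `Gene` class, the `neighborhoods` function, the `flank` parameter, and the toy contig are all hypothetical names introduced here, and domain annotations are assumed to be already available as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Gene:
    """A gene on a contig with the names of its annotated domains (hypothetical model)."""
    name: str
    domains: set[str] = field(default_factory=set)


def neighborhoods(contig, target_domain, flank):
    """For each gene carrying target_domain, return the names of genes within +/- flank."""
    result = []
    for i, gene in enumerate(contig):
        if target_domain in gene.domains:
            lo = max(0, i - flank)                 # clip at the contig start
            hi = min(len(contig), i + flank + 1)   # clip at the contig end
            result.append([g.name for g in contig[lo:hi]])
    return result


# Toy contig: gene order matters, domain names are illustrative labels only.
contig = [
    Gene("g1"),
    Gene("g2", {"HTH"}),
    Gene("g3", {"CARF", "HEPN"}),
    Gene("g4"),
    Gene("g5", {"CARF"}),
]
print(neighborhoods(contig, "CARF", flank=1))
```

Windows extracted this way could then be compared or clustered by their domain content, which is the kind of downstream analysis the abstract describes.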
Affiliation(s)
- Andrew Ge
- New England Biolabs Inc., Ipswich, MA 01938, USA
| | - Anna Vincze
- New England Biolabs Inc., Ipswich, MA 01938, USA
| | - Zhiyi Sun
- New England Biolabs Inc., Ipswich, MA 01938, USA
|
18
|
Paysan-Lafosse T, Andreeva A, Blum M, Chuguransky S, Grego T, Pinto B, Salazar G, Bileschi M, Llinares-López F, Meng-Papaxanthos L, Colwell L, Grishin N, Schaeffer RD, Clementel D, Tosatto SE, Sonnhammer E, Wood V, Bateman A. The Pfam protein families database: embracing AI/ML. Nucleic Acids Res 2025; 53:D523-D534. [PMID: 39540428 PMCID: PMC11701544 DOI: 10.1093/nar/gkae997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Revised: 10/09/2024] [Accepted: 10/16/2024] [Indexed: 11/16/2024] Open
Abstract
The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also detail the development of Pfam-N, which uses deep learning to expand family coverage, achieving an 8.8% increase in UniProtKB coverage compared to standard Pfam. We discuss plans for more frequent Pfam releases integrated with InterPro and the potential for artificial intelligence to further assist curation. Despite recent advances, many protein families remain to be classified, and Pfam continues working toward comprehensive coverage of the protein universe.
Affiliation(s)
- Typhaine Paysan-Lafosse
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Antonina Andreeva
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matthias Blum
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sara Rocio Chuguransky
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Tiago Grego
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Beatriz Lazaro Pinto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Gustavo A Salazar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Lucy J Colwell
- Google DeepMind, 355 Main Street, Cambridge, MA 02142, USA
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
| | - Nick V Grishin
- Department of Biophysics, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390, USA
| | - R Dustin Schaeffer
- Department of Biophysics, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX 75390, USA
| | - Damiano Clementel
- Department of Biomedical Sciences, University of Padova, Via 8 Febbraio, 2, 35122 Padova, Italy
| | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padova, Via 8 Febbraio, 2, 35122 Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Via Giovanni Amendola, 122/O, 70126 Bari, Italy
| | - Erik Sonnhammer
- Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, Tomtebodavägen 23A, 17165 Solna, Sweden
| | - Valerie Wood
- Department of Biochemistry, University of Cambridge, Hopkins Building Downing Site, Tennis Court Road, Cambridge CB2 1QW, UK
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
|
19
|
Jin R, Du F, Han X, Guo J, Song W, Xia Y, Yue X, Yang D, Tong J, Zhang Q, Liu Y. Prognostic Value of Insulin Growth Factor-Like Receptor 1 (IGFLR1) in Stage II and III Colorectal Cancer and Its Association with Immune Cell Infiltration. Appl Biochem Biotechnol 2025; 197:427-442. [PMID: 39141178 PMCID: PMC11748461 DOI: 10.1007/s12010-024-05006-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/23/2024] [Indexed: 08/15/2024]
Abstract
IGFLR1 is a novel biomarker, and some evidence suggests that it is involved in the immune microenvironment of colorectal cancer (CRC). Here, we explored the expression of IGFLR1 and its association with prognosis and immune cell infiltration in CRC, with the aim of providing a basis for further studies on IGFLR1. Immunohistochemical staining for IGFLR1, TIM-3, FOXP3, CD4, CD8, and PD-1 was performed in eligible tissues to analyze the expression of IGFLR1 and its association with prognosis and immune cell infiltration. Then, we screened colon cancer samples from TCGA and grouped patients according to IGFLR1-related genes. We also evaluated the co-expression and immune-related pathways of IGFLR1 to identify its potential mechanism in CRC. Results were considered statistically significant when P < 0.05. IGFLR1 and IGFLR1-related genes were associated with prognosis and immune cell infiltration (P < 0.05). In stage II and III CRC tissue and normal tissue, we found the following. (1) IGFLR1 was expressed in both the cell membrane and cytoplasm and was differentially expressed between cancer tissue and normal tissue. IGFLR1 expression was associated with the expression of FOXP3 and CD8 and with gender but was not associated with microsatellite instability. (2) IGFLR1 was an independent prognostic factor, and patients with high IGFLR1 had a better prognosis. (3) A model including IGFLR1, FOXP3, PD-1, and CD4 showed good prognostic stratification ability. (4) There was a significant interaction between IGFLR1 and GATA3, and IGFLR1 had a significant co-expression with related factors in the INFR pathway. IGFLR1 has emerged as a new molecule related to disease prognosis and immune cell infiltration in CRC patients and showed a good ability to predict the prognosis of patients.
Affiliation(s)
- Ran Jin
- Department of Breast Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Fenqi Du
- Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Xinhao Han
- Department of Biostatistics, Public Health School of Harbin Medical University, Harbin, China
| | - Junnan Guo
- Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Wenjie Song
- Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Yixiu Xia
- Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Xinyu Yue
- Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Da Yang
- Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, China
| | - Jinxue Tong
- Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, China.
| | - Qiuju Zhang
- Health Management Centre, Harbin Medical University Cancer Hospital, Harbin, China.
- Department of Biostatistics, Public Health School of Harbin Medical University, Harbin, China.
| | - Yanlong Liu
- Department of Colorectal Surgery, Harbin Medical University Cancer Hospital, Harbin, China.
|
20
|
Kabir MN, Wang LR, Goh WWB. Exploiting the similarity of dissimilarities for biomedical applications and enhanced machine learning. PLoS Comput Biol 2025; 21:e1012716. [PMID: 39854337 PMCID: PMC11759369 DOI: 10.1371/journal.pcbi.1012716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2025] Open
Abstract
The "similarity of dissimilarities" is an emerging paradigm in biomedical science with significant implications for protein function prediction, machine learning (ML), and personalized medicine. In protein function prediction, recognizing dissimilarities alongside similarities provides a more detailed understanding of evolutionary processes, allowing for a deeper exploration of regions that influence biological functionality. For ML models, incorporating dissimilarity measures helps avoid misleading results caused by highly correlated or similar data, addressing confounding issues like the Doppelgänger Effect. This leads to more accurate insights and a stronger understanding of complex biological systems. In the realm of personalized AI and precision medicine, the importance of dissimilarities is paramount. Personalized AI builds local models for each sample by identifying a network of neighboring samples. However, if the neighboring samples are too similar, it becomes difficult to identify factors critical to disease onset for the individual, limiting the effectiveness of personalized interventions or treatments. This paper discusses the "similarity of dissimilarities" concept, using protein function prediction, ML, and personalized AI as key examples. Integrating this approach into an analysis allows for the design of better, more meaningful experiments and the development of smarter validation methods, ensuring that the models learn in a meaningful way.
Affiliation(s)
- Mohammad Neamul Kabir
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
| | - Li Rong Wang
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
| | - Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Center of AI in Medicine, Nanyang Technological University, Singapore, Singapore
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
|
21
|
Jia B, Baek JH, Lee JK, Sun Y, Kim KH, Jung JY, Jeon CO. Expanding the β-Lactamase Family in the Human Microbiome. Adv Sci (Weinh) 2024; 11:e2403563. [PMID: 39447121 PMCID: PMC11633517 DOI: 10.1002/advs.202403563] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 09/23/2024] [Indexed: 10/26/2024]
Abstract
Resistance to β-lactams, the most common antibiotics globally, is primarily determined by β-lactamases. The human microbiota and β-lactams influence each other; however, β-lactamase variety and abundance in the human microbiome remain only partially understood. This study aimed to elucidate the diversity, abundance, and substrate spectrum of β-lactamases. 1369 characterized β-lactamases and 16 204 putative sequences are collected from protein databases. Upon clustering analysis and biochemical assays, nine proteins exhibiting less than 35% identity to those previously characterized are confirmed as β-lactamases. These newly identified β-lactamases originated from eight distinct clusters comprising 1163 β-lactamases. Quantifying β-lactamase abundance in healthy participants (n = 2394) across 19 countries using functionally confirmed clusters revealed that Japan has the highest gut β-lactamase abundance (log2[reads per million (RPM)] = 6.52) and Fiji has the lowest (log2[RPM] = 2.31). The β-lactamase abundance is correlated with β-lactam consumption (R = 0.50, p = 0.029) and income (R = 0.51, p = 0.024). Compared with healthy participants, β-lactamase abundance in the gut is significantly increased in patients with colorectal cancer, cardiovascular diseases, breast cancer, and epilepsy. These outcomes provide insights into investigating antibiotic resistance, antibiotic stewardship, and gut microbiome-antibiotic interactions.
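The abundance values in this abstract are reported as log2 of reads per million (RPM). As a worked illustration, the Python sketch below assumes the common definition of RPM as mapped reads scaled to one million total reads; the abstract does not spell out the exact normalization used, and the function names and read counts are hypothetical.

```python
import math


def log2_rpm(mapped_reads: int, total_reads: int) -> float:
    """log2 of reads per million, assuming RPM = mapped_reads * 1e6 / total_reads."""
    if mapped_reads <= 0 or total_reads <= 0:
        raise ValueError("read counts must be positive")
    rpm = mapped_reads * 1_000_000 / total_reads
    return math.log2(rpm)


def rpm_from_log2(value: float) -> float:
    """Invert the log2 transform to recover RPM."""
    return 2.0 ** value


# Made-up counts: 64 mapped reads in a sample of one million reads.
print(log2_rpm(64, 1_000_000))

# Back-converting the two values quoted in the abstract.
print(round(rpm_from_log2(6.52), 2), round(rpm_from_log2(2.31), 2))
```

Under that assumption, the reported values of 6.52 and 2.31 correspond to roughly 92 and 5 reads per million, a difference of about 18-fold.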
Affiliation(s)
- Baolei Jia
- Xianghu Laboratory, Hangzhou 311231, China
- Department of Life Science, Chung‐Ang University, Seoul 06974, Republic of Korea
| | - Ju Hye Baek
- Department of Life Science, Chung‐Ang University, Seoul 06974, Republic of Korea
| | - Jae Kyeong Lee
- Department of Life Science, Chung‐Ang University, Seoul 06974, Republic of Korea
| | - Ying Sun
- Department of Veterinary and Animal Sciences, University of Copenhagen, Copenhagen 1870, Denmark
| | - Kyung Hyun Kim
- Department of Biological Sciences and Biotechnology, Hannam University, Daejon 34054, Republic of Korea
| | - Ji Young Jung
- Microbial Research Department, Nakdonggang National Institute of Biological Resources, Gyeongsangbuk‐do 37242, Republic of Korea
| | - Che Ok Jeon
- Department of Life Science, Chung‐Ang University, Seoul 06974, Republic of Korea
|
22
|
Zhang C, Wang Q, Li Y, Teng A, Hu G, Wuyun Q, Zheng W. The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction. Biomolecules 2024; 14:1531. [PMID: 39766238 PMCID: PMC11673352 DOI: 10.3390/biom14121531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024] [Indexed: 01/11/2025] Open
Abstract
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological sciences, playing a pivotal role in predicting molecular structures and functions. With broad applications in protein and nucleic acid modeling, MSAs continue to underpin advancements across a range of disciplines. MSAs are not only foundational for traditional sequence comparison techniques but also increasingly important in the context of artificial intelligence (AI)-driven advancements. Recent breakthroughs in AI, particularly in protein and nucleic acid structure prediction, rely heavily on the accuracy and efficiency of MSAs to enhance remote homology detection and guide spatial restraints. This review traces the historical evolution of MSA, highlighting its significance in molecular structure and function prediction. We cover the methodologies used for protein monomers, protein complexes, and RNA, while also exploring emerging AI-based alternatives, such as protein language models, as complementary or replacement approaches to traditional MSAs in application tasks. By discussing the strengths, limitations, and applications of these methods, this review aims to provide researchers with valuable insights into MSA's evolving role, equipping them to make informed decisions in structural prediction research.
Affiliation(s)
- Chenyue Zhang
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China; (C.Z.); (Y.L.); (G.H.)
| | - Qinxin Wang
- Suzhou New & High-Tech Innovation Service Center, Suzhou 215011, China;
| | - Yiyang Li
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China; (C.Z.); (Y.L.); (G.H.)
| | - Anqi Teng
- Bioscience and Biomedical Engineering Thrust, Systems Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China;
| | - Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China; (C.Z.); (Y.L.); (G.H.)
| | - Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Wei Zheng
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China; (C.Z.); (Y.L.); (G.H.)
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
|
23
|
Béchon N, Tal N, Stokar-Avihail A, Savidor A, Kupervaser M, Melamed S, Amitai G, Sorek R. Diversification of molecular pattern recognition in bacterial NLR-like proteins. Nat Commun 2024; 15:9860. [PMID: 39543107 PMCID: PMC11564622 DOI: 10.1038/s41467-024-54214-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 11/01/2024] [Indexed: 11/17/2024] Open
Abstract
Antiviral STANDs (Avs) are bacterial anti-phage proteins evolutionarily related to immune pattern recognition receptors of the NLR family. Type 2 Avs proteins (Avs2) were suggested to recognize the phage large terminase subunit as a signature of phage infection. Here, we show that Avs2 from Klebsiella pneumoniae (KpAvs2) can recognize several different phage proteins as signature for infection. While KpAvs2 recognizes the large terminase subunit of Seuratvirus phages, we find that to protect against Dhillonvirus phages, KpAvs2 recognizes a different phage protein named KpAvs2-stimulating protein 1 (Ksap1). KpAvs2 directly binds Ksap1 to become activated, and phages mutated in Ksap1 escape KpAvs2 defense despite encoding an intact terminase. We further show that KpAvs2 protects against a third group of phages by recognizing another protein, Ksap2. Our results exemplify the evolutionary diversification of molecular pattern recognition in bacterial Avs2, and show that a single pattern recognition receptor evolved to recognize different phage-encoded proteins.
Affiliation(s)
- Nathalie Béchon
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
| | - Nitzan Tal
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
| | | | - Alon Savidor
- de Botton Institute for Protein Profiling, The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot, Israel
| | - Meital Kupervaser
- de Botton Institute for Protein Profiling, The Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot, Israel
| | - Sarah Melamed
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
| | - Gil Amitai
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
| | - Rotem Sorek
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.
|
24
|
Richards TA, Eme L, Archibald JM, Leonard G, Coelho SM, de Mendoza A, Dessimoz C, Dolezal P, Fritz-Laylin LK, Gabaldón T, Hampl V, Kops GJPL, Leger MM, Lopez-Garcia P, McInerney JO, Moreira D, Muñoz-Gómez SA, Richter DJ, Ruiz-Trillo I, Santoro AE, Sebé-Pedrós A, Snel B, Stairs CW, Tromer EC, van Hooff JJE, Wickstead B, Williams TA, Roger AJ, Dacks JB, Wideman JG. Reconstructing the last common ancestor of all eukaryotes. PLoS Biol 2024; 22:e3002917. [PMID: 39585925 PMCID: PMC11627563 DOI: 10.1371/journal.pbio.3002917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Revised: 12/09/2024] [Indexed: 11/27/2024] Open
Abstract
Understanding the origin of eukaryotic cells is one of the most difficult problems in all of biology. A key challenge relevant to the question of eukaryogenesis is reconstructing the gene repertoire of the last eukaryotic common ancestor (LECA). As data sets grow, sketching an accurate genomics-informed picture of early eukaryotic cellular complexity requires provision of analytical resources and a commitment to data sharing. Here, we summarise progress towards understanding the biology of LECA and outline a community approach to inferring its wider gene repertoire. Once assembled, a robust LECA gene set will be a useful tool for evaluating alternative hypotheses about the origin of eukaryotes and understanding the evolution of traits in all descendant lineages, with relevance in diverse fields such as cell biology, microbial ecology, biotechnology, agriculture, and medicine. In this Consensus View, we put forth the status quo and an agreed path forward to reconstruct LECA's gene content.
Affiliation(s)
| | - Laura Eme
- Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
- Department of Cell & Molecular Biology, The University of Rhode Island, Kingston, Rhode Island, United States of America
| | - John M. Archibald
- Department of Biochemistry and Molecular Biology and the Institute for Comparative Genomics, Dalhousie University, Halifax, Canada
| | - Guy Leonard
- Department of Biology, University of Oxford, Oxford, United Kingdom
| | - Susana M. Coelho
- Department of Algal Development and Evolution, Max Planck Institute for Biology Tübingen, Tübingen, Germany
| | - Alex de Mendoza
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, United Kingdom
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Pavel Dolezal
- Charles University, Faculty of Science, Department of Parasitology, BIOCEV, Vestec, Czech Republic
| | - Lillian K. Fritz-Laylin
- Department of Biology, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
| | - Toni Gabaldón
- Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Spain
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- CIBER de Enfermedades Infecciosas, Instituto de Salud Carlos III, Madrid, Spain
| | - Vladimír Hampl
- Charles University, Faculty of Science, Department of Parasitology, BIOCEV, Vestec, Czech Republic
| | - Geert J. P. L. Kops
- Hubrecht Institute-KNAW, Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
| | - Michelle M. Leger
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
- Okinawa Institute of Science and Technology Graduate University (OIST), Okinawa, Japan
| | - Purificacion Lopez-Garcia
- Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
| | - James O. McInerney
- Department of Evolution, Ecology and Behaviour, Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, United Kingdom
| | - David Moreira
- Ecologie Systématique Evolution, CNRS, Université Paris-Saclay, AgroParisTech, Gif-sur-Yvette, France
| | - Sergio A. Muñoz-Gómez
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
| | - Daniel J. Richter
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
| | - Iñaki Ruiz-Trillo
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Barcelona, Spain
| | - Alyson E. Santoro
- Department of Ecology, Evolution and Marine Biology, University of California, Santa Barbara, California, United States of America
| | - Arnau Sebé-Pedrós
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Berend Snel
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, the Netherlands
| | | | - Eelco C. Tromer
- Cell Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, Rijksuniversiteit Groningen, Groningen, the Netherlands
| | - Jolien J. E. van Hooff
- Laboratory of Microbiology, Wageningen University & Research, Wageningen, the Netherlands
| | - Bill Wickstead
- School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
| | - Tom A. Williams
- School of Biological Sciences, University of Bristol, Bristol, United Kingdom
| | - Andrew J. Roger
- Department of Biochemistry and Molecular Biology and the Institute for Comparative Genomics, Dalhousie University, Halifax, Canada
| | - Joel B. Dacks
- Division of Infectious Diseases, Department of Medicine, and Department of Biological Sciences, University of Alberta, Edmonton, Canada
- Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Centre for Life’s Origins and Evolution, Department of Genetics, Evolution, & Environment, University College, London, United Kingdom
| | - Jeremy G. Wideman
- Center for Mechanisms of Evolution, School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America
|
25
|
Ramos-León F, Anjuwon-Foster BR, Anantharaman V, Updegrove TB, Ferreira CN, Ibrahim AM, Tai CH, Kruhlak MJ, Missiakas DM, Camberg JL, Aravind L, Ramamurthi KS. PcdA promotes orthogonal division plane selection in Staphylococcus aureus. Nat Microbiol 2024; 9:2997-3012. [PMID: 39468247 DOI: 10.1038/s41564-024-01821-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 08/30/2024] [Indexed: 10/30/2024]
Abstract
The bacterial pathogen, Staphylococcus aureus, grows by dividing in two alternating orthogonal planes. How these cell division planes are positioned correctly is not known. Here we used chemical genetic screening to identify PcdA as a division plane placement factor. Molecular biology and imaging approaches revealed non-orthogonal division plane selection for pcdA mutant bacteria. PcdA is a structurally and functionally altered member of the McrB AAA+ NTPase family, which are often found as restriction enzyme subunits. PcdA interacts with the tubulin-like divisome component, FtsZ, and the structural protein, DivIVA; it also localizes to future cell division sites. PcdA multimerization, localization and function are NTPase activity-dependent. We propose that the DivIVA/PcdA complex recruits unpolymerized FtsZ to assemble along the proper cell division plane. Although pcdA deletion did not affect S. aureus growth in several laboratory conditions, its clustered growth pattern was disrupted, sensitivity to cell-wall-targeting antibiotics increased and virulence in mice decreased. We propose that the characteristic clustered growth pattern of S. aureus, which emerges from dividing in alternating orthogonal division planes, might protect the bacterium from host defences.
Affiliation(s)
- Félix Ramos-León
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Brandon R Anjuwon-Foster
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Vivek Anantharaman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Taylor B Updegrove
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Colby N Ferreira
- Department of Cell and Molecular Biology, University of Rhode Island, Kingston, RI, USA
| | - Amany M Ibrahim
- Department of Microbiology, Howard Taylor Ricketts Laboratory, University of Chicago, Lemont, IL, USA
| | - Chin-Hsien Tai
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Michael J Kruhlak
- Laboratory of Cancer Biology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Dominique M Missiakas
- Department of Microbiology, Howard Taylor Ricketts Laboratory, University of Chicago, Lemont, IL, USA
| | - Jodi L Camberg
- Department of Cell and Molecular Biology, University of Rhode Island, Kingston, RI, USA
| | - L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Kumaran S Ramamurthi
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
|
26
|
Updegrove TB, Delerue T, Anantharaman V, Cho H, Chan C, Nipper T, Choo-Wosoba H, Jenkins LM, Zhang L, Su Y, Shroff H, Chen J, Bewley CA, Aravind L, Ramamurthi KS. Altruistic feeding and cell-cell signaling during bacterial differentiation actively enhance phenotypic heterogeneity. Sci Adv 2024; 10:eadq0791. [PMID: 39423260 PMCID: PMC11488536 DOI: 10.1126/sciadv.adq0791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 09/12/2024] [Indexed: 10/21/2024]
Abstract
Starvation triggers bacterial spore formation, a committed differentiation program that transforms a vegetative cell into a dormant spore. Cells in a population enter sporulation nonuniformly to secure against the possibility that favorable growth conditions, which put sporulation-committed cells at a disadvantage, may resume. This heterogeneous behavior is initiated by a passive mechanism: stochastic activation of a master transcriptional regulator. Here, we identify a cell-cell communication pathway containing the proteins ShfA (YabQ) and ShfP (YvnB) that actively promotes phenotypic heterogeneity, wherein Bacillus subtilis cells that start sporulating early use a calcineurin-like phosphoesterase to release glycerol, which simultaneously acts as a signaling molecule and a nutrient to delay nonsporulating cells from entering sporulation. This produced a more diverse population that was better poised to exploit a sudden influx of nutrients compared to those generating heterogeneity via stochastic gene expression alone. Although conflict systems are prevalent among microbes, genetically encoded cooperative behavior in unicellular organisms can evidently also boost inclusive fitness.
Affiliation(s)
- Taylor B. Updegrove
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Thomas Delerue
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Vivek Anantharaman
- Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hyomoon Cho
- Laboratory of Bioorganic Chemistry, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
| | - Carissa Chan
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Thomas Nipper
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Hyoyoung Choo-Wosoba
- Office of Collaborative Biostatistics, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lisa M. Jenkins
- Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lixia Zhang
- Advanced Imaging and Microscopy Resource, National Institutes of Health, Bethesda, MD, USA
| | - Yijun Su
- Laboratory of High Resolution Optical Imaging, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, MD, USA
- Janelia Farm Research Campus, Howard Hughes Medical Institute (HHMI), Ashburn, VA, USA
| | - Hari Shroff
- Laboratory of High Resolution Optical Imaging, National Institute of Biomedical Imaging and Bioengineering, National Institutes of Health, Bethesda, MD, USA
- Janelia Farm Research Campus, Howard Hughes Medical Institute (HHMI), Ashburn, VA, USA
| | - Jiji Chen
- Advanced Imaging and Microscopy Resource, National Institutes of Health, Bethesda, MD, USA
| | - Carole A. Bewley
- Laboratory of Bioorganic Chemistry, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD, USA
| | - L. Aravind
- Computational Biology Branch, Division of Intramural Research, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Kumaran S. Ramamurthi
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
|
27
|
Iovino BG, Tang H, Ye Y. Protein domain embeddings for fast and accurate similarity search. Genome Res 2024; 34:1434-1444. [PMID: 39237301 PMCID: PMC11529836 DOI: 10.1101/gr.279127.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 09/03/2024] [Indexed: 09/07/2024]
Abstract
Recently developed protein language models have enabled a variety of applications with the protein contextual embeddings they produce. Per-protein representations (each protein is represented as a vector of fixed dimension) can be derived via averaging the embeddings of individual residues, or applying matrix transformation techniques such as the discrete cosine transformation (DCT) to matrices of residue embeddings. Such protein-level embeddings have been applied to enable fast searches of similar proteins; however, limitations have been found; for example, PROST is good at detecting global homologs but not local homologs, and knnProtT5 excels for proteins with single domains but not multidomain proteins. Here, we propose a novel approach that first segments proteins into domains (or subdomains) and then applies the DCT to the vectorized embeddings of residues in each domain to infer domain-level contextual vectors. Our approach, called DCTdomain, uses predicted contact maps from ESM-2 for domain segmentation, which is formulated as a domain segmentation problem and can be solved using a recursive cut algorithm (RecCut in short) in quadratic time to the protein length; for comparison, an existing approach for domain segmentation uses a cubic-time algorithm. We show such domain-level contextual vectors (termed as DCT fingerprints) enable fast and accurate detection of similarity between proteins that share global similarities but with undefined extended regions between shared domains, and those that only share local similarities. In addition, tests on a database search benchmark show that the DCTdomain is able to detect distant homologs by leveraging the structural information in the contextual embeddings.
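The fixed-length "DCT fingerprint" idea described in this abstract can be illustrated with a minimal sketch. It assumes a per-domain matrix of residue embeddings of shape (L, D) is already available; random numbers stand in for ESM-2 output here. The function names `dct2_matrix` and `dct_fingerprint`, and the number of retained coefficients, are hypothetical choices for illustration and are not taken from DCTdomain itself, whose exact layers, quantization, and segmentation steps are not given in the abstract.

```python
import numpy as np

def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis

def dct_fingerprint(residue_emb: np.ndarray, keep_rows: int = 3, keep_cols: int = 8) -> np.ndarray:
    """Compress an (L, D) residue-embedding matrix into a fixed-length vector.

    A 2D DCT is applied and only the low-frequency corner is kept, so domains
    of different lengths map to vectors of the same size (illustrative sizes).
    """
    n_res, n_dim = residue_emb.shape
    coeffs = dct2_matrix(n_res) @ residue_emb @ dct2_matrix(n_dim).T
    block = np.zeros((keep_rows, keep_cols))
    r, c = min(keep_rows, n_res), min(keep_cols, n_dim)
    block[:r, :c] = coeffs[:r, :c]
    return block.ravel()

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two fingerprints."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Usage: two "domains" of different lengths yield comparable vectors.
rng = np.random.default_rng(0)
domain_a = rng.normal(size=(50, 32))    # stand-in for real residue embeddings
domain_b = rng.normal(size=(120, 32))
fp_a, fp_b = dct_fingerprint(domain_a), dct_fingerprint(domain_b)
similarity = cosine_similarity(fp_a, fp_b)
```

Keeping only low-frequency coefficients is what makes the vector length independent of domain length; a real pipeline would compare such vectors per domain rather than per protein.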
Affiliation(s)
- Benjamin Giovanni Iovino
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA
| | - Haixu Tang
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA
| | - Yuzhen Ye
- Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA
|
28
|
Ng G, Gouda A, Andrysek J. Quantifying Asymmetric Gait Pattern Changes Using a Hidden Markov Model Similarity Measure (HMM-SM) on Inertial Sensor Signals. Sensors (Basel) 2024; 24:6431. [PMID: 39409470 PMCID: PMC11479378 DOI: 10.3390/s24196431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Revised: 09/17/2024] [Accepted: 10/01/2024] [Indexed: 10/20/2024]
Abstract
Wearable gait analysis systems using inertial sensors offer the potential for easy-to-use gait assessment in lab and free-living environments. This can enable objective long-term monitoring and decision making for individuals with gait disabilities. This study explores a novel approach that applies a hidden Markov model-based similarity measure (HMM-SM) to assess changes in gait patterns based on the gyroscope and accelerometer signals from just one or two inertial sensors. Eleven able-bodied individuals were equipped with a system which perturbed gait patterns by manipulating stance-time symmetry. Inertial sensor data were collected from various locations on the lower body to train hidden Markov models. The HMM-SM was evaluated to determine whether it corresponded to changes in gait as individuals deviated from their baseline, and whether it could provide a reliable measure of gait similarity. The HMM-SM showed consistent changes in accordance with stance-time symmetry in the following sensor configurations: pelvis, combined upper leg signals, and combined lower leg signals. Additionally, the HMM-SM demonstrated good reliability for the combined upper leg signals (ICC = 0.803) and lower leg signals (ICC = 0.795). These findings provide preliminary evidence that the HMM-SM could be useful in assessing changes in overall gait patterns. This could enable the development of compact, wearable systems for unsupervised gait assessment, without the requirement to pre-identify and measure a set of gait parameters.
Affiliation(s)
- Gabriel Ng
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON M5S 1A1, Canada; (G.N.); (A.G.)
- Research Institute, Holland Bloorview Kids Rehabilitation Hospital, Toronto, ON M4G 1R8, Canada
| | - Aliaa Gouda
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON M5S 1A1, Canada; (G.N.); (A.G.)
- Research Institute, Holland Bloorview Kids Rehabilitation Hospital, Toronto, ON M4G 1R8, Canada
| | - Jan Andrysek
- Institute of Biomedical Engineering, University of Toronto, Toronto, ON M5S 1A1, Canada; (G.N.); (A.G.)
- Research Institute, Holland Bloorview Kids Rehabilitation Hospital, Toronto, ON M4G 1R8, Canada
|
29
|
Msweli S, Pakala SB, Syed K. NF-κB Transcription Factors: Their Distribution, Family Expansion, Structural Conservation, and Evolution in Animals. Int J Mol Sci 2024; 25:9793. [PMID: 39337282 PMCID: PMC11432056 DOI: 10.3390/ijms25189793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Revised: 09/05/2024] [Accepted: 09/08/2024] [Indexed: 09/30/2024] Open
Abstract
The Nuclear Factor Kappa B (NF-κB) transcription factor family consists of five members: RelA (p65), RelB, c-Rel, p50 (p105/NF-κB1), and p52 (p100/NF-κB2). This family is considered a master regulator of classical biochemical pathways such as inflammation, immunity, cell proliferation, and cell death. The proteins in this family have a conserved Rel homology domain (RHD) with the following subdomains: DNA binding domain (RHD-DBD) and dimerization domain (RHD-DD). Despite the importance of the NF-κB family in biology, there is a lack of information with respect to their distribution patterns, evolution, and structural conservation concerning domains and subdomains in animals. This study aims to address this critical gap regarding NF-κB proteins. A comprehensive analysis of NF-κB family proteins revealed their distinct distribution in animals, with differences in protein sizes, conserved domains, and subdomains (RHD-DBD and RHD-DD). For the first time, NF-κB proteins with multiple RHD-DBDs and RHD-DDs have been identified, and in some cases, this is due to subdomain duplication. The presence of RelA/p65 exclusively in vertebrates shows that innate immunity originated in fishes, followed by amphibians, reptiles, aves, and mammals. Phylogenetic analysis showed that NF-κB family proteins grouped according to animal groups, signifying structural conservation after speciation. The evolutionary analysis of RHDs suggests that NF-κB family members p50/p105 and c-Rel may have been the first to emerge in arthropod ancestors, followed by RelB, RelA, and p52/p100.
Affiliation(s)
- Siphesihle Msweli
- Department of Biochemistry and Microbiology, Faculty of Science, Agriculture and Engineering, University of Zululand, KwaDlangezwa 3886, South Africa; (S.M.); (S.B.P.)
| | - Suresh B. Pakala
- Department of Biochemistry and Microbiology, Faculty of Science, Agriculture and Engineering, University of Zululand, KwaDlangezwa 3886, South Africa; (S.M.); (S.B.P.)
- Department of Biochemistry, School of Life Sciences, University of Hyderabad, Hyderabad 500-046, India
| | - Khajamohiddin Syed
- Department of Biochemistry and Microbiology, Faculty of Science, Agriculture and Engineering, University of Zululand, KwaDlangezwa 3886, South Africa; (S.M.); (S.B.P.)
|
30
|
Liu W, Shi X, Cai Y, Sun W, He P, Perez-Moreno J, Liu D, Yu F. Two near-chromosomal-level genomes of globally-distributed Macroascomycete based on single-molecule fluorescence and Hi-C methods. Sci Data 2024; 11:964. [PMID: 39231989 PMCID: PMC11375150 DOI: 10.1038/s41597-024-03794-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 08/16/2024] [Indexed: 09/06/2024] Open
Abstract
Discinaceae holds significant importance within the Pezizales, representing a prominent group of macroascomycetes distributed globally. However, there is a dearth of genomic studies focusing on this family, resulting in gaps in our understanding of its evolution, development, and ecology. Here we utilized state-of-the-art genome assembly methodologies, incorporating third-generation single-molecule fluorescence and Hi-C-assisted methods, to elucidate the genomic landscapes of Gyromitra esculenta and Paragyromitra xinjiangensis. The genome sizes of the two species were determined to be 47.10 Mb and 48.20 Mb, with 23 and 22 scaffolds, respectively. In total, 10,438 and 11,469 coding proteins were identified, with functional annotations encompassing over 96.47% and 94.40%, respectively. Assessment of completeness using BUSCO revealed that 98.71% and 98.89% of the conserved proteins were identified. Comparative genomic analysis helped identify traits associated with the heterothallic life cycle and elucidate unique patterns of chromosomal evolution. Additionally, we identified potential saprotrophic nutritional modes and systematic phylogenetic relationships between the two species. Therefore, this study provides crucial genomic insights into the evolution, nutritional type, and ecological roles of species within the Pezizales.
Affiliation(s)
- Wei Liu
- The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
| | - Xiaofei Shi
- The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China
| | - Yingli Cai
- Institute of Agro-products Processing, Yunnan Academy of Agricultural Sciences, Kunming, 650221, China
| | - Wenhua Sun
- College of Food and Biological Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
| | - Peixin He
- College of Food and Biological Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
| | - Jesus Perez-Moreno
- Edafología, Campus Montecillo, Colegio de Postgraduados, Texcoco, 56230, Mexico
| | - Dong Liu
- The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China.
| | - Fuqiang Yu
- The Germplasm Bank of Wild Species & Yunnan Key Laboratory for Fungal Diversity and Green Development, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China.
|
31
|
Kutlu Y, Axel G, Kolodny R, Ben-Tal N, Haliloglu T. Reused Protein Segments Linked to Functional Dynamics. Mol Biol Evol 2024; 41:msae184. [PMID: 39226145 PMCID: PMC11412252 DOI: 10.1093/molbev/msae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 08/10/2024] [Accepted: 08/26/2024] [Indexed: 09/05/2024] Open
Abstract
Protein space is characterized by extensive recurrence, or "reuse," of parts, suggesting that new proteins and domains can evolve by mixing-and-matching of existing segments. From an evolutionary perspective, for a given combination to persist, the protein segments should presumably not only match geometrically but also dynamically communicate with each other to allow concerted motions that are key to function. Evidence from protein space supports the premise that domains indeed combine in this manner; we explore whether a similar phenomenon can be observed at the sub-domain level. To this end, we use Gaussian Network Models (GNMs) to calculate the so-called soft modes, or low-frequency modes of motion for a dataset of 150 protein domains. Modes of motion can be used to decompose a domain into segments of consecutive amino acids that we call "dynamic elements", each of which belongs to one of two parts that move in opposite senses. We find that, in many cases, the dynamic elements, detected based on GNM analysis, correspond to established "themes": Sub-domain-level segments that have been shown to recur in protein space, and which were detected in previous research using sequence similarity alone (i.e. completely independently of the GNM analysis). This statistically significant correlation hints at the importance of dynamics in evolution. Overall, the results are consistent with an evolutionary scenario where proteins have emerged from themes that need to match each other both geometrically and dynamically, e.g. to facilitate allosteric regulation.
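The decomposition into "dynamic elements" described in this abstract can be sketched in a few lines, under stated assumptions. The sketch builds a standard Gaussian Network Model Kirchhoff matrix from Cα coordinates, takes the slowest non-zero mode, and splits the chain into runs of consecutive residues whose mode components share a sign. The function names, the 7.3 Å cutoff, and the use of a single mode are illustrative assumptions, not the authors' actual protocol.

```python
import numpy as np

def kirchhoff(coords: np.ndarray, cutoff: float = 7.3) -> np.ndarray:
    """GNM connectivity matrix: -1 for residue pairs within the cutoff, degree on the diagonal."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma

def slowest_mode(coords: np.ndarray, cutoff: float = 7.3) -> np.ndarray:
    """Eigenvector of the smallest non-zero eigenvalue (assumes a connected network)."""
    _, vecs = np.linalg.eigh(kirchhoff(coords, cutoff))
    return vecs[:, 1]  # column 0 is the trivial zero mode

def dynamic_elements(mode: np.ndarray) -> list:
    """Split the chain into (start, end, sign) runs of same-signed mode components."""
    signs = np.where(mode >= 0, 1, -1)
    elements, start = [], 0
    for i in range(1, len(signs) + 1):
        if i == len(signs) or signs[i] != signs[start]:
            elements.append((start, i - 1, int(signs[start])))
            start = i
    return elements

# Usage on a toy straight chain of 20 pseudo-residues spaced 3.8 Å apart.
toy_coords = np.stack([np.arange(20) * 3.8, np.zeros(20), np.zeros(20)], axis=1)
toy_elements = dynamic_elements(slowest_mode(toy_coords))
```

The overall sign of an eigenvector is arbitrary, so only the boundaries between elements and their relative signs are meaningful. Real structures would supply Cα coordinates parsed from a PDB file in place of the toy chain.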
Affiliation(s)
- Yiğit Kutlu
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, Istanbul, Turkey
| | - Gabriel Axel
- School of Neurobiology, Biochemistry & Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa, Israel
| | - Nir Ben-Tal
- School of Neurobiology, Biochemistry & Biophysics, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
| | - Turkan Haliloglu
- Department of Chemical Engineering and Polymer Research Center, Bogazici University, Istanbul, Turkey
| |
Collapse
|
32
Ouyang J, Gao Y, Yang Y. PCP-GC-LM: single-sequence-based protein contact prediction using dual graph convolutional neural network and convolutional neural network. BMC Bioinformatics 2024; 25:287. [PMID: 39223474 PMCID: PMC11370006 DOI: 10.1186/s12859-024-05914-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 08/22/2024] [Indexed: 09/04/2024] Open
Abstract
BACKGROUND Recent progress in exploiting evolutionary information and in deep learning networks has improved protein contact prediction methods. Nevertheless, some bottlenecks remain: (1) prediction for orphan proteins and other proteins with little evolutionary information; and (2) single-sequence-based prediction methods mainly focus on selecting protein sequence features and tuning the neural network architecture, yet while deeper neural networks improve prediction accuracy, they also increase the computational burden. Compared with other neural networks used in protein prediction, graph neural networks have several advantages. They reveal topological structure, and their hierarchical structure and local connectivity help capture features at different levels of abstraction in protein molecules. When protein sequence and structure information are used for joint training, the dependencies between the two kinds of information can be captured better. They can also process protein molecular structures of different lengths and shapes, whereas traditional neural networks need to convert proteins into fixed-size vectors or matrices. RESULTS Here, we propose PCP-GC-LM, a single-sequence-based protein contact map predictor with dual-level graph neural networks and convolutional networks. Our method performs better than other single-sequence-based predictors in different independent tests. In addition, to verify the validity of our method on complex protein structures, we compare it with other methods on two homodimer protein test sets (the DeepHomo test dataset and the CASP-CAPRI target dataset). Furthermore, we perform ablation experiments to demonstrate the necessity of a dual graph network. In all, our framework presents new modules to accurately predict inter-chain contact maps in proteins and is also useful for analyzing interactions in other types of protein complexes.
Affiliation(s)
- J Ouyang
- Key Laboratory of Intelligent Computing Information Processing, Xiangtan University, Xiangtan, China
- School of Computer Science, Xiangtan University, Xiangtan, China
- Y Gao
- Key Laboratory of Intelligent Computing Information Processing, Xiangtan University, Xiangtan, China
- School of Computer Science, Xiangtan University, Xiangtan, China
- Y Yang
- School of Computer Science, Xiangtan University, Xiangtan, China
33
Becker F, Stanke M. learnMSA2: deep protein multiple alignments with large language and hidden Markov models. Bioinformatics 2024; 40:ii79-ii86. [PMID: 39230690 PMCID: PMC11373405 DOI: 10.1093/bioinformatics/btae381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024] Open
Abstract
MOTIVATION For the alignment of large numbers of protein sequences, the predominant tools decide whether to align two residues using only simple prior knowledge, e.g. amino acid substitution matrices, and only part of the available data. The accuracy of state-of-the-art programs declines with decreasing sequence identity and when increasingly large numbers of sequences are aligned. Recently, transformer-based deep-learning models started to harness the vast amount of protein sequence data, resulting in powerful pretrained language models with the main purpose of generating high-dimensional numerical representations, embeddings, for individual sites that agglomerate evolutionary, structural, and biophysical information. RESULTS We extend the traditional profile hidden Markov model so that it takes as inputs unaligned protein sequences and the corresponding embeddings. We fit the model with gradient descent using our existing differentiable hidden Markov layer. All sequences and their embeddings are jointly aligned to a model of the protein family. We report that our upgraded HMM-based aligner, learnMSA2, combined with the ProtT5-XL protein language model aligns on average almost 6 percentage points more columns correctly than the best amino acid-based competitor and scales well with sequence number. The relative advantage of learnMSA2 over other programs tends to be greater when the sequence identity is lower and when the number of sequences is larger. Our results strengthen the evidence on the rich information contained in protein language models' embeddings and their potential downstream impact on the field of bioinformatics. Availability and implementation: https://github.com/Gaius-Augustus/learnMSA, PyPI and Bioconda, evaluation: https://github.com/felbecker/snakeMSA.
Affiliation(s)
- Felix Becker
- Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany
- Mario Stanke
- Institute of Mathematics and Computer Science, University of Greifswald, 17489 Greifswald, Germany
34
Pons JL, Reys V, Grand F, Moreau V, Gracy J, Exner TE, Labesse G. @TOME 3.0: Interfacing Protein Structure Modeling and Ligand Docking. J Mol Biol 2024; 436:168704. [PMID: 39237192 DOI: 10.1016/j.jmb.2024.168704] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 07/02/2024] [Accepted: 07/09/2024] [Indexed: 09/07/2024]
Abstract
Knowledge of protein-ligand complexes is essential for efficient drug design. Virtual docking can provide important information on putative complexes, but it is still far from being simultaneously fast and accurate. Receptors are flexible and adapt to incoming small molecules, while docking is highly sensitive to small conformational deviations. Conformational ensembles provide a means to simulate protein flexibility. However, modeling multiple protein structures for many targets is seldom connected to ligand screening in an efficient and straightforward manner. @TOME-3 is an updated version of our former pipeline @TOME-2, in which protein structure modeling is now directly interfaced with flexible ligand docking. Sequence-sequence profile comparisons identify suitable PDB templates for structure modeling, and ligands from these templates are used to deduce binding sites to be screened. In addition, bound ligands can be used as pharmacophoric restraints during the virtual docking. The latter is performed by PLANTS, while the docking poses are analysed through multiple chemoinformatics functions. This unique combination of tools allows rapid and efficient ligand docking on multiple receptor conformations in parallel. @TOME-3 is freely available on the web at https://atome.cbs.cnrs.fr.
Affiliation(s)
- Jean-Luc Pons
- A.B.C.I.S, CNRS UMR5048 - INSERM U1054 - Université de Montpellier, 29 Rue de Navacelles, 34090 Montpellier Cedex, France
- Victor Reys
- A.B.C.I.S, CNRS UMR5048 - INSERM U1054 - Université de Montpellier, 29 Rue de Navacelles, 34090 Montpellier Cedex, France
- François Grand
- A.B.C.I.S, CNRS UMR5048 - INSERM U1054 - Université de Montpellier, 29 Rue de Navacelles, 34090 Montpellier Cedex, France
- Violaine Moreau
- A.B.C.I.S, CNRS UMR5048 - INSERM U1054 - Université de Montpellier, 29 Rue de Navacelles, 34090 Montpellier Cedex, France
- Jerôme Gracy
- A.B.C.I.S, CNRS UMR5048 - INSERM U1054 - Université de Montpellier, 29 Rue de Navacelles, 34090 Montpellier Cedex, France
- Thomas E Exner
- Seven Past Nine d.o.o., Hribljane 10, 1380 Cerknica, Slovenia
- Gilles Labesse
- A.B.C.I.S, CNRS UMR5048 - INSERM U1054 - Université de Montpellier, 29 Rue de Navacelles, 34090 Montpellier Cedex, France
35
Rahimzadeh F, Mohammad Khanli L, Salehpoor P, Golabi F, PourBahrami S. Unveiling the evolution of policies for enhancing protein structure predictions: A comprehensive analysis. Comput Biol Med 2024; 179:108815. [PMID: 38986287 DOI: 10.1016/j.compbiomed.2024.108815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2024] [Revised: 06/09/2024] [Accepted: 06/24/2024] [Indexed: 07/12/2024]
Abstract
Predicting protein structure is both fascinating and formidable, playing a crucial role in structure-based drug discovery and in unraveling diseases with elusive origins. The Critical Assessment of Protein Structure Prediction (CASP) serves as a biennial battleground where scientists from around the world converge to untangle the intricate relationships within amino acid chains. Two primary methods, Template-Based Modeling (TBM) and Template-Free (TF) strategies, dominate protein structure prediction. The trend has shifted towards Template-Free predictions due to their broader sequence coverage with fewer templates. The predictive process can be broadly classified into contact map, binned-distance, and real-valued distance predictions, each with distinctive strengths and limitations manifested through tailored loss functions. We also introduce revolutionary end-to-end and all-atom diffusion-based techniques that have transformed protein structure prediction. Recent advancements in deep learning techniques have significantly improved prediction accuracy, although their effectiveness is contingent upon the quality of input features derived from natural bio-physicochemical attributes and Multiple Sequence Alignments (MSAs). Hence, generating high-quality MSA data is of paramount importance for obtaining informative input features and better prediction outcomes. Remarkable successes have been achieved in protein structure prediction accuracy; however, they are not yet sufficient for the purposes that structural knowledge is meant to serve, which implies a need for development in other aspects of prediction. In this regard, scientists have opened other frontiers for protein structure prediction. The use of subsampling in multiple sequence alignment (MSA) and protein language modeling appears particularly promising for enhancing the accuracy and efficiency of predictions, ultimately aiding drug discovery efforts. The exploration of predicting protein complex structure also opens up exciting opportunities to deepen our knowledge of molecular interactions and to design more effective therapeutics. In this article, we discuss the vicissitudes that scientists have gone through to improve prediction accuracy and examine effective policies for prediction from different aspects, including the construction of high-quality MSAs, the provision of informative input features, and progress in deep learning approaches. We also briefly touch upon the transition from predicting single-chain protein structures to predicting protein complex structures. Our findings point towards promoting open research environments to support the objectives of protein structure prediction.
Affiliation(s)
- Faezeh Rahimzadeh
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Pedram Salehpoor
- Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Faegheh Golabi
- Department of Biomedical Engineering, Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran
- Shahin PourBahrami
- Department of Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
36
Souza BR, Codo BC, Romano-Silva MA, Tropepe V. Darpp-32 is regulated by dopamine and is required for the formation of GABAergic neurons in the developing telencephalon. Prog Neuropsychopharmacol Biol Psychiatry 2024; 134:111060. [PMID: 38906412 DOI: 10.1016/j.pnpbp.2024.111060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Revised: 04/22/2024] [Accepted: 06/17/2024] [Indexed: 06/23/2024]
Abstract
DARPP-32 (dopamine and cAMP-regulated phosphoprotein, Mr 32 kDa) is a phosphoprotein that is modulated by multiple receptors, integrating intracellular pathways and playing roles in various physiological functions. It is regulated by dopaminergic receptors through the cAMP/protein kinase A (PKA) pathway, which modulates the phosphorylation of threonine 34 (Thr34). When phosphorylated at Thr34, DARPP-32 becomes a potent protein phosphatase-1 (PP1) inhibitor. Since dopamine is involved in the development of GABAergic neurons and DARPP-32 is expressed in the developing brain, it is possible that DARPP-32 has a role in GABAergic neuronal development. We cloned the zebrafish darpp-32 gene (ppp1r1b) and observed that it is evolutionarily conserved in its inhibitory domain (Thr34 and surrounding residues) and the docking motif (residues 7-11, KKIQF). We also characterized darpp-32 protein expression throughout the 5 days post-fertilization (dpf) zebrafish larval brain by immunofluorescence and demonstrated that darpp-32 is mainly expressed in regions that receive dopaminergic projections (pallium, subpallium, preoptic region, and hypothalamus). We demonstrated that dopamine acutely suppressed darpp-32 activity by reducing the levels of p-darpp-32 in the 5 dpf zebrafish larval brain. In addition, the knockdown of darpp-32 resulted in a decrease in the number of GABAergic neurons in the subpallium of the 5 dpf larval brain, with a concomitant increase in the number of DAergic neurons. Finally, we demonstrated that darpp-32 downregulation during development reduced the motor behavior of 5 dpf zebrafish larvae. Thus, our observations suggest that darpp-32 is an evolutionarily conserved regulator of dopamine receptor signaling and is required for the formation of GABAergic neurons in the developing telencephalon.
Affiliation(s)
- Bruno Rezende Souza
- Laboratório NeuroDEv, Department of Physiology and Biophysics, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil 31270-901; Laboratório de Neurociências Molecular e Comportamental (LANEC) - Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
- Beatriz Campos Codo
- Laboratório NeuroDEv, Department of Physiology and Biophysics, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil 31270-901; Laboratório de Neurociências Molecular e Comportamental (LANEC) - Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brazil
- Marco Aurélio Romano-Silva
- Laboratório de Neurociências and INCT de Medicina Molecular, Department of Mental Health, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil 30130-100
- Vincent Tropepe
- Department of Cell & Systems Biology, University of Toronto, Toronto, ON, Canada M5S 3G5
37
Yutin N, Tolstoy I, Mutz P, Wolf YI, Krupovic M, Koonin EV. DNA polymerase swapping in Caudoviricetes bacteriophages. Virol J 2024; 21:200. [PMID: 39187833 PMCID: PMC11348598 DOI: 10.1186/s12985-024-02482-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2024] [Accepted: 08/21/2024] [Indexed: 08/28/2024] Open
Abstract
BACKGROUND Viruses with double-stranded (ds) DNA genomes in the realm Duplodnaviria share a conserved structural gene module but show a broad range of variation in their repertoires of DNA replication proteins. Some of the duplodnaviruses encode (nearly) complete replication systems whereas others lack (almost) all genes required for replication, relying on the host replication machinery. DNA polymerases (DNAPs) are the centerpiece of the DNA replication apparatus. The replicative DNAPs are classified into 4 unrelated or distantly related families (A-D), with the protein structures and sequences within each family being, generally, highly conserved. More than half of the duplodnaviruses encode a DNAP of family A, B or C. We showed previously that multiple pairs of closely related viruses in the order Crassvirales encode DNAPs of different families. METHODS Groups of phages in which DNAP swapping likely occurred were identified as subtrees of a defined depth in a comprehensive evolutionary tree of tailed bacteriophages that included phages with DNAPs of different families. The DNAP swaps were validated by constrained tree analysis that was performed on a phylogenetic tree of large terminase subunits, and the phage genomes encoding swapped DNAPs were aligned using Mauve. The structures of the discovered unusual DNAPs were predicted using AlphaFold2. RESULTS We identified four additional groups of tailed phages in the class Caudoviricetes in which the DNAPs apparently were swapped on multiple occasions, with replacements occurring between families A and B, between families A and C, or between distinct subfamilies within the same family. The DNAP swapping always occurs "in situ", without changes in the organization of the surrounding genes. In several cases, the DNAP gene is the only region of substantial divergence between closely related phage genomes, whereas in others, the swap apparently involved neighboring genes encoding other proteins involved in phage genome replication. In addition, we identified two previously undetected, highly divergent groups of family A DNAPs that are encoded in some phage genomes along with the main DNAP implicated in genome replication. CONCLUSIONS Replacement of the DNAP gene by one encoding a DNAP of a different family occurred on many independent occasions during the evolution of different families of tailed phages, in some cases resulting in very closely related phages encoding unrelated DNAPs. DNAP swapping was likely driven by selection for avoidance of host antiphage mechanisms targeting the phage DNAP that remain to be identified, and/or by selection against replicon incompatibility.
Affiliation(s)
- Natalya Yutin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Igor Tolstoy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pascal Mutz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mart Krupovic
- Archaeal Virology Unit, Institut Pasteur, Université Paris Cité, Paris, France
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
38
Ghazikhani H, Butler G. Exploiting protein language models for the precise classification of ion channels and ion transporters. Proteins 2024; 92:998-1055. [PMID: 38656743 DOI: 10.1002/prot.26694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 03/26/2024] [Accepted: 04/08/2024] [Indexed: 04/26/2024]
Abstract
This study introduces TooT-PLM-ionCT, a comprehensive framework that consolidates three distinct systems, each meticulously tailored for one of the following tasks: distinguishing ion channels (ICs) from membrane proteins (MPs), segregating ion transporters (ITs) from MPs, and differentiating ICs from ITs. Drawing upon the strengths of six Protein Language Models (PLMs), including ProtBERT, ProtBERT-BFD, ESM-1b, ESM-2 (650M parameters), and ESM-2 (15B parameters), TooT-PLM-ionCT employs a combination of traditional classifiers and deep learning models for nuanced protein classification. Originally validated on an existing dataset by previous researchers, our systems demonstrated superior performance in identifying ITs from MPs and distinguishing ICs from ITs, with the IC-MP discrimination achieving state-of-the-art results. In light of recommendations for additional validation, we introduced a new dataset, significantly enhancing the robustness and generalization of our models across bioinformatics challenges. This new evaluation underscored the effectiveness of TooT-PLM-ionCT in adapting to novel data while maintaining high classification accuracy. Furthermore, this study explores critical factors affecting classification accuracy, such as dataset balancing, the impact of using frozen versus fine-tuned PLM representations, and the variance between half and full precision in floating-point computations. To facilitate broader application and accessibility, a web server (https://tootsuite.encs.concordia.ca/service/TooT-PLM-ionCT) has been developed, allowing users to evaluate unknown protein sequences through our specialized systems for IC-MP, IT-MP, and IC-IT classification tasks.
Affiliation(s)
- Hamed Ghazikhani
- Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
- Gregory Butler
- Centre for Structural and Functional Genomics, Concordia University, Montréal, Québec, Canada
39
Anderson T, Wheeler TJ. An FPGA-based hardware accelerator supporting sensitive sequence homology filtering with profile hidden Markov models. BMC Bioinformatics 2024; 25:247. [PMID: 39075359 PMCID: PMC11285124 DOI: 10.1186/s12859-024-05879-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 07/23/2024] [Indexed: 07/31/2024] Open
Abstract
BACKGROUND Sequence alignment lies at the heart of genome sequence annotation. While the BLAST suite of alignment tools has long held an important role in alignment-based sequence database search, greater sensitivity is achieved through the use of profile hidden Markov models (pHMMs). Here, we describe an FPGA hardware accelerator, called HAVAC, that targets a key bottleneck step (SSV) in the analysis pipeline of the popular pHMM alignment tool, HMMER. RESULTS The HAVAC kernel calculates the SSV matrix at 1739 GCUPS on a ∼$3000 Xilinx Alveo U50 FPGA accelerator card, ∼227× faster than the optimized SSV implementation in nhmmer. Accounting for PCI-e data transfer and data processing, HAVAC is 65× faster than nhmmer's SSV with one thread and 35× faster than nhmmer with four threads, and uses ∼31% of the energy of a traditional high-end Intel CPU. CONCLUSIONS HAVAC demonstrates the potential offered by FPGA hardware accelerators to produce dramatic speed gains in sequence annotation and related bioinformatics applications. Because these computations are performed on a co-processor, the host CPU remains free to simultaneously compute other aspects of the analysis pipeline.
Affiliation(s)
- Tim Anderson
- Department of Computer Science, University of Montana, Missoula, MT, USA
- Travis J Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ, USA
40
Yutin N, Mutz P, Krupovic M, Koonin EV. Mriyaviruses: small relatives of giant viruses. mBio 2024; 15:e0103524. [PMID: 38832788 PMCID: PMC11253617 DOI: 10.1128/mbio.01035-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Accepted: 05/01/2024] [Indexed: 06/05/2024] Open
Abstract
The phylum Nucleocytoviricota consists of large and giant viruses that range in genome size from about 100 kilobases (kb) to more than 2.5 megabases. Here, using metagenome mining followed by extensive phylogenomic analysis and protein structure comparison, we delineate a distinct group of viruses with double-stranded (ds) DNA genomes in the range of 35-45 kb that appear to be related to the Nucleocytoviricota. In phylogenetic trees of the conserved double jelly-roll major capsid proteins (MCPs) and DNA packaging ATPases, these viruses do not show affinity to any particular branch of the Nucleocytoviricota and accordingly would comprise a class which we propose to name "Mriyaviricetes" (after Ukrainian "mriya," dream). Structural comparison of the MCP suggests that, among the extant virus lineages, mriyaviruses are the closest one to the ancestor of the Nucleocytoviricota. In the phylogenetic trees, mriyaviruses split into two well-separated branches, the family Yaraviridae and proposed new family "Gamadviridae." The previously characterized members of these families, yaravirus and Pleurochrysis sp. endemic viruses, infect amoeba and haptophytes, respectively. The genomes of the rest of the mriyaviruses were assembled from metagenomes from diverse environments, suggesting that mriyaviruses infect various unicellular eukaryotes. Mriyaviruses lack DNA polymerase, which is encoded by all other members of the Nucleocytoviricota, and RNA polymerase subunits encoded by all cytoplasmic viruses among the Nucleocytoviricota, suggesting that they replicate in the host cell nuclei. All mriyaviruses encode a HUH superfamily endonuclease that is likely to be essential for the initiation of virus DNA replication via the rolling circle mechanism. IMPORTANCE The origin of giant viruses of eukaryotes that belong to the phylum Nucleocytoviricota is not thoroughly understood and remains a matter of major interest and debate. 
Here, we combine metagenome database searches with extensive protein sequence and structure analysis to describe a distinct group of viruses with comparatively small genomes of 35-45 kilobases that appear to comprise a distinct class within the phylum Nucleocytoviricota that we provisionally named "Mriyaviricetes." Mriyaviruses appear to be the closest identified relatives of the ancestors of the Nucleocytoviricota. Analysis of proteins encoded in mriyavirus genomes suggests that they replicate their genome via the rolling circle mechanism that is unusual among viruses with double-stranded DNA genomes and so far not described for members of Nucleocytoviricota.
Affiliation(s)
- Natalya Yutin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Pascal Mutz
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
- Mart Krupovic
- Institut Pasteur, Université Paris Cité, Archaeal Virology Unit, Paris, France
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
41
Mei C, Li X, Yan P, Feng B, Mamat A, Wang J, Li N. Identification of Apple Flower Development-Related Gene Families and Analysis of Transcriptional Regulation. Int J Mol Sci 2024; 25:7510. [PMID: 39062752 PMCID: PMC11277112 DOI: 10.3390/ijms25147510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Revised: 06/28/2024] [Accepted: 06/29/2024] [Indexed: 07/28/2024] Open
Abstract
Apple (Malus domestica Borkh.) stands out as a globally significant fruit tree with considerable economic importance. Nonetheless, the orchard production of 'Fuji' apples faces significant challenges, including delayed flowering in young trees and inconsistent annual yields in mature trees, ultimately resulting in suboptimal fruit yield due to insufficient flower bud formation. Flower development represents a pivotal process influencing plant adaptation to environmental conditions and is a crucial determinant of successful plant reproduction. The three gene or transcription factor (TF) families, C2H2, DELLA, and FKF1, have emerged as key regulators in plant flowering regulation; however, understanding their roles during apple flowering remains limited. Consequently, this study identified 24 MdC2H2, 6 MdDELLA, and 6 MdFKF1 genes in the apple genome with high confidence. Through phylogenetic analyses, the genes within each family were categorized into three distinct subgroups, with all facets of protein physicochemical properties and conserved motifs contingent upon subgroup classification. Repetitive events between these three gene families within the apple genome were elucidated via collinearity analysis. qRT-PCR analysis was conducted and revealed significant expression differences among MdC2H2-18, MdDELLA1, and MdFKF1-4 during apple bud development. Furthermore, yeast two-hybrid analysis unveiled an interaction between MdC2H2-18 and MdDELLA1. The genome-wide identification of the C2H2, DELLA, and FKF1 gene families in apples has shed light on the molecular mechanisms underlying apple flower bud development.
Affiliation(s)
- Chuang Mei
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Key Laboratory of Genome Research and Genetic Improvement of Xinjiang Characteristic Fruits and Vegetables, Institute of Horticultural Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China; (C.M.); (X.L.); (P.Y.); (B.F.); (A.M.)
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Urumqi 830091, China
- Xianguo Li
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Key Laboratory of Genome Research and Genetic Improvement of Xinjiang Characteristic Fruits and Vegetables, Institute of Horticultural Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China; (C.M.); (X.L.); (P.Y.); (B.F.); (A.M.)
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Urumqi 830091, China
- Peng Yan
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Key Laboratory of Genome Research and Genetic Improvement of Xinjiang Characteristic Fruits and Vegetables, Institute of Horticultural Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China; (C.M.); (X.L.); (P.Y.); (B.F.); (A.M.)
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Urumqi 830091, China
- Beibei Feng
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Key Laboratory of Genome Research and Genetic Improvement of Xinjiang Characteristic Fruits and Vegetables, Institute of Horticultural Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China; (C.M.); (X.L.); (P.Y.); (B.F.); (A.M.)
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Urumqi 830091, China
| | - Aisajan Mamat
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Key Laboratory of Genome Research and Genetic Improvement of Xinjiang Characteristic Fruits and Vegetables, Institute of Horticultural Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Urumqi 830091, China
| | - Jixun Wang
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Key Laboratory of Genome Research and Genetic Improvement of Xinjiang Characteristic Fruits and Vegetables, Institute of Horticultural Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Urumqi 830091, China
| | - Ning Li
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Key Laboratory of Genome Research and Genetic Improvement of Xinjiang Characteristic Fruits and Vegetables, Institute of Horticultural Crops, Xinjiang Academy of Agricultural Sciences, Urumqi 830091, China
- The State Key Laboratory of Genetic Improvement and Germplasm Innovation of Crop Resistance in Arid Desert Regions (Preparation), Urumqi 830091, China
| |
|
42
|
Schoelmerich MC, Ly L, West-Roberts J, Shi LD, Shen C, Malvankar NS, Taib N, Gribaldo S, Woodcroft BJ, Schadt CW, Al-Shayeb B, Dai X, Mozsary C, Hickey S, He C, Beaulaurier J, Juul S, Sachdeva R, Banfield JF. Borg extrachromosomal elements of methane-oxidizing archaea have conserved and expressed genetic repertoires. Nat Commun 2024; 15:5414. [PMID: 38926353 PMCID: PMC11208441 DOI: 10.1038/s41467-024-49548-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Accepted: 06/10/2024] [Indexed: 06/28/2024] Open
Abstract
Borgs are huge extrachromosomal elements (ECE) of anaerobic methane-consuming "Candidatus Methanoperedens" archaea. Here, we used nanopore sequencing to validate published complete genomes curated from short reads and to reconstruct new genomes. 13 complete and four near-complete linear genomes share 40 genes that define a largely syntenous genome backbone. We use these conserved genes to identify new Borgs from peatland soil and to delineate Borg phylogeny, revealing two major clades. Remarkably, Borg genes encoding nanowire-like electron-transferring cytochromes and cell surface proteins are more highly expressed than those of host Methanoperedens, indicating that Borgs augment the Methanoperedens activity in situ. We reconstructed the first complete 4.00 Mbp genome for a Methanoperedens that is inferred to be a Borg host and predicted its methylation motifs, which differ from pervasive TC and CC methylation motifs of the Borgs. Thus, methylation may enable Methanoperedens to distinguish their genomes from those of Borgs. Very high Borg to Methanoperedens ratios and structural predictions suggest that Borgs may be capable of encapsulation. The findings clearly define Borgs as a distinct class of ECE with shared genomic signatures, establish their diversification from a common ancestor with genetic inheritance, and raise the possibility of periodic existence outside of host cells.
Affiliation(s)
- Marie C Schoelmerich
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Environmental Systems Sciences, ETH Zurich, 8092, Zurich, Switzerland
| | - Lynn Ly
- Oxford Nanopore Technologies Inc, New York, NY, USA
| | - Jacob West-Roberts
- Department of Environmental Science, Policy and Management, University of California, Berkeley, CA, USA
| | - Ling-Dong Shi
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
| | - Cong Shen
- Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, West Haven, CT, USA
| | - Nikhil S Malvankar
- Microbial Sciences Institute, Yale University, New Haven, CT, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, West Haven, CT, USA
| | - Najwa Taib
- Institut Pasteur, Université de Paris cité, Unit Evolutionary Biology of the Microbial Cell, Paris, France
| | - Simonetta Gribaldo
- Institut Pasteur, Université de Paris cité, Unit Evolutionary Biology of the Microbial Cell, Paris, France
| | - Ben J Woodcroft
- Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology (QUT), Translational Research Institute, Woolloongabba, QLD, Australia
| | - Christopher W Schadt
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Department of Microbiology, University of Tennessee, Knoxville, TN, USA
| | - Basem Al-Shayeb
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
| | | | | | - Scott Hickey
- Oxford Nanopore Technologies Inc, New York, NY, USA
| | - Christine He
- Oxford Nanopore Technologies Inc, New York, NY, USA
| | | | - Sissel Juul
- Oxford Nanopore Technologies Inc, New York, NY, USA
| | - Rohan Sachdeva
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
| | - Jillian F Banfield
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Environmental Science, Policy and Management, University of California, Berkeley, CA, USA
- Biomedicine Discovery Institute, Monash University, Clayton, VIC, Australia
- Department of Earth and Planetary Science, University of California, Berkeley, CA, USA
| |
|
43
|
Elisée E, Ducrot L, Méheust R, Bastard K, Fossey-Jouenne A, Grogan G, Pelletier E, Petit JL, Stam M, de Berardinis V, Zaparucha A, Vallenet D, Vergne-Vaxelaire C. A refined picture of the native amine dehydrogenase family revealed by extensive biodiversity screening. Nat Commun 2024; 15:4933. [PMID: 38858403 PMCID: PMC11164908 DOI: 10.1038/s41467-024-49009-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 05/20/2024] [Indexed: 06/12/2024] Open
Abstract
Native amine dehydrogenases offer sustainable access to chiral amines, so the search for scaffolds capable of converting more diverse carbonyl compounds is required to reach the full potential of this alternative to conventional synthetic reductive aminations. Here we report a multidisciplinary strategy combining bioinformatics, chemoinformatics and biocatalysis to extensively screen billions of sequences in silico and to efficiently find native amine dehydrogenases features using computational approaches. In this way, we achieve a comprehensive overview of the initial native amine dehydrogenase family, extending it from 2,011 to 17,959 sequences, and identify native amine dehydrogenases with non-reported substrate spectra, including hindered carbonyls and ethyl ketones, and accepting methylamine and cyclopropylamine as amine donor. We also present preliminary model-based structural information to inform the design of potential (R)-selective amine dehydrogenases, as native amine dehydrogenases are mostly (S)-selective. This integrated strategy paves the way for expanding the resource of other enzyme families and in highlighting enzymes with original features.
Affiliation(s)
- Eddy Elisée
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Laurine Ducrot
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Raphaël Méheust
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Karine Bastard
- School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, 2006, Australia
| | - Aurélie Fossey-Jouenne
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Gideon Grogan
- York Structural Biology Laboratory, Department of Chemistry, University of York, Heslington, York, YO10 5DD, UK
| | - Eric Pelletier
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Jean-Louis Petit
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Mark Stam
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Véronique de Berardinis
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Anne Zaparucha
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - David Vallenet
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| | - Carine Vergne-Vaxelaire
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
| |
|
44
|
Dahlström KM, Salminen TA. Apprehensions and emerging solutions in ML-based protein structure prediction. Curr Opin Struct Biol 2024; 86:102819. [PMID: 38631107 DOI: 10.1016/j.sbi.2024.102819] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/05/2024] [Accepted: 03/31/2024] [Indexed: 04/19/2024]
Abstract
The three-dimensional structure of proteins determines their function in vital biological processes. Thus, when the structure is known, the molecular mechanism of protein function can be understood in more detail and obtained information utilized in biotechnological, diagnostics, and therapeutic applications. Over the past five years, machine learning (ML)-based modeling has pushed protein structure prediction to the next level with AlphaFold in the front line, predicting the structure for hundreds of millions of proteins. Further advances recently report promising ML-based approaches for solving remaining challenges by incorporating functionally important metals, co-factors, post-translational modifications, structural dynamics, and interdomain and multimer interactions in the structure prediction process.
Affiliation(s)
- Käthe M Dahlström
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland; InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
| | - Tiina A Salminen
- Structural Bioinformatics Laboratory, Biochemistry, Faculty of Science and Engineering, Åbo Akademi University, Tykistökatu 6A, 20520 Turku, Finland; InFLAMES Research Flagship Center, Åbo Akademi University, 20520 Turku, Finland
| |
|
45
|
Toledo-Patiño S, Goetz SK, Shanmugaratnam S, Höcker B, Farías-Rico JA. Molecular handcraft of a well-folded protein chimera. FEBS Lett 2024; 598:1375-1386. [PMID: 38508768 DOI: 10.1002/1873-3468.14856] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Revised: 02/11/2024] [Accepted: 02/12/2024] [Indexed: 03/22/2024]
Abstract
Modular assembly is a compelling pathway to create new proteins, a concept supported by protein engineering and millennia of evolution. Natural evolution provided a repository of building blocks, known as domains, which trace back to even shorter segments that underwent numerous 'copy-paste' processes culminating in the scaffolds we see today. Utilizing the subdomain-database Fuzzle, we constructed a fold-chimera by integrating a flavodoxin-like fragment into a periplasmic binding protein. This chimera is well-folded and a crystal structure reveals stable interfaces between the fragments. These findings demonstrate the adaptability of α/β-proteins and offer a stepping stone for optimization. By emphasizing the practicality of fragment databases, our work pioneers new pathways in protein engineering. Ultimately, the results substantiate the conjecture that periplasmic binding proteins originated from a flavodoxin-like ancestor.
Affiliation(s)
- Saacnicteh Toledo-Patiño
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Okinawa Institute of Science and Technology Graduate University, Japan
| | | | - Sooruban Shanmugaratnam
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Department of Biochemistry, University of Bayreuth, Germany
| | - Birte Höcker
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Department of Biochemistry, University of Bayreuth, Germany
| | - José Arcadio Farías-Rico
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Synthetic Biology Program, Center for Genome Sciences, National Autonomous University of Mexico, Cuernavaca, Mexico
| |
|
46
|
Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structure is critical to understanding and grasping their functions. In many cases, it's not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, there have been efforts and methods that build upon prior predictions of dimer structures to attempt to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicting interchain residue contact, utilizing experimental data like cryo-EM experiments, and considering protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics in protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including the structure prediction of heteromeric complex, disordered regions in complex, antibody-antigen complex, and RNA-related complex, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure predictions to contribute to future advanced predictions.
Affiliation(s)
- Nan Zhao
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
| | - Tong Wu
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
| | - Wenda Wang
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
| | - Lunchuan Zhang
- School of Mathematics, Renmin University of China, Beijing, 100872, China
| | - Xinqi Gong
- Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
- School of Mathematics, Renmin University of China, Beijing, 100872, China
- Beijing Academy of Artificial Intelligence, Beijing, 100084, China
| |
|
47
|
Bell RT, Sahakyan H, Makarova KS, Wolf YI, Koonin EV. CoCoNuTs are a diverse subclass of Type IV restriction systems predicted to target RNA. eLife 2024; 13:RP94800. [PMID: 38739430 PMCID: PMC11090510 DOI: 10.7554/elife.94800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/14/2024] Open
Abstract
A comprehensive census of McrBC systems, among the most common forms of prokaryotic Type IV restriction systems, followed by phylogenetic analysis, reveals their enormous abundance in diverse prokaryotes and a plethora of genomic associations. We focus on a previously uncharacterized branch, which we denote coiled-coil nuclease tandems (CoCoNuTs) for their salient features: the presence of extensive coiled-coil structures and tandem nucleases. The CoCoNuTs alone show extraordinary variety, with three distinct types and multiple subtypes. All CoCoNuTs contain domains predicted to interact with translation system components, such as OB-folds resembling the SmpB protein that binds bacterial transfer-messenger RNA (tmRNA), YTH-like domains that might recognize methylated tmRNA, tRNA, or rRNA, and RNA-binding Hsp70 chaperone homologs, along with RNases, such as HEPN domains, all suggesting that the CoCoNuTs target RNA. Many CoCoNuTs might additionally target DNA, via McrC nuclease homologs. Additional restriction systems, such as Type I RM, BREX, and Druantia Type III, are frequently encoded in the same predicted superoperons. In many of these superoperons, CoCoNuTs are likely regulated by cyclic nucleotides, possibly, RNA fragments with cyclic termini, that bind associated CARF (CRISPR-Associated Rossmann Fold) domains. We hypothesize that the CoCoNuTs, together with the ancillary restriction factors, employ an echeloned defense strategy analogous to that of Type III CRISPR-Cas systems, in which an immune response eliminating virus DNA and/or RNA is launched first, but then, if it fails, an abortive infection response leading to PCD/dormancy via host RNA cleavage takes over.
Affiliation(s)
- Ryan T Bell
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
| | - Harutyun Sahakyan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
| | - Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
| |
|
48
|
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petabase-Scale Homology Search for Structure Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041465. [PMID: 38316555 PMCID: PMC11065157 DOI: 10.1101/cshperspect.a041465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/07/2024]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | - Gyuri Kim
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | | | - Milot Mirdita
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | - Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
| | - Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
| | - Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | | | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
| |
|
49
|
Dohnálek V, Doležal P. Installation of LYRM proteins in early eukaryotes to regulate the metabolic capacity of the emerging mitochondrion. Open Biol 2024; 14:240021. [PMID: 38772414 PMCID: PMC11293456 DOI: 10.1098/rsob.240021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 03/13/2024] [Indexed: 05/23/2024] Open
Abstract
Core mitochondrial processes such as the electron transport chain, protein translation and the formation of Fe-S clusters (ISC) are of prokaryotic origin and were present in the bacterial ancestor of mitochondria. In animal and fungal models, a family of small Leu-Tyr-Arg motif-containing proteins (LYRMs) uniformly regulates the function of mitochondrial complexes involved in these processes. The action of LYRMs is contingent upon their binding to the acylated form of acyl carrier protein (ACP). This study demonstrates that LYRMs are structurally and evolutionarily related proteins characterized by a core triplet of α-helices. Their widespread distribution across eukaryotes suggests that 12 specialized LYRMs were likely present in the last eukaryotic common ancestor to regulate the assembly and folding of the subunits that are conserved in bacteria but that lack LYRM homologues. The secondary reduction of mitochondria to anoxic environments has rendered the function of LYRMs and their interaction with acylated ACP dispensable. Consequently, these findings strongly suggest that early eukaryotes installed LYRMs in aerobic mitochondria as orchestrated switches, essential for regulating core metabolism and ATP production.
Affiliation(s)
- Vít Dohnálek
- Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec 252 50, Czech Republic
| | - Pavel Doležal
- Department of Parasitology, Faculty of Science, Charles University, BIOCEV, Vestec 252 50, Czech Republic
| |
|
50
|
Yutin N, Tolstoy I, Mutz P, Wolf YI, Krupovic M, Koonin EV. Jumping DNA polymerases in bacteriophages. bioRxiv 2024:2024.04.26.591309. [PMID: 38903090 PMCID: PMC11188092 DOI: 10.1101/2024.04.26.591309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/22/2024]
Abstract
Viruses with double-stranded (ds) DNA genomes in the realm Duplodnaviria share a conserved structural gene module but show a broad range of variation in their repertoires of DNA replication proteins. Some of the duplodnaviruses encode (nearly) complete replication systems whereas others lack (almost) all genes required for replication, relying on the host replication machinery. DNA polymerases (DNAPs) comprise the centerpiece of the DNA replication apparatus. The replicative DNAPs are classified into 4 unrelated or distantly related families (A-D), with the protein structures and sequences within each family being, generally, highly conserved. More than half of the duplodnaviruses encode a DNAP of family A, B or C. We showed previously that multiple pairs of closely related viruses in the order Crassvirales encode DNAPs of different families. Here we identify four additional groups of tailed phages in the class Caudoviricetes in which the DNAPs apparently were swapped on multiple occasions, with replacements occurring both between families A and B, or A and C, or between distinct subfamilies within the same family. The DNAP swapping always occurs "in situ", without changes in the organization of the surrounding genes. In several cases, the DNAP gene is the only region of substantial divergence between closely related phage genomes, whereas in others, the swap apparently involved neighboring genes encoding other proteins involved in phage replication. We hypothesize that DNAP swapping is driven by selection for avoidance of host antiphage mechanisms targeting the phage DNAP that remain to be identified, and/or by selection against replicon incompatibility. In addition, we identified two previously undetected, highly divergent groups of family A DNAPs that are encoded in some phage genomes along with the main DNAP implicated in genome replication.
Affiliation(s)
- Natalya Yutin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Igor Tolstoy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Pascal Mutz
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
| | - Mart Krupovic
- Institut Pasteur, Université Paris Cité, Archaeal Virology Unit, Paris, France
| | | |
|