1
Kim TD, Pretorius D, Murray JW, Cardona T. Exploring the Structural Diversity and Evolution of the D1 Subunit of Photosystem II Using AlphaFold and Foldtree. Physiol Plant 2025; 177:e70284. [PMID: 40401773 PMCID: PMC12096807 DOI: 10.1111/ppl.70284] [Received: 03/06/2025] [Revised: 04/24/2025] [Accepted: 04/29/2025] [Indexed: 05/23/2025]
Abstract
Although our knowledge of photosystem II has expanded to include time-resolved atomic details, the diversity of experimental structures of the enzyme remains limited. Recent advances in protein structure prediction with AlphaFold offer a promising approach to fill this gap in structural diversity in non-model systems. This study used AlphaFold to predict the structures of the D1 protein, the core subunit of photosystem II, across a broad range of photosynthetic organisms. The prediction produced high-confidence structures, and structural alignment analyses highlighted conserved regions across the different D1 groups, which were in line with high pLDDT scoring regions. In contrast, varying pLDDT in the DE loop and terminal regions appears to correlate with different degrees of structural flexibility or disorder. Subsequent structural phylogenetic analysis using Foldtree provided a tree that is in good agreement with previous sequence-based studies. Moreover, the phylogeny supports a parsimonious scenario in which far-red D1 and D1INT evolved from an ancestral form of G4 D1. This work demonstrates the potential of AlphaFold and Foldtree to study the molecular evolution of photosynthesis.
Affiliation(s)
- Tom Dongmin Kim
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
  - Department of Life Sciences, Imperial College London, London, UK
- Tanai Cardona
  - School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK
  - Department of Life Sciences, Imperial College London, London, UK
2
Choopanian P, Andressoo JO, Mirzaie M. A fast approach for structural and evolutionary analysis based on energetic profile protein comparison. Nat Commun 2025; 16:2231. [PMID: 40044697 PMCID: PMC11882786 DOI: 10.1038/s41467-025-57374-9] [Received: 03/10/2024] [Accepted: 02/14/2025] [Indexed: 03/09/2025] Open
Abstract
In structural bioinformatics, the efficiency of predicting protein similarity, function, and evolutionary relationships is crucial. Our approach proposed herein leverages protein energy profiles derived from a knowledge-based potential, deviating from traditional methods relying on structural alignment or atomic distances. This method assigns unique energy profiles to individual proteins, facilitating rapid comparative analysis for both structural similarities and evolutionary relationships across various hierarchical levels. Our study demonstrates that energy profiles contain substantial information about protein structure at class, fold, superfamily, and family levels. Notably, these profiles accurately distinguish proteins across species, illustrated by the classification of coronavirus spike glycoproteins and bacteriocin proteins. Introducing a separation measure based on energy profile similarity, our method shows significant correlation with a network-based approach, emphasizing the potential of energy profiles as efficient predictors for drug combinations with faster computational requirements. Our key insight is that the sequence-based energy profile strongly correlates with structure-derived energy, enabling rapid and efficient protein comparisons based solely on sequences.
Affiliation(s)
- Peyman Choopanian
  - Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jaan-Olle Andressoo
  - Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
  - Division of Neurogeriatrics, Department of Neurobiology, Care Sciences and Society (NVS), Karolinska Institutet, Stockholm, Sweden
- Mehdi Mirzaie
  - Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
3
Hills FR, Geoghegan JL, Bostina M. Architects of infection: A structural overview of SARS-related coronavirus spike glycoproteins. Virology 2025; 604:110383. [PMID: 39983449 DOI: 10.1016/j.virol.2024.110383] [Received: 10/20/2024] [Revised: 12/22/2024] [Accepted: 12/29/2024] [Indexed: 02/23/2025]
Abstract
The frequency of zoonotic viral emergence within the Coronaviridae family highlights the critical need to understand the structural features of spike proteins that govern viral entry and host adaptation. Investigating the structural conservation and variation in key regions of the spike protein (those involved in host range, binding affinity, viral entry, and immune evasion) is essential for predicting the evolutionary pathways of coronaviruses, assessing the risk of future host-jumping events, and discovering pan-neutralising antibodies. Here we summarise our current structural understanding of the spike proteins similar to SARS-CoV-2 from the Coronaviridae family and compare key functional similarities and differences. Our aim is to demonstrate the significant structural and sequence conservation between spike proteins from a range of host species and to outline the importance of animal coronavirus surveillance and structural investigation in our endeavour for pandemic preparedness against emerging viruses.
Affiliation(s)
- Francesca R Hills
  - Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
- Jemma L Geoghegan
  - Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
- Mihnea Bostina
  - Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
4
Bou Dagher L, Madern D, Malbos P, Brochier-Armanet C. Faithful Interpretation of Protein Structures through Weighted Persistent Homology Improves Evolutionary Distance Estimation. Mol Biol Evol 2025; 42:msae271. [PMID: 39761698 PMCID: PMC11789942 DOI: 10.1093/molbev/msae271] [Received: 05/28/2024] [Revised: 12/02/2024] [Accepted: 12/20/2024] [Indexed: 02/05/2025] Open
Abstract
Phylogenetic inference is mainly based on sequence analysis and requires reliable alignments. This can be challenging, especially when sequences are highly divergent. In this context, the use of three-dimensional protein structures is a promising alternative. In a recent study, we introduced an original topological data analysis method based on persistent homology to estimate the evolutionary distances from structures. The method was successfully tested on 518 protein families representing 22,940 predicted structures. However, as anticipated, the reliability of the estimated evolutionary distances was impacted by the quality of the predicted structures and the presence of indels in the proteins. This paper introduces a new topological descriptor, called bio-topological marker (BTM), which provides a more faithful description of the structures, a topological analysis for estimating evolutionary distances from BTMs, and a new weight-filtering method adapted to protein structures. These new developments significantly improve the estimation of evolutionary distances and phylogenies inferred from structures.
Affiliation(s)
- Léa Bou Dagher
  - Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, VAS, Villeurbanne F-69622, France
  - Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, Villeurbanne F-69622, France
  - Laboratoire de mathématiques, École Doctorale en Science et Technologie, Université Libanaise, Post Box 5, Hadath, Liban
- Philippe Malbos
  - Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, Villeurbanne F-69622, France
- Céline Brochier-Armanet
  - Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, VAS, Villeurbanne F-69622, France
  - Institut Universitaire de France
5
Kraemer MUG, Tsui JLH, Chang SY, Lytras S, Khurana MP, Vanderslott S, Bajaj S, Scheidwasser N, Curran-Sebastian JL, Semenova E, Zhang M, Unwin HJT, Watson OJ, Mills C, Dasgupta A, Ferretti L, Scarpino SV, Koua E, Morgan O, Tegally H, Paquet U, Moutsianas L, Fraser C, Ferguson NM, Topol EJ, Duchêne DA, Stadler T, Kingori P, Parker MJ, Dominici F, Shadbolt N, Suchard MA, Ratmann O, Flaxman S, Holmes EC, Gomez-Rodriguez M, Schölkopf B, Donnelly CA, Pybus OG, Cauchemez S, Bhatt S. Artificial intelligence for modelling infectious disease epidemics. Nature 2025; 638:623-635. [PMID: 39972226 PMCID: PMC11987553 DOI: 10.1038/s41586-024-08564-w] [Received: 07/08/2024] [Accepted: 12/20/2024] [Indexed: 02/21/2025]
Abstract
Infectious disease threats to individual and public health are numerous, varied and frequently unexpected. Artificial intelligence (AI) and related technologies, which are already supporting human decision making in economics, medicine and social science, have the potential to transform the scope and power of infectious disease epidemiology. Here we consider the application to infectious disease modelling of AI systems that combine machine learning, computational statistics, information retrieval and data science. We first outline how recent advances in AI can accelerate breakthroughs in answering key epidemiological questions and we discuss specific AI methods that can be applied to routinely collected infectious disease surveillance data. Second, we elaborate on the social context of AI for infectious disease epidemiology, including issues such as explainability, safety, accountability and ethics. Finally, we summarize some limitations of AI applications in this field and provide recommendations for how infectious disease epidemiology can harness most effectively current and future developments in AI.
Affiliation(s)
- Moritz U G Kraemer
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
  - Department of Biology, University of Oxford, Oxford, UK
- Joseph L-H Tsui
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
  - Department of Biology, University of Oxford, Oxford, UK
- Serina Y Chang
  - Department of Electrical Engineering and Computer Science, University of California Berkeley, Berkeley, CA, USA
  - UCSF UC Berkeley Joint Program in Computational Precision Health, Berkeley, CA, USA
- Spyros Lytras
  - Division of Systems Virology, Department of Microbiology and Immunology, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Mark P Khurana
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Samantha Vanderslott
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
  - Oxford Vaccine Group, University of Oxford and NIHR Oxford Biomedical Research Centre, Oxford, UK
- Sumali Bajaj
  - Department of Biology, University of Oxford, Oxford, UK
- Neil Scheidwasser
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Elizaveta Semenova
  - Department of Epidemiology and Biostatistics, Imperial College London, London, UK
- Mengyan Zhang
  - Department of Computer Science, University of Oxford, Oxford, UK
- Oliver J Watson
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK
- Cathal Mills
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
  - Department of Statistics, University of Oxford, Oxford, UK
- Abhishek Dasgupta
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
  - Doctoral Training Centre, University of Oxford, Oxford, UK
- Luca Ferretti
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
- Samuel V Scarpino
  - Institute for Experiential AI, Northeastern University, Boston, MA, USA
  - Santa Fe Institute, Santa Fe, NM, USA
- Etien Koua
  - World Health Organization Regional Office for Africa, Brazzaville, Congo
- Oliver Morgan
  - WHO Hub for Pandemic and Epidemic Intelligence, Health Emergencies Programme, World Health Organization, Berlin, Germany
- Houriiyah Tegally
  - Centre for Epidemic Response and Innovation (CERI), School for Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
- Ulrich Paquet
  - African Institute for Mathematical Sciences (AIMS) South Africa, Muizenberg, Cape Town, South Africa
- Neil M Ferguson
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK
- David A Duchêne
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Patricia Kingori
  - The Ethox Centre, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Michael J Parker
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
  - The Ethox Centre, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Francesca Dominici
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Nigel Shadbolt
  - Department of Computer Science, University of Oxford, Oxford, UK
  - The Open Data Institute, London, UK
- Marc A Suchard
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, USA
- Oliver Ratmann
  - Department of Mathematics, Imperial College London, London, UK
  - Imperial-X, Imperial College, London, UK
- Seth Flaxman
  - Department of Computer Science, University of Oxford, Oxford, UK
- Edward C Holmes
  - School of Medical Sciences, The University of Sydney, Sydney, New South Wales, Australia
- Bernhard Schölkopf
  - Max Planck Institute for Intelligent Systems and ELLIS Institute Tübingen, Tübingen, Germany
- Christl A Donnelly
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
  - Department of Statistics, University of Oxford, Oxford, UK
- Oliver G Pybus
  - Pandemic Sciences Institute, University of Oxford, Oxford, UK
  - Department of Biology, University of Oxford, Oxford, UK
  - Department of Pathobiology and Population Sciences, The Royal Veterinary College, London, UK
- Simon Cauchemez
  - Mathematical Modelling of Infectious Diseases Unit, Institut Pasteur, Université Paris Cité, U1332 INSERM, UMR2000 CNRS, Paris, France
- Samir Bhatt
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
  - MRC Centre for Global Infectious Disease Analysis, School of Public Health, Imperial College London, London, UK
  - Pioneer Centre for Artificial Intelligence, University of Copenhagen, Copenhagen, Denmark
6
Baltzis A, Santus L, Langer BE, Magis C, de Vienne DM, Gascuel O, Mansouri L, Notredame C. multistrap: boosting phylogenetic analyses with structural information. Nat Commun 2025; 16:293. [PMID: 39814729 PMCID: PMC11735642 DOI: 10.1038/s41467-024-55264-0] [Received: 05/13/2024] [Accepted: 12/04/2024] [Indexed: 01/18/2025] Open
Abstract
In a phylogeny, trustworthy branch support estimates are as important as the tree itself. We show that reliability support values based on bootstrapping can be improved by combining sequence and structural information from proteins. Our approach relies on the systematic comparison of homologous intra-molecular structural distances. These variations exhibit less saturation than sequence-based Hamming distances and support the computation of tree-like distance matrices resolvable into phylogenetic trees using distance-based methods such as minimum evolution. These trees bear strong similarities to their sequence-based counterparts and allow the estimation of bootstrap support values, but they are sufficiently distinct that their information content may be combined. The combined sequence and structure bootstrap support values yield improved discrimination between correct and incorrect branches. In this work we show that our approach, named multistrap, is suitable for the improvement of bootstrap branch support values using both predicted and experimental 3D structures.
Affiliation(s)
- Athanasios Baltzis
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Luisa Santus
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Björn E Langer
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Cedrik Magis
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Damien M de Vienne
  - Univ Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR5558, Villeurbanne, France
- Olivier Gascuel
  - Institut de Systématique, Evolution, Biodiversité (UMR 7205-CNRS, Muséum National d'Histoire Naturelle, SU, EPHE UA), Paris, France
- Leila Mansouri
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
- Cedric Notredame
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
7
Jácome R. Structural and Evolutionary Analysis of Proteins Endowed with a Nucleotidyltransferase, or Non-canonical Palm, Catalytic Domain. J Mol Evol 2024; 92:799-814. [PMID: 39297932 DOI: 10.1007/s00239-024-10207-7] [Received: 11/27/2023] [Accepted: 09/09/2024] [Indexed: 09/21/2024]
Abstract
Many polymerases and other proteins are endowed with a catalytic domain belonging to the nucleotidyltransferase fold, which has also been deemed the non-canonical palm domain, in which three conserved acidic residues coordinate two divalent metal ions. Tertiary structure-based evolutionary analyses provide valuable information when the phylogenetic signal contained in the primary structure is blurry or has been lost, as is the case with these proteins. Pairwise structural comparisons of proteins with a nucleotidyltransferase fold were performed in the PDBefold web server: the RMSD, the number of superimposed residues, and the Qscore were obtained. The structural alignment score (RMSD × 100/number of superimposed residues) and the 1-Qscore were calculated, and distance matrices were constructed, from which a dendrogram and a phylogenetic network were drawn for each score. The dendrograms and the phylogenetic networks display well-defined clades, reflecting high levels of structural conservation within each clade, not mirrored by primary sequence. The conserved structural core between all these proteins consists of the catalytic nucleotidyltransferase fold, which is surrounded by different functional domains. Hence, many of the clades include proteins that bind different substrates or partake in non-related functions. Enzymes endowed with a nucleotidyltransferase fold are present in all domains of life, and participate in essential cellular and viral functions, which suggests that this domain is very ancient. Despite the loss of evolutionary traces in their primary structure, tertiary structure-based analyses allow us to delve into the evolution and functional diversification of the NT fold.
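The two distance measures named in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated formulas (RMSD × 100 / number of superimposed residues, and 1 − Qscore), not the author's pipeline: the `PairwiseResult` container, the function names, the protein labels, and all numeric values are hypothetical, and the pairwise results are assumed to have been exported from a structure-comparison server beforehand.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class PairwiseResult:
    """One pairwise structural comparison (hypothetical container)."""
    a: str            # label of the first structure
    b: str            # label of the second structure
    rmsd: float       # RMSD of the superposition, in angstroms
    n_aligned: int    # number of superimposed residues
    q_score: float    # Q-score, expected in the range 0..1


def structural_alignment_score(rmsd: float, n_aligned: int) -> float:
    """SAS as stated in the abstract: RMSD x 100 / superimposed residues."""
    if n_aligned <= 0:
        raise ValueError("n_aligned must be positive")
    return rmsd * 100.0 / n_aligned


def distance_matrix(
    results: Sequence[PairwiseResult],
    score: Callable[[PairwiseResult], float],
) -> Tuple[List[str], List[List[float]]]:
    """Build a symmetric distance matrix with a zero diagonal."""
    names = sorted({name for r in results for name in (r.a, r.b)})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0.0] * len(names) for _ in names]
    for r in results:
        i, j = index[r.a], index[r.b]
        matrix[i][j] = matrix[j][i] = score(r)
    return names, matrix


# Made-up values, for illustration only.
results = [
    PairwiseResult("protA", "protB", rmsd=2.0, n_aligned=100, q_score=0.5),
    PairwiseResult("protA", "protC", rmsd=3.0, n_aligned=60, q_score=0.25),
    PairwiseResult("protB", "protC", rmsd=2.5, n_aligned=50, q_score=0.2),
]

names, sas_matrix = distance_matrix(
    results, lambda r: structural_alignment_score(r.rmsd, r.n_aligned)
)
_, q_matrix = distance_matrix(results, lambda r: 1.0 - r.q_score)
```

Either matrix could then be passed to a clustering or network tool of choice to draw a dendrogram; which tool the study used for that step is not stated in the abstract.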
Affiliation(s)
- Rodrigo Jácome
  - Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, México
8
Chanket W, Pipatthana M, Sangphukieo A, Harnvoravongchai P, Chankhamhaengdecha S, Janvilisri T, Phanchana M. The complete catalog of antimicrobial resistance secondary active transporters in Clostridioides difficile: evolution and drug resistance perspective. Comput Struct Biotechnol J 2024; 23:2358-2374. [PMID: 38873647 PMCID: PMC11170357 DOI: 10.1016/j.csbj.2024.05.027] [Received: 02/08/2024] [Revised: 05/01/2024] [Accepted: 05/16/2024] [Indexed: 06/15/2024] Open
Abstract
Secondary active transporters shuttle substrates across eukaryotic and prokaryotic membranes, utilizing different electrochemical gradients. They are recognized as one of the antimicrobial efflux pumps among pathogens. While primary active transporters within the genome of C. difficile 630 have been completely cataloged, the systematic study of secondary active transporters remains incomplete. Here, we not only identify secondary active transporters but also disclose their evolution and role in drug resistance in C. difficile 630. Our analysis reveals that C. difficile 630 carries 147 secondary active transporters belonging to 27 (super)families. Notably, 50 (34%) of them potentially contribute to antimicrobial resistance (AMR). AMR-secondary active transporters are structurally classified into five (super)families: the p-aminobenzoyl-glutamate transporter (AbgT), drug/metabolite transporter (DMT) superfamily, major facilitator (MFS) superfamily, multidrug and toxic compound extrusion (MATE) family, and resistance-nodulation-division (RND) family. Surprisingly, complete RND genes found in C. difficile 630 are likely an evolutionary leftover from the common ancestor shared with diderm bacteria. Through protein structure comparisons, we have potentially identified six novel AMR-secondary active transporters from DMT, MATE, and MFS (super)families. Pangenome analysis revealed that half of the AMR-secondary transporters are accessory genes, which indicates an important role in adaptive AMR function rather than innate physiological homeostasis. Gene expression profiles firmly support their ability to respond to a wide spectrum of antibiotics. Our findings highlight the evolution of AMR-secondary active transporters and their integral role in antibiotic responses. This marks AMR-secondary active transporters as interesting therapeutic targets to synergize with other antibiotic activity.
Affiliation(s)
- Wannarat Chanket
  - Graduate Program in Molecular Medicine, Faculty of Science, Mahidol University, Bangkok, Thailand
- Methinee Pipatthana
  - Department of Microbiology, Faculty of Public Health, Mahidol University, Bangkok, Thailand
- Apiwat Sangphukieo
  - Center of Multidisciplinary Technology for Advanced Medicine (CMUTEAM), Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Tavan Janvilisri
  - Department of Biochemistry, Faculty of Science, Mahidol University, Bangkok, Thailand
- Matthew Phanchana
  - Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand
9
Schrago CG, Mello B. Challenges in Assembling the Dated Tree of Life. Genome Biol Evol 2024; 16:evae229. [PMID: 39475308 PMCID: PMC11523137 DOI: 10.1093/gbe/evae229] [Accepted: 10/15/2024] [Indexed: 11/02/2024] Open
Abstract
The assembly of a comprehensive and dated Tree of Life (ToL) remains one of the most formidable challenges in evolutionary biology. The complexity of life's history, involving both vertical and horizontal transmission of genetic information, defies its representation by a simple bifurcating phylogeny. With the advent of genome and metagenome sequencing, vast amounts of data have become available. However, employing this information for phylogeny and divergence time inference has introduced significant theoretical and computational hurdles. This perspective addresses some key methodological challenges in assembling the dated ToL, namely, the identification and classification of homologous genes, accounting for gene tree-species tree mismatch due to population-level processes along with duplication, loss, and horizontal gene transfer, and the accurate dating of evolutionary events. Ultimately, the success of this endeavor requires new approaches that integrate knowledge databases with optimized phylogenetic algorithms capable of managing complex evolutionary models.
Affiliation(s)
- Carlos G Schrago
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
- Beatriz Mello
  - Department of Genetics, Federal University of Rio de Janeiro, Rio de Janeiro, Brazil
10
Sabsay KR, te Velthuis AJW. Using structure prediction of negative sense RNA virus nucleoproteins to assess evolutionary relationships. Virus Evol 2024; 10:veae058. [PMID: 39129834 PMCID: PMC11315766 DOI: 10.1093/ve/veae058] [Received: 02/17/2024] [Revised: 05/21/2024] [Accepted: 07/19/2024] [Indexed: 08/13/2024] Open
Abstract
Negative sense RNA viruses (NSV) include some of the most detrimental human pathogens, including the influenza, Ebola, and measles viruses. NSV genomes consist of one or multiple single-stranded RNA molecules that are encapsidated into one or more ribonucleoprotein (RNP) complexes. These RNPs consist of viral RNA, a viral RNA polymerase, and many copies of the viral nucleoprotein (NP). Current evolutionary relationships within the NSV phylum are based on the alignment of conserved RNA-dependent RNA polymerase (RdRp) domain amino acid sequences. However, the RdRp domain-based phylogeny does not address whether NP, the other core protein in the NSV genome, evolved along the same trajectory or whether several RdRp-NP pairs evolved through convergent evolution in the segmented and non-segmented NSV genome architectures. Addressing how NP and the RdRp domain evolved may help us better understand NSV diversity. Since NP sequences are too short to infer robust phylogenetic relationships, we here used experimentally obtained and AlphaFold 2.0-predicted NP structures to probe whether evolutionary relationships can be estimated using NSV NP sequences. Following flexible structure alignments of modeled structures, we find that the structural homology of the NSV NPs reveals phylogenetic clusters that are consistent with RdRp-based clustering. In addition, we were able to assign viruses for which RdRp sequences are currently missing to phylogenetic clusters based on the available NP sequence. Both our RdRp-based and NP-based relationships deviate from the current NSV classification of the segmented Naedrevirales, which cluster with the other segmented NSVs in our analysis. Overall, our results suggest that the NSV RdRp and NP genes largely evolved along similar trajectories and even short pieces of genetic, protein-coding information can be used to infer evolutionary relationships, potentially making metagenomic analyses more valuable.
Affiliation(s)
- Kimberly R Sabsay
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, United States
  - Lewis Sigler Institute, Princeton University, Washington Road, Princeton, NJ 08544, United States
- Aartjan J W te Velthuis
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Washington Road, Princeton, NJ 08544, United States
11
Sabsay KR, te Velthuis AJ. Using structure prediction of negative sense RNA virus nucleoproteins to assess evolutionary relationships. bioRxiv 2024:2024.02.16.580771. [PMID: 38405982 PMCID: PMC10888975 DOI: 10.1101/2024.02.16.580771] [Indexed: 02/27/2024]
Abstract
Negative sense RNA viruses (NSV) include some of the most detrimental human pathogens, including the influenza, Ebola and measles viruses. NSV genomes consist of one or multiple single-stranded RNA molecules that are encapsidated into one or more ribonucleoprotein (RNP) complexes. These RNPs consist of viral RNA, a viral RNA polymerase, and many copies of the viral nucleoprotein (NP). Current evolutionary relationships within the NSV phylum are based on alignment of conserved RNA-directed RNA polymerase (RdRp) domain amino acid sequences. However, the RdRp domain-based phylogeny does not address whether NP, the other core protein in the NSV genome, evolved along the same trajectory or whether several RdRp-NP pairs evolved through convergent evolution in the segmented and non-segmented NSV genome architectures. Addressing how NP and the RdRp domain evolved may help us better understand NSV diversity. Since NP sequences are too short to infer robust phylogenetic relationships, we here used experimentally-obtained and AlphaFold 2.0-predicted NP structures to probe whether evolutionary relationships can be estimated using NSV NP sequences. Following flexible structure alignments of modeled structures, we find that the structural homology of the NSV NPs reveals phylogenetic clusters that are consistent with RdRp-based clustering. In addition, we were able to assign viruses for which RdRp sequences are currently missing to phylogenetic clusters based on the available NP sequence. Both our RdRp-based and NP-based relationships deviate from the current NSV classification of the segmented Naedrevirales, which cluster with the other segmented NSVs in our analysis. Overall, our results suggest that the NSV RdRp and NP genes largely evolved along similar trajectories and that even short pieces of genetic, protein-coding information can be used to infer evolutionary relationships, potentially making metagenomic analyses more valuable.
Affiliation(s)
- Kimberly R. Sabsay
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Princeton, NJ 08544, United States
  - Lewis Sigler Institute, Princeton University, Princeton, NJ 08544, United States
- Aartjan J.W. te Velthuis
  - Lewis Thomas Laboratory, Department of Molecular Biology, Princeton University, Princeton, NJ 08544, United States
12
|
Rapp E, Wolf M. 18S rDNA sequence-structure phylogeny of the eukaryotes simultaneously inferred from sequences and their individual secondary structures. BMC Res Notes 2024; 17:124. [PMID: 38693573 PMCID: PMC11064340 DOI: 10.1186/s13104-024-06786-9] [Received: 01/12/2024] [Accepted: 04/23/2024] [Indexed: 05/03/2024]
Abstract
OBJECTIVE: The eukaryotic tree of life has been the subject of numerous studies since the nineteenth century, with more supergroups and their sister relations being resolved in recent years. In this study, we reconstructed the phylogeny of eukaryotes using complete 18S rDNA sequences and their individual secondary structures simultaneously. After the sequence-structure data was encoded, it was automatically aligned and analyzed using sequence-only as well as sequence-structure approaches. We present overall neighbor-joining trees of 211 eukaryotes as well as the respective profile neighbor-joining trees, which helped to resolve the basal branching pattern. A manually chosen subset was further inspected using neighbor-joining, maximum parsimony, and maximum likelihood analyses. Additionally, the 75 and 100 percent consensus structures of the subset were predicted. RESULTS: All sequence-structure approaches show improvements compared to the respective sequence-only approaches: the average bootstrap support per node of the sequence-structure profile neighbor-joining analysis (90.3) was higher than that of the sequence-only profile neighbor-joining analysis (73.9). The subset analyses using sequence-structure data were also better supported. Furthermore, more subgroups of the supergroups were recovered as monophyletic, and sister-group relations were much more comparable to results obtained by multi-marker analyses.
Affiliation(s)
- Eva Rapp
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Matthias Wolf
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
13
|
Bou Dagher L, Madern D, Malbos P, Brochier-Armanet C. Persistent homology reveals strong phylogenetic signal in 3D protein structures. PNAS Nexus 2024; 3:pgae158. [PMID: 38689707 PMCID: PMC11058471 DOI: 10.1093/pnasnexus/pgae158] [Received: 11/02/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024]
Abstract
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while approaches based on 3D structure comparisons are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal that proteins may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.
Affiliation(s)
- Léa Bou Dagher
- Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
- Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Université Libanaise, Laboratoire de Mathématiques, École Doctorale en Science et Technologie, PO BOX 5 Hadath, Liban
- Dominique Madern
- University Grenoble Alpes, CEA, CNRS, IBS, 38000 Grenoble, France
- Philippe Malbos
- Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Céline Brochier-Armanet
- Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
14
|
Cao W, Wu LY, Xia XY, Chen X, Wang ZX, Pan XM. A sequence-based evolutionary distance method for Phylogenetic analysis of highly divergent proteins. Sci Rep 2023; 13:20304. [PMID: 37985846 PMCID: PMC10662474 DOI: 10.1038/s41598-023-47496-9] [Received: 08/28/2023] [Accepted: 11/14/2023] [Indexed: 11/22/2023]
Abstract
Because of the limited effectiveness of prevailing phylogenetic methods when applied to highly divergent protein sequences, the phylogenetic analysis problem remains challenging. Here, we propose a sequence-based evolutionary distance algorithm termed sequence distance (SD), which innovatively incorporates site-to-site correlation within protein sequences into the distance estimation. In protein superfamilies, SD can effectively distinguish evolutionary relationships both within and between protein families, producing phylogenetic trees that closely align with those based on structural information, even with sequence identity less than 20%. SD is highly correlated with the similarity of the protein structure, and can calculate evolutionary distances for thousands of protein pairs within seconds using a single CPU, which is significantly faster than most protein structure prediction methods that demand high computational resources and long run times. The development of SD will significantly advance phylogenetics, providing researchers with a more accurate and reliable tool for exploring evolutionary relationships.
Affiliation(s)
- Wei Cao
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Lu-Yun Wu
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xia-Yu Xia
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xiang Chen
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Zhi-Xin Wang
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
- Xian-Ming Pan
- Key Laboratory of Ministry of Education for Protein Science, School of Life Sciences, Tsinghua University, Beijing, 100084, China
15
|
Malik AJ, Langer D, Verma CS, Poole AM, Allison JR. Structome: a tool for the rapid assembly of datasets for structural phylogenetics. Bioinformatics Advances 2023; 3:vbad134. [PMID: 38046099 PMCID: PMC10692761 DOI: 10.1093/bioadv/vbad134] [Received: 03/02/2023] [Revised: 08/17/2023] [Accepted: 09/29/2023] [Indexed: 12/05/2023]
Abstract
Summary: Protein structures carry a signal of common ancestry and can therefore aid in reconstructing their evolutionary histories. To expedite the structure-informed inference process, a web server, Structome, has been developed that allows users to rapidly identify protein structures similar to a query protein and to assemble datasets useful for structure-based phylogenetics. Structome was created by clustering ~94% of the structures in RCSB PDB using 90% sequence identity and representing each cluster by a centroid structure. Structure similarity between centroid proteins was calculated, and annotations from PDB, SCOP, and CATH were integrated. To illustrate utility, an H3 histone was used as a query, and results show that the protein structures returned by Structome span both the sequence and structural diversity of the histone fold. Additionally, the pre-computed nexus-formatted distance matrix provided by Structome enables analysis of evolutionary relationships between proteins not identifiable using searches based on sequence similarity alone. Our results demonstrate that, beginning with a single structure, Structome can be used to rapidly generate a dataset of structural neighbours and allows the deep evolutionary history of proteins to be studied. Availability and Implementation: Structome is available at: https://structome.bii.a-star.edu.sg.
Affiliation(s)
- Ashar J Malik
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 138671 Singapore
- Desiree Langer
- School of Biological Sciences, University of Auckland, 1142 Auckland, New Zealand
- Chandra S Verma
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 138671 Singapore
- Department of Biological Sciences, National University of Singapore, 117543 Singapore
- School of Biological Sciences, Nanyang Technological University, 637551 Singapore
- Anthony M Poole
- School of Biological Sciences, University of Auckland, 1142 Auckland, New Zealand
- Digital Life Institute, University of Auckland, Auckland 1142, New Zealand
- Jane R Allison
- School of Biological Sciences, University of Auckland, 1142 Auckland, New Zealand
- Digital Life Institute, University of Auckland, Auckland 1142, New Zealand
- Maurice Wilkins Centre for Molecular Biodiscovery, University of Auckland, 1142 Auckland, New Zealand
- Biomolecular Interaction Centre, University of Canterbury, 8041 Christchurch, New Zealand
16
|
Fujishiro T, Takaoka K. Class III hybrid cluster protein homodimeric architecture shows evolutionary relationship with Ni, Fe-carbon monoxide dehydrogenases. Nat Commun 2023; 14:5609. [PMID: 37709776 PMCID: PMC10502027 DOI: 10.1038/s41467-023-41289-4] [Received: 06/12/2022] [Accepted: 08/30/2023] [Indexed: 09/16/2023]
Abstract
Hybrid cluster proteins (HCPs) are Fe-S-O cluster-containing metalloenzymes in three distinct classes (class I and II: monomer, III: homodimer), all of which are structurally related to homodimeric Ni, Fe-carbon monoxide dehydrogenases (CODHs). Here we show the X-ray crystal structure of class III HCP from Methanothermobacter marburgensis (Mm HCP), demonstrating that its homodimeric architecture structurally resembles those of CODHs. Also, despite the different architectures of class III and I/II HCPs, [4Fe-4S] and hybrid clusters are found in equivalent positions in all HCPs. Structural comparison of Mm HCP and CODHs unveils some distinct features such as the environments of their homodimeric interfaces and the active site metalloclusters. Furthermore, structural analysis of Mm HCP C67Y and characterization of several Mm HCP variants with a Cys67 mutation reveal the significance of Cys67 in protein structure, metallocluster binding and hydroxylamine reductase activity. Structure-based bioinformatics analysis of HCPs and CODHs provides insights into the structural evolution of the HCP/CODH superfamily.
Affiliation(s)
- Takashi Fujishiro
- Department of Biochemistry and Molecular Biology, Graduate School of Science and Engineering, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama, 338-8570, Japan
- Kyosei Takaoka
- Department of Biochemistry and Molecular Biology, Graduate School of Science and Engineering, Saitama University, Shimo-Okubo 255, Sakura-ku, Saitama, 338-8570, Japan
17
|
Zheng Y, Young ND, Song J, Gasser RB. Genome-Wide Analysis of Haemonchus contortus Proteases and Protease Inhibitors Using Advanced Informatics Provides Insights into Parasite Biology and Host-Parasite Interactions. Int J Mol Sci 2023; 24:12320. [PMID: 37569696 PMCID: PMC10418638 DOI: 10.3390/ijms241512320] [Received: 06/20/2023] [Revised: 07/24/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023]
Abstract
Biodiversity within the animal kingdom is associated with extensive molecular diversity. The expansion of genomic, transcriptomic and proteomic data sets for invertebrate groups and species with unique biological traits necessitates reliable in silico tools for the accurate identification and annotation of molecules and molecular groups. However, conventional tools are inadequate for lesser-known organismal groups, such as eukaryotic pathogens (parasites), so that improved approaches are urgently needed. Here, we established a combined sequence- and structure-based workflow system to harness well-curated publicly available data sets and resources to identify, classify and annotate proteases and protease inhibitors of a highly pathogenic parasitic roundworm (nematode) of global relevance, called Haemonchus contortus (barber's pole worm). This workflow performed markedly better than conventional, sequence-based classification and annotation alone and allowed the first genome-wide characterisation of protease and protease inhibitor genes and gene products in this worm. In total, we identified 790 genes encoding 860 proteases and protease inhibitors representing 83 gene families. The proteins inferred included 280 metallo-, 145 cysteine, 142 serine, 121 aspartic and 81 "mixed" proteases as well as 91 protease inhibitors, all of which had marked physicochemical diversity and inferred involvements in >400 biological processes or pathways. A detailed investigation revealed a remarkable expansion of some protease or inhibitor gene families, which are likely linked to parasitism (e.g., host-parasite interactions, immunomodulation and blood-feeding) and exhibit stage- or sex-specific transcription profiles. This investigation provides a solid foundation for detailed explorations of the structures and functions of proteases and protease inhibitors of H. contortus and related nematodes, and it could assist in the discovery of new drug or vaccine targets against infections or diseases.
Affiliation(s)
- Yuanting Zheng
- Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Neil D. Young
- Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia
- Jiangning Song
- Department of Data Science and AI, Faculty of IT, Monash University, Melbourne, VIC 3800, Australia
- Department of Biochemistry and Molecular Biology, Biomedicine Discovery Institute, Monash University, Melbourne, VIC 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Robin B. Gasser
- Melbourne Veterinary School, Faculty of Science, The University of Melbourne, Parkville, VIC 3010, Australia