1. Filis G, Bezantakou D, Rigkos K, Noti D, Saridis P, Zarafeta D, Skretas G. ProteoSeeker: A Feature-Rich Metagenomic Analysis Tool for Accessible and Comprehensive Metagenomic Exploration. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025; 12:e2414877. [PMID: 40130725 PMCID: PMC12097006 DOI: 10.1002/advs.202414877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Revised: 02/26/2025] [Indexed: 03/26/2025]
Abstract
The vast majority of microbial diversity remains unculturable, limiting access to novel biotechnological resources. Advances in metagenomics have expanded the understanding of microbial communities, yet targeted protein discovery remains challenging. This study introduces ProteoSeeker, a command-line tool for streamlined metagenomic protein identification and annotation. ProteoSeeker operates in two primary modes: i) Seek mode, which screens the proteins according to user-defined protein families, and ii) Taxonomy mode, which uncovers the taxonomy of the host organisms. By automating key steps, ProteoSeeker reduces computational complexity, enabling time-efficient and comprehensive metagenomic analysis for both specialized and nonspecialized users. The efficiency of ProteoSeeker to achieve targeted enzyme discovery is demonstrated by identifying extremophilic enzymes with desired biochemical features, such as amylases for starch hydrolysis and carbonic anhydrases for CO₂ capture applications. By democratizing functional metagenomics, ProteoSeeker is anticipated to accelerate biotechnology, synthetic biology, and biomedical research and innovation.
Affiliation(s)
- Georgios Filis
- Institute for Bioinnovation, Biomedical Sciences Research Center “Alexander Fleming”, Vari 16672, Greece
- Institute of Chemical Biology, National Hellenic Research Foundation, Athens 11635, Greece
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens 16122, Greece
- Dimitra Bezantakou
- Institute for Bioinnovation, Biomedical Sciences Research Center “Alexander Fleming”, Vari 16672, Greece
- Konstantinos Rigkos
- Institute for Bioinnovation, Biomedical Sciences Research Center “Alexander Fleming”, Vari 16672, Greece
- Institute of Chemical Biology, National Hellenic Research Foundation, Athens 11635, Greece
- Department of Biological Applications and Technologies, University of Ioannina, Ioannina 45500, Greece
- Despina Noti
- Institute of Chemical Biology, National Hellenic Research Foundation, Athens 11635, Greece
- Pavlos Saridis
- Institute of Chemical Biology, National Hellenic Research Foundation, Athens 11635, Greece
- Faculty of Biology, National and Kapodistrian University of Athens, Athens 15772, Greece
- Dimitra Zarafeta
- Institute for Bioinnovation, Biomedical Sciences Research Center “Alexander Fleming”, Vari 16672, Greece
- Institute of Chemical Biology, National Hellenic Research Foundation, Athens 11635, Greece
- Georgios Skretas
- Institute for Bioinnovation, Biomedical Sciences Research Center “Alexander Fleming”, Vari 16672, Greece
- Institute of Chemical Biology, National Hellenic Research Foundation, Athens 11635, Greece

2. Sun J, Ru J, Cribbs AP, Xiong D. PyPropel: a Python-based tool for efficiently processing and characterising protein data. BMC Bioinformatics 2025; 26:70. [PMID: 40025421 PMCID: PMC11871610 DOI: 10.1186/s12859-025-06079-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2024] [Accepted: 02/10/2025] [Indexed: 03/04/2025] Open
Abstract
BACKGROUND: The volume of protein sequence data has grown exponentially in recent years, driven by advancements in metagenomics. Despite this, a substantial proportion of these sequences remain poorly annotated, underscoring the need for robust bioinformatics tools to facilitate efficient characterisation and annotation for functional studies. RESULTS: We present PyPropel, a Python-based computational tool developed to streamline the large-scale analysis of protein data, with a particular focus on applications in machine learning. PyPropel integrates sequence and structural data pre-processing, feature generation, and post-processing for model performance evaluation and visualisation, offering a comprehensive solution for handling complex protein datasets. CONCLUSION: PyPropel provides added value over existing tools by offering a unified workflow that encompasses the full spectrum of protein research, from raw data pre-processing to functional annotation and model performance analysis, thereby supporting efficient protein function studies.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Jinlong Ru
- Chair of Prevention of Microbial Diseases, School of Life Sciences Weihenstephan, Technical University of Munich, 85354, Freising, Germany
- Adam P Cribbs
- Botnar Research Centre, University of Oxford, Headington, Oxford, OX3 7LD, UK
- Dapeng Xiong
- Department of Computational Biology, Cornell University, Ithaca, 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, USA

3. Hermans P, Tsishyn M, Schwersensky M, Rooman M, Pucci F. Exploring Evolution to Uncover Insights Into Protein Mutational Stability. Mol Biol Evol 2025; 42:msae267. [PMID: 39786559 PMCID: PMC11721782 DOI: 10.1093/molbev/msae267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 11/27/2024] [Accepted: 11/28/2024] [Indexed: 01/12/2025] Open
Abstract
Determining the impact of mutations on the thermodynamic stability of proteins is essential for a wide range of applications such as rational protein design and genetic variant interpretation. Since protein stability is a major driver of evolution, evolutionary data are often used to guide stability predictions. Many state-of-the-art stability predictors extract evolutionary information from multiple sequence alignments of proteins homologous to a query protein, and leverage it to predict the effects of mutations on protein stability. To evaluate the power and the limitations of such methods, we used the massive amount of stability data recently obtained by deep mutational scanning to study how best to construct multiple sequence alignments and optimally extract evolutionary information from them. We tested different evolutionary models and found that, unexpectedly, independent-site models achieve similar accuracy to more complex epistatic models. A detailed analysis of the latter models suggests that their inference often results in noisy couplings, which do not appear to add predictive power over the independent-site contribution, at least in the context of stability prediction. Interestingly, by combining any of the evolutionary features with a simple structural feature, the relative solvent accessibility of the mutated residue, we achieved similar prediction accuracy to supervised, machine learning-based, protein stability change predictors. Our results provide new insights into the relationship between protein evolution and stability, and show how evolutionary information can be exploited to improve the performance of mutational stability prediction.
Affiliation(s)
- Pauline Hermans
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Matsvei Tsishyn
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Martin Schwersensky
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Brussels 1050, Belgium

4. Genc AG, McGuffin LJ. Beyond AlphaFold2: The Impact of AI for the Further Improvement of Protein Structure Prediction. Methods Mol Biol 2025; 2867:121-139. [PMID: 39576578 DOI: 10.1007/978-1-0716-4196-5_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
Protein structure prediction is fundamental to molecular biology and has numerous applications in areas such as drug discovery and protein engineering. Machine learning techniques have greatly advanced protein 3D modeling in recent years, particularly with the development of AlphaFold2 (AF2), which can analyze sequences of amino acids and predict 3D structures with near experimental accuracy. Since the release of AF2, numerous studies have been conducted, either using AF2 directly for large-scale modeling or building upon the software for other use cases. Many reviews have been published discussing the impact of AF2 in the field of protein bioinformatics, particularly in relation to neural networks, which have highlighted what AF2 can and cannot do. It is evident that AF2 and similar approaches are open to further development and several new approaches have emerged, in addition to older refinement approaches, for improving the quality of predictions. Here we provide a brief overview, aimed at the general biologist, of how machine learning techniques have been used for improvement of 3D models of proteins following AF2, and we highlight the impacts of these approaches. In the most recent experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP15), the most successful groups all developed their own tools for protein structure modeling that were based at least in some part on AF2. This improvement involved employing techniques such as generative modeling, changing parameters such as dropout to generate more AF2 structures, and data-driven approaches including using alternative templates and MSAs.
Affiliation(s)
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK

5. Ariaeenejad S, Gharechahi J, Foroozandeh Shahraki M, Fallah Atanaki F, Han JL, Ding XZ, Hildebrand F, Bahram M, Kavousi K, Hosseini Salekdeh G. Precision enzyme discovery through targeted mining of metagenomic data. Natural Products and Bioprospecting 2024; 14:7. [PMID: 38200389 PMCID: PMC10781932 DOI: 10.1007/s13659-023-00426-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/25/2023] [Accepted: 12/19/2023] [Indexed: 01/12/2024]
Abstract
Metagenomics has opened new avenues for exploring the genetic potential of uncultured microorganisms, which may serve as promising sources of enzymes and natural products for industrial applications. Identifying enzymes with improved catalytic properties from the vast amount of available metagenomic data poses a significant challenge that demands the development of novel computational and functional screening tools. The catalytic properties of all enzymes are primarily dictated by their structures, which are predominantly determined by their amino acid sequences. However, this aspect has not been fully considered in the enzyme bioprospecting processes. With the accumulating number of available enzyme sequences and the increasing demand for discovering novel biocatalysts, structural and functional modeling can be employed to identify potential enzymes with novel catalytic properties. Recent efforts to discover new polysaccharide-degrading enzymes from rumen metagenome data using homology-based searches and machine learning-based models have shown significant promise. Here, we will explore various computational approaches that can be employed to screen and shortlist metagenome-derived enzymes as potential biocatalyst candidates, in conjunction with the wet lab analytical methods traditionally used for enzyme characterization.
Affiliation(s)
- Shohreh Ariaeenejad
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREEO), Karaj, Iran
- Javad Gharechahi
- Human Genetics Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Mehdi Foroozandeh Shahraki
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Fereshteh Fallah Atanaki
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
- Jian-Lin Han
- Livestock Genetics Program, International Livestock Research Institute (ILRI), Nairobi, 00100, Kenya
- CAAS-ILRI Joint Laboratory On Livestock and Forage Genetic Resources, Institute of Animal Science, Chinese Academy of Agricultural Sciences (CAAS), Beijing, 100193, China
- Xue-Zhi Ding
- Key Laboratory of Yak Breeding Engineering, Lanzhou Institute of Husbandry and Pharmaceutical Sciences, Chinese Academy of Agricultural Sciences (CAAS), Lanzhou, 730050, China
- Falk Hildebrand
- Gut Microbes and Health, Quadram Institute Bioscience, Norwich, Norfolk, UK
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
- Mohammad Bahram
- Department of Ecology, Swedish University of Agricultural Sciences, Ulls Väg 16, 756 51, Uppsala, Sweden
- Department of Botany, Institute of Ecology and Earth Sciences, University of Tartu, 40 Lai St, Tartu, Estonia
- Kaveh Kavousi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran

6. Pavlopoulos GA, Baltoumas FA, Liu S, Selvitopi O, Camargo AP, Nayfach S, Azad A, Roux S, Call L, Ivanova NN, Chen IM, Paez-Espino D, Karatzas E, Iliopoulos I, Konstantinidis K, Tiedje JM, Pett-Ridge J, Baker D, Visel A, Ouzounis CA, Ovchinnikov S, Buluç A, Kyrpides NC. Unraveling the functional dark matter through global metagenomics. Nature 2023; 622:594-602. [PMID: 37821698 PMCID: PMC10584684 DOI: 10.1038/s41586-023-06583-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 03/18/2022] [Accepted: 08/30/2023] [Indexed: 10/13/2023]
Abstract
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
Affiliation(s)
- Georgios A Pavlopoulos
- Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming, Vari, Greece
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
- Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming, Vari, Greece
- Sirui Liu
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA
- Oguz Selvitopi
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Antonio Pedro Camargo
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Stephen Nayfach
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Ariful Azad
- Luddy School of Informatics, Computing and Engineering, Indiana University Bloomington, Bloomington, IN, USA
- Simon Roux
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Lee Call
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Natalia N Ivanova
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- I Min Chen
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- David Paez-Espino
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, Biomedical Science Research Center Alexander Fleming, Vari, Greece
- Ioannis Iliopoulos
- Department of Basic Sciences, School of Medicine, University of Crete, Heraklion, Greece
- James M Tiedje
- Center for Microbial Ecology, Michigan State University, East Lansing, MI, USA
- Jennifer Pett-Ridge
- Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, CA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Institute for Protein Design, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Axel Visel
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Christos A Ouzounis
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Biological Computation & Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece
- Biological Computation & Computational Biology Group, Artificial Intelligence & Information Analysis Lab, School of Informatics, Aristotle University of Thessalonica, Thessalonica, Greece
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA
- Aydin Buluç
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
- Nikos C Kyrpides
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA

7. Contreras MJ, Núñez-Montero K, Bruna P, Zárate A, Pezo F, García M, Leal K, Barrientos L. Mammals' sperm microbiome: current knowledge, challenges, and perspectives on metagenomics of seminal samples. Front Microbiol 2023; 14:1167763. [PMID: 37138598 PMCID: PMC10149849 DOI: 10.3389/fmicb.2023.1167763] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/16/2023] [Accepted: 03/31/2023] [Indexed: 05/05/2023] Open
Abstract
Bacterial growth is highly detrimental to sperm quality and functionality. In recent years, however, sequencing techniques with a metagenomic approach have made it possible to deepen the study of bacteria-sperm relationships and to describe non-culturable species, as well as synergistic and antagonistic relationships between species, in mammals. We compile recent metagenomic studies performed on mammalian semen samples and provide updated evidence on the importance of microbial communities for sperm quality and functionality, with a view to how these technologies can contribute to the development of andrological knowledge.
Affiliation(s)
- María José Contreras
- Extreme Environments Biotechnology Lab, Center of Excellence in Translational Medicine, Universidad de La Frontera, Temuco, Chile
- Kattia Núñez-Montero
- Facultad de Ciencias de la Salud, Instituto de Ciencias Biomédicas, Universidad Autónoma de Chile, Temuco, Chile
- Pablo Bruna
- Extreme Environments Biotechnology Lab, Center of Excellence in Translational Medicine, Universidad de La Frontera, Temuco, Chile
- Ana Zárate
- Extreme Environments Biotechnology Lab, Center of Excellence in Translational Medicine, Universidad de La Frontera, Temuco, Chile
- Felipe Pezo
- Escuela de Medicina Veterinaria, Facultad de Recursos Naturales y Medicina Veterinaria, Universidad Santo Tomás, Santiago, Chile
- Matías García
- Extreme Environments Biotechnology Lab, Center of Excellence in Translational Medicine, Universidad de La Frontera, Temuco, Chile
- Karla Leal
- Extreme Environments Biotechnology Lab, Center of Excellence in Translational Medicine, Universidad de La Frontera, Temuco, Chile
- Leticia Barrientos
- Extreme Environments Biotechnology Lab, Center of Excellence in Translational Medicine, Universidad de La Frontera, Temuco, Chile
- Scientific and Technological Bioresource Nucleus (BIOREN), Universidad de La Frontera, Temuco, Chile
- *Correspondence: Leticia Barrientos,