1
|
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petabase-Scale Homology Search for Structure Prediction. Cold Spring Harb Perspect Biol 2024; 16:a041465. [PMID: 38316555 PMCID: PMC11065157 DOI: 10.1101/cshperspect.a041465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2024]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
Collapse
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | - Gyuri Kim
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | | | - Milot Mirdita
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
| | - Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
| | - Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
| | - Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | | | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Gwanak-gu, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
| |
Collapse
|
2
|
Zheng W, Wuyun Q, Li Y, Zhang C, Freddolino PL, Zhang Y. Improving deep learning protein monomer and complex structure prediction using DeepMSA2 with huge metagenomics data. Nat Methods 2024; 21:279-289. [PMID: 38167654 PMCID: PMC10864179 DOI: 10.1038/s41592-023-02130-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2023] [Accepted: 11/13/2023] [Indexed: 01/05/2024]
Abstract
Leveraging iterative alignment search through genomic and metagenome sequence databases, we report the DeepMSA2 pipeline for uniform protein single- and multichain multiple-sequence alignment (MSA) construction. Large-scale benchmarks show that DeepMSA2 MSAs can remarkably increase the accuracy of protein tertiary and quaternary structure predictions compared with current state-of-the-art methods. An integrated pipeline with DeepMSA2 participated in the most recent CASP15 experiment and created complex structural models with considerably higher quality than the AlphaFold2-Multimer server (v.2.2.0). Detailed data analyses show that the major advantage of DeepMSA2 lies in its balanced alignment search and effective model selection, and in the power of integrating huge metagenomics databases. These results demonstrate a new avenue to improve deep learning protein structure prediction through advanced MSA construction and provide additional evidence that optimization of input information to deep learning-based structure prediction methods must be considered with as much care as the design of the predictor itself.
Collapse
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - P Lydia Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
- Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA.
- Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore.
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
| |
Collapse
|
3
|
Robinson SL. Structure-guided metagenome mining to tap microbial functional diversity. Curr Opin Microbiol 2023; 76:102382. [PMID: 37741262 DOI: 10.1016/j.mib.2023.102382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2023] [Revised: 05/21/2023] [Accepted: 08/22/2023] [Indexed: 09/25/2023]
Abstract
Scientists now have access to millions of accurate three-dimensional (3D) models of protein structures. How do we leverage 3D structural models to learn about microbial functions encoded in metagenomes? Here, we review recent developments using protein structural features to mine metagenomes from diverse environments ranging from the human gut to soil and ocean viromes. We compare 3D protein structural methods to characterize antibiotic resistance phenotypes, nutrient cycling, and host-drug-microbe interactions. Broadly, we encourage the scientific community to look beyond global sequence and structure alignments by considering fine-grained descriptors such as distance to ligand, active site, and tertiary interactions between amino acid residues scaling to microbiomes. Finally, we highlight structure-inspired approaches to chart new areas of microbial protein-coding sequence space.
Collapse
Affiliation(s)
- Serina L Robinson
- Department of Environmental Microbiology, Eawag, Swiss Federal Institute of Aquatic Science and Technology, Ueberlandstrasse 133, 8600 Dübendorf, Switzerland.
| |
Collapse
|
4
|
Zheng W, Wuyun Q, Freddolino PL, Zhang Y. Integrating deep learning, threading alignments, and a multi-MSA strategy for high-quality protein monomer and complex structure prediction in CASP15. Proteins 2023; 91:1684-1703. [PMID: 37650367 PMCID: PMC10840719 DOI: 10.1002/prot.26585] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2023] [Revised: 08/04/2023] [Accepted: 08/14/2023] [Indexed: 09/01/2023]
Abstract
We report the results of the "UM-TBM" and "Zheng" groups in CASP15 for protein monomer and complex structure prediction. These prediction sets were obtained using the D-I-TASSER and DMFold-Multimer algorithms, respectively. For monomer structure prediction, D-I-TASSER introduced four new features during CASP15: (i) a multiple sequence alignment (MSA) generation protocol that combines multi-source MSA searching and a structural modeling-based MSA ranker; (ii) attention-network based spatial restraints; (iii) a multi-domain module containing domain partition and arrangement for domain-level templates and spatial restraints; (iv) an optimized I-TASSER-based folding simulation system for full-length model creation guided by a combination of deep learning restraints, threading alignments, and knowledge-based potentials. For 47 free modeling targets in CASP15, the final models predicted by D-I-TASSER showed average TM-score 19% higher than the standard AlphaFold2 program. We thus showed that traditional Monte Carlo-based folding simulations, when appropriately coupled with deep learning algorithms, can generate models with improved accuracy over end-to-end deep learning methods alone. For protein complex structure prediction, DMFold-Multimer generated models by integrating a new MSA generation algorithm (DeepMSA2) with the end-to-end modeling module from AlphaFold2-Multimer. For the 38 complex targets, DMFold-Multimer generated models with an average TM-score of 0.83 and Interface Contact Score of 0.60, both significantly higher than those of competing complex prediction tools. Our analyses on complexes highlighted the critical role played by MSA generating, ranking, and pairing in protein complex structure prediction. We also discuss future room for improvement in the areas of viral protein modeling and complex model ranking.
Collapse
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Peter L Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109, USA
- Department of Computer Science, School of Computing, National University of Singapore, 117417 Singapore
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
| |
Collapse
|
5
|
Lee S, Kim G, Karin EL, Mirdita M, Park S, Chikhi R, Babaian A, Kryshtafovych A, Steinegger M. Petascale Homology Search for Structure Prediction. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.07.10.548308. [PMID: 37503235 PMCID: PMC10369885 DOI: 10.1101/2023.07.10.548308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.
Collapse
Affiliation(s)
- Sewon Lee
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
| | - Gyuri Kim
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
| | | | - Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
| | - Sukhwan Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
| | - Rayan Chikhi
- Institut Pasteur, Université Paris Cité, G5 Sequence Bioinformatics, 75015 Paris, France
| | - Artem Babaian
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8, Canada
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
| | | | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul 08826, South Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul 08826, South Korea
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul 08826, South Korea
| |
Collapse
|
6
|
Yang P, Zhu X, Ning K. Microbiome-based enrichment pattern mining has enabled a deeper understanding of the biome-species-function relationship. Commun Biol 2023; 6:391. [PMID: 37037946 PMCID: PMC10085995 DOI: 10.1038/s42003-023-04753-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2022] [Accepted: 03/24/2023] [Indexed: 04/12/2023] Open
Abstract
Microbes live in diverse habitats (i.e. biomes), yet their species and genes were biome-specific, forming enrichment patterns. These enrichment patterns have mirrored the biome-species-function relationship, which is shaped by ecological and evolutionary principles. However, a grand picture of these enrichment patterns, as well as the roles of external and internal factors in driving these enrichment patterns, remain largely unexamined. In this work, we have examined the enrichment patterns based on 1705 microbiome samples from four representative biomes (Engineered, Gut, Freshwater, and Soil). Moreover, an "enrichment sphere" model was constructed to elucidate the regulatory principles behind these patterns. The driving factors for this model were revealed based on two case studies: (1) The copper-resistance genes were enriched in Soil biomes, owing to the copper contamination and horizontal gene transfer. (2) The flagellum-related genes were enriched in the Freshwater biome, due to high fluidity and vertical gene accumulation. Furthermore, this enrichment sphere model has valuable applications, such as in biome identification for metagenome samples, and in guiding 3D structure modeling of proteins. In summary, the enrichment sphere model aims towards creating a bluebook of the biome-species-function relationships and be applied in many fields.
Collapse
Affiliation(s)
- Pengshuo Yang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
- Institute of Medical Genomics, Biomedical Sciences College, Shandong First Medical University, Shandong, 250117, China
| | - Xue Zhu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China.
- Institute of Medical Genomics, Biomedical Sciences College, Shandong First Medical University, Shandong, 250117, China.
| |
Collapse
|
7
|
Designing of thiazolidinones against chicken pox, monkey pox, and hepatitis viruses: A computational approach. Comput Biol Chem 2023; 103:107827. [PMID: 36805155 PMCID: PMC9922439 DOI: 10.1016/j.compbiolchem.2023.107827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2022] [Revised: 02/03/2023] [Accepted: 02/05/2023] [Indexed: 02/14/2023]
Abstract
Computational designing of four different series (D-G) of thiazolidinone was done starting from different amines which was further condensed with various aldehydes. These underwent in silico molecular investigations for density functional theory (DFT), molecular docking, and absorption, distribution metabolism, excretion, and toxicity (ADMET) studies. The different electrochemical parameters of the compounds are predicted using quantum mechanical modeling approach with Gaussian. The docking software was used to dock the compounds against choosing PDB file for chickenpox, human immunodeficiency, hepatitis, and monkeypox virus as 1OSN, 1VZV, 6VLK, 1RTD, 3I7H, 3TYV, 4JU3, and 4QWO, respectively. The molecular interactions were visualized with discovery studio and maximum binding affinity was observed with D8 compounds against 4QWO (-13.383 kcal/mol) while for compound D5 against 1VZV which was -12.713 kcal/mol. Swiss ADME web tool was used to assess the drug-likeness of the designed compounds under consideration, and it is concluded that these molecules had a drug-like structure with almost zero violations.
Collapse
|
8
|
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. FRONTIERS IN BIOINFORMATICS 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Collapse
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- *Correspondence: Fotis A. Baltoumas, ; Nikos C. Kyrpides, ; Georgios A. Pavlopoulos,
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
| | - Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- *Correspondence: Fotis A. Baltoumas, ; Nikos C. Kyrpides, ; Georgios A. Pavlopoulos,
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
- *Correspondence: Fotis A. Baltoumas, ; Nikos C. Kyrpides, ; Georgios A. Pavlopoulos,
| |
Collapse
|
9
|
Characterization of Treponema denticola Major Surface Protein (Msp) by Deletion Analysis and Advanced Molecular Modeling. J Bacteriol 2022; 204:e0022822. [PMID: 35913147 PMCID: PMC9487533 DOI: 10.1128/jb.00228-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023] Open
Abstract
Treponema denticola, a keystone pathogen in periodontitis, is a model organism for studying Treponema physiology and host-microbe interactions. Its major surface protein Msp forms an oligomeric outer membrane complex that binds fibronectin, has cytotoxic pore-forming activity, and disrupts several intracellular processes in host cells. T. denticola msp is an ortholog of the Treponema pallidum tprA to -K gene family that includes tprK, whose remarkable in vivo hypervariability is proposed to contribute to T. pallidum immune evasion. We recently identified the primary Msp surface-exposed epitope and proposed a model of the Msp protein as a β-barrel protein similar to Gram-negative bacterial porins. Here, we report fine-scale Msp mutagenesis demonstrating that both the N and C termini as well as the centrally located Msp surface epitope are required for native Msp oligomer expression. Removal of as few as three C-terminal amino acids abrogated Msp detection on the T. denticola cell surface, and deletion of four residues resulted in complete loss of detectable Msp. Substitution of a FLAG tag for either residues 6 to 13 of mature Msp or an 8-residue portion of the central Msp surface epitope resulted in expression of full-length Msp but absence of the oligomer, suggesting roles for both domains in oligomer formation. Consistent with previously reported Msp N-glycosylation, proteinase K treatment of intact cells released a 25 kDa polypeptide containing the Msp surface epitope into culture supernatants. Molecular modeling of Msp using novel metagenome-derived multiple sequence alignment (MSA) algorithms supports the hypothesis that Msp is a large-diameter, trimeric outer membrane porin-like protein whose potential transport substrate remains to be identified. IMPORTANCE The Treponema denticola gene encoding its major surface protein (Msp) is an ortholog of the T. pallidum tprA to -K gene family that includes tprK, whose remarkable in vivo hypervariability is proposed to contribute to T. pallidum immune evasion. Using a combined strategy of fine-scale mutagenesis and advanced predictive molecular modeling, we characterized the Msp protein and present a high-confidence model of its structure as an oligomer embedded in the outer membrane. This work adds to knowledge of Msp-like proteins in oral treponemes and may contribute to understanding the evolutionary and potential functional relationships between T. denticola Msp and the orthologous T. pallidum Tpr proteins.
Collapse
|
10
|
Zheng W, Wuyun Q, Zhou X, Li Y, Freddolino PL, Zhang Y. LOMETS3: integrating deep learning and profile alignment for advanced protein template recognition and function annotation. Nucleic Acids Res 2022; 50:W454-W464. [PMID: 35420129 PMCID: PMC9252734 DOI: 10.1093/nar/gkac248] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2022] [Revised: 03/29/2022] [Accepted: 03/31/2022] [Indexed: 11/25/2022] Open
Abstract
Deep learning techniques have significantly advanced the field of protein structure prediction. LOMETS3 (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is a new generation meta-server approach to template-based protein structure prediction and function annotation, which integrates newly developed deep learning threading methods. For the first time, we have extended LOMETS3 to handle multi-domain proteins and to construct full-length models with gradient-based optimizations. Starting from a FASTA-formatted sequence, LOMETS3 performs four steps of domain boundary prediction, domain-level template identification, full-length template/model assembly and structure-based function prediction. The output of LOMETS3 contains (i) top-ranked templates from LOMETS3 and its component threading programs, (ii) up to 5 full-length structure models constructed by L-BFGS (limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm) optimization, (iii) the 10 closest Protein Data Bank (PDB) structures to the target, (iv) structure-based functional predictions, (v) domain partition and assembly results, and (vi) the domain-level threading results, including items (i)–(iii) for each identified domain. LOMETS3 was tested in large-scale benchmarks and the blind CASP14 (14th Critical Assessment of Structure Prediction) experiment, where the overall template recognition and function prediction accuracy is significantly beyond its predecessors and other state-of-the-art threading approaches, especially for hard targets without homologous templates in the PDB. Based on the improved developments, LOMETS3 should help significantly advance the capability of broader biomedical community for template-based protein structure and function modelling.
Collapse
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
| | - Xiaogen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Peter L Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
11
|
Sun J, Wu B. Protein design with a machine-learned potential about backbone designability. Trends Biochem Sci 2022; 47:638-640. [DOI: 10.1016/j.tibs.2022.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2022] [Revised: 04/07/2022] [Accepted: 04/07/2022] [Indexed: 10/18/2022]
|
12
|
Hou Q, Pucci F, Pan F, Xue F, Rooman M, Feng Q. Using metagenomic data to boost protein structure prediction and discovery. Comput Struct Biotechnol J 2022; 20:434-442. [PMID: 35070166 PMCID: PMC8760478 DOI: 10.1016/j.csbj.2021.12.030] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Revised: 12/17/2021] [Accepted: 12/21/2021] [Indexed: 11/19/2022] Open
Abstract
Over the past decade, metagenomic sequencing approaches have been providing an ever-increasing amount of protein sequence data at an astonishing rate. These constitute an invaluable source of information which has been exploited in various research fields such as the study of the role of the gut microbiota in human diseases and aging. However, only a small fraction of all metagenomic sequences collected have been functionally or structurally characterized, leaving much of them completely unexplored. Here, we review how this information has been used in protein structure prediction and protein discovery. We begin by presenting some widely used metagenomic databases and analyze in detail how metagenomic data has contributed to the impressive improvement in the accuracy of structure prediction methods in recent years. We then examine how metagenomic information can be exploited to annotate protein sequences. More specifically, we focus on the role of metagenomes in the discovery of enzymes and new CRISPR-Cas systems, and in the identification of antibiotic resistance genes. With this review, we provide an overview of how metagenomic data is currently revolutionizing our understanding of protein science.
Collapse
Affiliation(s)
- Qingzhen Hou
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
| | - Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
| | - Fengming Pan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
| | - Fuzhong Xue
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250012, China
- National Institute of Health Data Science of China, Shandong University, Shandong 250002, China
| | - Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
| | - Qiang Feng
- Shandong Provincial Key Laboratory of Oral Tissue Regeneration & Shandong Engineering Laboratory for Dental Materials and Oral Tissue Regeneration, Department of Human Microbiome, School of Stomatology, Shandong University, Jinan, Shandong Province 250012, China
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, Shandong Province 266237, China
| |
Collapse
|