1
|
Nunes-Alves AK, Abrahão JS, de Farias ST. Yaravirus brasiliense genomic structure analysis and its possible influence on the metabolism. Genet Mol Biol 2025; 48:e20240139. [PMID: 39918235 PMCID: PMC11803573 DOI: 10.1590/1678-4685-gmb-2024-0139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Accepted: 12/11/2024] [Indexed: 02/11/2025] Open
Abstract
Here we analyze Yaravirus brasiliense, an amoeba-infecting 80-nm-sized virus with a 45-kbp dsDNA genome, using structural molecular modeling. Almost all of its 74 genes were previously identified as ORFans. Considering its unprecedented genetic content, we analyzed the Yaravirus genome to understand its genetic organization, its proteome, and how it interacts with its host. We report possible functions for all Yaravirus proteins. Our results suggest what would be the first report of a fragment proteome, in which the proteins are separated into modules and joined together at the protein level. Given the structural resemblance between some Yaravirus proteins and proteins related to the tricarboxylic acid (TCA) cycle, the glyoxylate cycle, and the respiratory complexes, our work also allows us to hypothesize that these viral proteins could be modulating cell metabolism by upregulation. The presence of these TCA cycle-related enzymes specifically could serve to overcome the cycle's control points, since they are strategic proteins that maintain malate and oxaloacetate levels. Therefore, we propose that Yaravirus proteins redirect energy and resources towards viral production and avoid TCA cycle control points, "unlocking" the cycle. Altogether, our data help us understand a previously almost completely unknown virus, and a little more of the incredible diversity of viruses.
Affiliation(s)
- Ana Karoline Nunes-Alves
- Universidade Federal da Paraíba, Departamento de Biologia Molecular,
Laboratório de Genética Evolutiva Paulo Leminski, João Pessoa, PB, Brazil
| | - Jônatas Santos Abrahão
- Universidade Federal de Minas Gerais, Instituto de Ciências
Biológicas, Departamento de Microbiologia, Laboratório de Vírus, Belo Horizonte, MG,
Brazil
| | - Sávio Torres de Farias
- Universidade Federal da Paraíba, Departamento de Biologia Molecular,
Laboratório de Genética Evolutiva Paulo Leminski, João Pessoa, PB, Brazil
- Network of Researchers on the Chemical Evolution of Life (NoRCEL),
Leeds, United Kingdom
| |
|
2
|
Bryant P, Noé F. Structure prediction of alternative protein conformations. Nat Commun 2024; 15:7328. [PMID: 39187507 PMCID: PMC11347660 DOI: 10.1038/s41467-024-51507-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 08/07/2024] [Indexed: 08/28/2024] Open
Abstract
Proteins are dynamic molecules whose movements result in different conformations with different functions. Neural networks such as AlphaFold2 can predict the structure of single-chain proteins with conformations most likely to exist in the PDB. However, almost all protein structures with multiple conformations represented in the PDB have been used while training these models. Therefore, it is unclear whether alternative protein conformations can be genuinely predicted using these networks, or if they are simply reproduced from memory. Here, we train a structure prediction network, Cfold, on a conformational split of the PDB to generate alternative conformations. Cfold enables efficient exploration of the conformational landscape of monomeric protein structures. Over 50% of experimentally known nonredundant alternative protein conformations evaluated here are predicted with high accuracy (TM-score > 0.8).
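The TM-score threshold quoted in this abstract can be made concrete with a short sketch. It assumes the standard TM-score definition, with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8, and takes per-residue distances from a superposition that has already been computed. A full TM-score calculation also searches over superpositions to maximize the sum, which is omitted here. The function name and the restriction to targets longer than 21 residues are assumptions made for illustration and are not taken from Cfold.

```python
def tm_score_fixed_superposition(distances, target_length):
    """TM-score sum for one fixed superposition (no superposition search).

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the reference (target) structure.
    """
    # Assumption of this sketch: very short chains need special handling
    # of d0 and are simply rejected here.
    if target_length <= 21:
        raise ValueError("this sketch only handles target_length > 21")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")

    # Length-dependent distance scale from the standard TM-score definition.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8

    # Each aligned pair contributes a value in (0, 1]; unaligned target
    # residues contribute nothing, so the sum is normalized by target length.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

With this normalization, a model that places every target residue exactly on the reference scores 1.0, and the abstract's "high accuracy" criterion corresponds to a value above 0.8.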
Affiliation(s)
- Patrick Bryant
- Department of Mathematics and Informatics, Freie Universität Berlin, Arnimallee 12, 14195, Berlin, Germany.
- The Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Svante Arrhenius väg 20C, 114 18, Stockholm, Sweden.
- Science for Life Laboratory, 172 21, Solna, Sweden.
| | - Frank Noé
- Department of Mathematics and Informatics, Freie Universität Berlin, Arnimallee 12, 14195, Berlin, Germany
- Microsoft Research AI4Science, Karl-Liebknecht Str. 32, 10178, Berlin, Germany
| |
|
3
|
Zhu D, Brookes DH, Busia A, Carneiro A, Fannjiang C, Popova G, Shin D, Donohue KC, Lin LF, Miller ZM, Williams ER, Chang EF, Nowakowski TJ, Listgarten J, Schaffer DV. Optimal trade-off control in machine learning-based library design, with application to adeno-associated virus (AAV) for gene therapy. Sci Adv 2024; 10:eadj3786. [PMID: 38266077 PMCID: PMC10807795 DOI: 10.1126/sciadv.adj3786] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 06/23/2023] [Accepted: 12/22/2023] [Indexed: 01/26/2024]
Abstract
Adeno-associated viruses (AAVs) hold tremendous promise as delivery vectors for gene therapies. AAVs have been successfully engineered (for instance, for more efficient and/or cell-specific delivery to numerous tissues) by creating large, diverse starting libraries and selecting for desired properties. However, these starting libraries often contain a high proportion of variants unable to assemble or package their genomes, a prerequisite for any gene delivery goal. Here, we present and showcase a machine learning (ML) method for designing AAV peptide insertion libraries that achieve fivefold higher packaging fitness than the standard NNK library with negligible reduction in diversity. To demonstrate our ML-designed library's utility for downstream engineering goals, we show that it yields approximately 10-fold more successful variants than the NNK library after selection for infection of human brain tissue, leading to a promising glial-specific variant. Moreover, our design approach can be applied to other types of libraries for AAV and beyond.
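The NNK baseline named in this abstract can be illustrated with a small enumeration. The sketch below assumes the usual IUPAC meaning of the degenerate codon (N is any of A, C, G, T; K is G or T) and the standard genetic code. It only describes the composition of a single NNK codon and does not reproduce the authors' ML design method; all names in it are illustrative.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """All codons matching NNK: any base, any base, then G or T."""
    return [a + b + c for a, b in product("ACGT", repeat=2) for c in "GT"]


def summarize_nnk():
    """Return (codon count, sorted encoded amino acids, stop codons) for NNK."""
    codons = nnk_codons()
    encoded = {CODON_TABLE[c] for c in codons}
    amino_acids = sorted(encoded - {"*"})
    stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]
    return len(codons), amino_acids, stop_codons
```

Restricting the third position to G or T keeps all 20 amino acids reachable while leaving TAG as the only stop codon, which is why NNK is a common default for randomized insertions.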
Affiliation(s)
- Danqing Zhu
- California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720, USA
| | - David H. Brookes
- Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Akosua Busia
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Ana Carneiro
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA 94720, USA
| | | | - Galina Popova
- Department of Anatomy, University of California San Francisco, San Francisco, CA 94143, USA
- Department of Psychiatry and Behavioural Sciences, University of California San Francisco, San Francisco, CA 94143, USA
- Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California San Francisco, San Francisco, CA 94143, USA
| | - David Shin
- Department of Anatomy, University of California San Francisco, San Francisco, CA 94143, USA
- Department of Psychiatry and Behavioural Sciences, University of California San Francisco, San Francisco, CA 94143, USA
- Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California San Francisco, San Francisco, CA 94143, USA
| | - Kevin C. Donohue
- Department of Psychiatry and Behavioural Sciences, University of California San Francisco, San Francisco, CA 94143, USA
- School of Medicine, University of California San Francisco, San Francisco, CA 94143, USA
- Kavli Institute of Fundamental Neuroscience, University of California San Francisco, San Francisco, CA 94143, USA
- Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA 94143, USA
| | - Li F. Lin
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Zachary M. Miller
- Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Evan R. Williams
- Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Edward F. Chang
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, USA
| | - Tomasz J. Nowakowski
- Department of Anatomy, University of California San Francisco, San Francisco, CA 94143, USA
- Department of Psychiatry and Behavioural Sciences, University of California San Francisco, San Francisco, CA 94143, USA
- Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California San Francisco, San Francisco, CA 94143, USA
- Weill Institute for Neurosciences, University of California San Francisco, San Francisco, CA 94143, USA
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, USA
| | - Jennifer Listgarten
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA
- Center for Computational Biology, University of California, Berkeley, Berkeley, CA 94720, USA
| | - David V. Schaffer
- California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720, USA
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA 94720, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94720, USA
- Innovative Genomics Institute (IGI), University of California, Berkeley, Berkeley, CA 94720, USA
| |
|
4
|
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/25/2023]
Abstract
The ability to predict the three-dimensional structure of a protein from its amino acid sequence has improved considerably in the recent past. The progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
| | | | - Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | - Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
| | | |
|
5
|
Smith DJ. From Genome Mining to Protein Engineering: A Structural Bioinformatics Route. Methods Mol Biol 2023; 2553:79-94. [PMID: 36227540 DOI: 10.1007/978-1-0716-2617-7_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
This chapter outlines applications in genome mining, along with computational methods to predict protein structure and protein-ligand docking. It offers a simple computational route to rapidly identify proteins of interest from genomic and proteomic data, to accurately predict their three-dimensional structures, and to dock small molecules to their binding pockets, together with strategies to improve their biophysical properties depending on the needs of the experimental researcher.
Affiliation(s)
- Derek J Smith
- Singapore Institute for Food and Biotechnology Innovation (SIFBI), Singapore, Singapore.
| |
|
6
|
Kaushik R, Zhang KYJ. ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures. Bioinformatics 2022; 38:369-376. [PMID: 34542606 DOI: 10.1093/bioinformatics/btab666] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Revised: 09/06/2021] [Accepted: 09/16/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION An accurate estimation of the quality of protein model structures is a cornerstone of protein structure prediction. Despite the recent groundbreaking success in the field of protein structure prediction, there is still room for improvement in model quality estimation at multiple stages of protein structure prediction, which could further push the prediction accuracy. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages the backbone dihedral angle and surface accessibility preferences of each residue by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins. RESULTS The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further used for benchmarking the proposed method against four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman's and Pearson's correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference of predictions over corresponding observed values.
The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design. AVAILABILITY AND IMPLEMENTATION http://github.com/KYZ-LSB/ProTerS-FitFun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rahul Kaushik
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
| | - Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
| |
|
7
|
Ali NF, Paracha RZ, Tahir M. In silico evaluation of molecular virus-virus interactions taking place between Cotton leaf curl Kokhran virus-Burewala strain and Tomato leaf curl New Delhi virus. PeerJ 2021; 9:e12018. [PMID: 34721952 PMCID: PMC8532979 DOI: 10.7717/peerj.12018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2021] [Accepted: 07/29/2021] [Indexed: 11/20/2022] Open
Abstract
Background Cotton leaf curl disease (CLCuD) is a disease of cotton caused by begomoviruses, leading to a drastic loss in the annual yield of the crop. Pakistan has suffered two epidemics of this disease, leading to the loss of billions in annual exports. The speculation that a third epidemic of CLCuD may result from the frequent occurrence of Tomato leaf curl New Delhi virus (ToLCNDV) and Cotton leaf curl Kokhran Virus-Burewala Strain (CLCuKoV-Bu) in CLCuD-infected samples demands that the interactions taking place between the two viruses be properly evaluated. This study is designed to assess virus-virus interactions at the molecular level and determine the type of co-infection taking place. Methods Based on the amino acid sequences of the gene products of both CLCuKoV-Bu and ToLCNDV, protein structures were generated using different software, i.e., MODELLER, I-TASSER, QUARK, LOMETS and RAPTORX. A consensus model for each protein was selected after model quality assessment using ERRAT, QMEANDisCo, PROCHECK Z-Score and Ramachandran plot analysis. The active and passive residues in the protein structures were identified using the CPORT server. Protein–Protein Docking was done using the HADDOCK webserver, covering 169 Protein–Protein Interactions (PPIs) between the proteins of the two viruses. The docked complexes were submitted to the PRODIGY server to identify the interacting residues between the complexes. The strongest interactions were determined based on the HADDOCK Score, Desolvation Energy, Van der Waals Energy, Restraint Violation Energy, Electrostatic Energy, Buried Surface Area, Binding Affinity and Dissociation constant (Kd). A total of 50 ns of Molecular Dynamics simulation was performed on complexes that exhibited the strongest affinity in order to validate the stability of the complexes and to remove any steric hindrances that may exist within the structures.
Results Our results indicate significant interactions taking place between the proteins of the two viruses. Out of all the interactions, the strongest were observed between the Replication Initiation protein (Rep) of CLCuKoV-Bu and the Movement protein (MP) and Nuclear Shuttle Protein (NSP) of ToLCNDV (DNA-B), while the weakest were seen between the Replication Enhancer protein (REn) of CLCuKoV-Bu and the REn protein of ToLCNDV. The residues identified as taking part in the interactions belonged to domains having a pivotal role in the viral life cycle and pathogenicity. It may be deduced that the two viruses exhibit antagonistic behavior towards each other, and the type of infection may be categorised as a type of Super Infection Exclusion (SIE) or homologous interference. However, further experimentation, in the form of transient expression analysis, is needed to confirm the nature of these interactions and increase our understanding of the direct interactions taking place between two viruses.
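The Methods above rank complexes by binding affinity and dissociation constant (Kd). These two quantities are linked by the textbook thermodynamic relation ΔG = RT ln Kd, which the following sketch applies. The function names, the default temperature of 298.15 K, and the kcal/mol unit convention are assumptions chosen for illustration; the abstract does not state which settings were used.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
GAS_CONSTANT_KCAL = 1.987204e-3


def kd_from_delta_g(delta_g_kcal_per_mol, temperature_k=298.15):
    """Dissociation constant (molar) from a binding free energy, via dG = RT ln Kd."""
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


def delta_g_from_kd(kd_molar, temperature_k=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant (molar)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)
```

Because the relation is exponential, a more negative binding free energy maps to a much smaller Kd, so ranking complexes by either quantity gives the same order at a fixed temperature.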
Affiliation(s)
- Nida Fatima Ali
- Department of Plant Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology, Islamabad, Federal, Pakistan
| | - Rehan Zafar Paracha
- Research Center for Modeling and Simulation (RCMS), National University of Sciences and Technology, Islamabad, Federal, Pakistan
| | - Muhammad Tahir
- Department of Plant Biotechnology, Atta-ur-Rahman School of Applied Biosciences (ASAB), National University of Sciences and Technology, Islamabad, Federal, Pakistan
| |
|
8
|
Arsiccio A, Beavis J, Raut S, Coxon CH. FVIII inhibitors display FV-neutralizing activity in the prothrombin time assay. J Thromb Haemost 2021; 19:1907-1913. [PMID: 33914406 PMCID: PMC8360109 DOI: 10.1111/jth.15355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2020] [Accepted: 04/16/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND The coagulation factors (F)V and VIII are homologous proteins that support hemostasis through their regulation of FX activity. Hemophilia A (HA) patients have reduced FVIII activity and a prolonged bleeding time that is corrected through the administration of exogenous FVIII. Around one-third of severe HA patients develop FVIII neutralizing antibodies, known as "inhibitors," which neutralize FVIII activity and preclude them from further FVIII therapy. OBJECTIVES We hypothesized that, based on the degree of homology between FV and FVIII (~40%), FVIII-neutralizing antibodies could cross react with FV. To test this hypothesis, a panel of recombinant, patient-derived, FVIII-neutralizing antibodies were screened for cross-reactivity against FV. METHODS Factor V and FVIII activity was measured using one-stage clotting assays; structural analysis was carried out using a structural approach. RESULTS We detected FV neutralizing activity with the anti-FVIII A2 domain antibody NB11B2. Because this antibody was derived from an HA inhibitor patient, FV-neutralizing activity was then evaluated in a number of HA inhibitor patient plasma samples; nine alloimmune samples had FV-neutralizing activity whereas no FV neutralizing activity was found in the two autoimmune samples available. We next examined the degree of surface homology between FV and FVIII and found that structural similarity could explain the cross reactivity of the anti-A2 antibody and likely accounts for the cross reactivity we observed in patient samples. CONCLUSIONS Although this novel observation is of interest, further work will be needed to determine whether FV neutralization in HA patient samples contributes to their bleeding diathesis.
Affiliation(s)
- Andrea Arsiccio
- Department of Applied Science and Technology, Politecnico di Torino, Torino, Italy
| | - James Beavis
- Oxford Haemophilia Centre, Churchill Hospital, Oxford, UK
| | - Sanj Raut
- National Institute for Biological Standards and Control, Potters Bar, UK
| | - Carmen H. Coxon
- National Institute for Biological Standards and Control, Potters Bar, UK
| |
|
9
|
Bottino GF, Ferrari AJR, Gozzo FC, Martínez L. Structural discrimination analysis for constraint selection in protein modeling. Bioinformatics 2021; 37:3766-3773. [PMID: 34086840 DOI: 10.1093/bioinformatics/btab425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/03/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Protein structure modeling can be improved by the use of distance constraints between amino acid residues, provided such data reflects (at least partially) the native tertiary structure of the target system. In fact, only a small subset of the native contact map is necessary to successfully drive the model conformational search, so one important goal is to obtain the set of constraints with the highest true-positive rate, lowest redundancy, and greatest amount of information. In this work, we introduce a constraint evaluation and selection method based on the point-biserial correlation coefficient, which utilizes structural information from an ensemble of models to indirectly measure the power of each constraint in biasing the conformational search towards consensus structures. RESULTS Residue contact maps obtained by direct coupling analysis are systematically improved by means of discriminant analysis, reaching in some cases accuracies often seen only in modern deep-learning based approaches. When combined with an iterative modeling workflow, the proposed constraint classification optimizes the selection of the constraint set and maximizes the probability of obtaining successful models. The use of discriminant analysis for the valorization of the information of constraint data sets is a general concept with possible applications to other constraint types and modeling problems. AVAILABILITY AND IMPLEMENTATION Scripts and procedures to implement the methodology presented herein are available at https://github.com/m3g/2021_Bottino_Biserial. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
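The point-biserial correlation coefficient named in this abstract is the Pearson correlation between a binary variable and a continuous one. The sketch below computes it from its usual closed form. How the authors pair the two variables is not stated in the abstract, so the pairing suggested in the comments (whether each model satisfies a constraint, against a per-model structural score) is a hypothetical illustration and not their published procedure.

```python
import math


def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    binary: e.g. 1 if a model satisfies a given constraint, else 0 (hypothetical use).
    values: e.g. a per-model structural score (hypothetical use).
    """
    if len(binary) != len(values):
        raise ValueError("inputs must have the same length")
    n = len(values)
    group1 = [v for b, v in zip(binary, values) if b]
    group0 = [v for b, v in zip(binary, values) if not b]
    if not group1 or not group0:
        raise ValueError("both groups must be non-empty")

    # Population standard deviation of the continuous variable.
    mean_all = sum(values) / n
    std_pop = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if std_pop == 0:
        raise ValueError("continuous variable has zero variance")

    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    # Difference of group means, scaled by spread and group balance.
    return (mean1 - mean0) / std_pop * math.sqrt(len(group1) * len(group0) / n ** 2)
```

A value near +1 or -1 means the binary split separates the continuous scores well, while a value near 0 means the split carries little information about them.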
Affiliation(s)
- Guilherme F Bottino
- Institute of Chemistry, University of Campinas, Campinas, SP, Brazil
- Center for Computational Engineering & Science, University of Campinas, Campinas, SP, Brazil
| | - Allan J R Ferrari
- Institute of Chemistry, University of Campinas, Campinas, SP, Brazil
- Center for Computational Engineering & Science, University of Campinas, Campinas, SP, Brazil
| | - Fabio C Gozzo
- Institute of Chemistry, University of Campinas, Campinas, SP, Brazil
| | - Leandro Martínez
- Institute of Chemistry, University of Campinas, Campinas, SP, Brazil
- Center for Computational Engineering & Science, University of Campinas, Campinas, SP, Brazil
| |
|
10
|
Bhattacharya S, Roche R, Shuvo MH, Bhattacharya D. Recent Advances in Protein Homology Detection Propelled by Inter-Residue Interaction Map Threading. Front Mol Biosci 2021; 8:643752. [PMID: 34046429 PMCID: PMC8148041 DOI: 10.3389/fmolb.2021.643752] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/18/2020] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Sequence-based protein homology detection has emerged as one of the most sensitive and accurate approaches to protein structure prediction. Despite this success, homology detection remains very challenging for weakly homologous proteins with divergent evolutionary profiles. Very recently, deep neural network architectures have shown promising progress in mining the coevolutionary signal encoded in multiple sequence alignments, leading to reasonably accurate estimation of inter-residue interaction maps, which serve as a rich source of additional information for improved homology detection. Here, we summarize the latest developments in protein homology detection driven by inter-residue interaction map threading. We highlight the emerging trends in distant-homology protein threading through the alignment of predicted interaction maps at various granularities, ranging from binary contact maps to finer-grained distance and orientation maps as well as their combination. We also discuss some of the current limitations and possible future avenues to further enhance the sensitivity of protein homology detection.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Rahmatullah Roche
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Md Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States
- Department of Biological Sciences, Auburn University, Auburn, AL, United States
| |
|
11
|
Alvarado D, Cardoso-Arenas S, Corrales-García LL, Clement H, Arenas I, Montero-Dominguez PA, Olamendi-Portugal T, Zamudio F, Csoti A, Borrego J, Panyi G, Papp F, Corzo G. A Novel Insecticidal Spider Peptide that Affects the Mammalian Voltage-Gated Ion Channel hKv1.5. Front Pharmacol 2021; 11:563858. [PMID: 33597864 PMCID: PMC7883638 DOI: 10.3389/fphar.2020.563858] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/19/2020] [Accepted: 10/26/2020] [Indexed: 11/20/2022] Open
Abstract
Spider venoms include various peptide toxins that modify the ion currents, mainly of excitable insect cells. Consequently, scientific research on spider venoms has revealed a broad range of peptide toxins with different pharmacological properties, even for mammal species. In this work, thirty animal venoms were screened against hKv1.5, a potential target for atrial fibrillation therapy. The whole venom of the spider Oculicosa supermirabilis, which is also insecticidal to house crickets, caused voltage-gated potassium ion channel modulation in hKv1.5. Therefore, a peptide from the spider O. supermirabilis venom, named Osu1, was identified through HPLC reverse-phase fractionation. Osu1 displayed biological properties similar to those of the whole venom, so the primary sequence of Osu1 was elucidated by both N-terminal degradation and endoproteolytic cleavage. Based on its primary structure, a gene that codes for Osu1 was constructed de novo from protein to DNA by reverse translation. A recombinant Osu1 was expressed using a pQE30 vector inside the E. coli SHuffle expression system. Recombinant Osu1 also modulated human hKv1.5 voltage-gated potassium ion channels, and it was as insecticidal as the native toxin. Due to its novel primary structure and hypothesized disulfide pairing motif, Osu1 may represent a new family of spider toxins.
Affiliation(s)
- Diana Alvarado
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| | - Samuel Cardoso-Arenas
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| | - Ligia-Luz Corrales-García
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
- Departamento de Alimentos, Facultad de Ciencias Farmacéuticas y Alimentarias, Universidad de Antioquia, Medellín, Colombia
| | - Herlinda Clement
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| | - Iván Arenas
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| | - Pavel Andrei Montero-Dominguez
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| | - Timoteo Olamendi-Portugal
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| | - Fernando Zamudio
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| | - Agota Csoti
- Department of Biophysics and Cell Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
| | - Jesús Borrego
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| | - Gyorgy Panyi
- Department of Biophysics and Cell Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
| | - Ferenc Papp
- Department of Biophysics and Cell Biology, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
| | - Gerardo Corzo
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, México
| |
|
12
|
Runthala A, Chowdhury S. Refined template selection and combination algorithm significantly improves template-based modeling accuracy. J Bioinform Comput Biol 2020; 17:1950006. [PMID: 31057073 DOI: 10.1142/s0219720019500069] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/16/2022]
Abstract
In contrast to ab initio protein modeling methodologies, comparative modeling is considered the most popular and reliable approach to modeling protein structure. However, selecting the best set of templates is still a major challenge. An effective template-ranking algorithm is developed to efficiently select only the reliable hits for predicting protein structures. The algorithm employs pairwise as well as multiple sequence alignments of template hits to rank and select the best possible set of templates. It captures several key sequence and structural features of the template hits and converts them into scores to rank them effectively. This selected set of templates is used to model a target. The modeling accuracy of the algorithm is tested and evaluated on TBM-HA domain-containing CASP8, CASP9 and CASP10 targets. On average, this template ranking and selection algorithm improves GDT-TS, GDT-HA and TM_Score by 3.531, 4.814 and 0.022, respectively. Further, it is shown that including structurally similar templates with ample conformational diversity is crucial for the modeling algorithm to span the target sequence maximally and reliably and to construct its near-native model. Optimal model sampling also holds the key to predicting the best possible target structure.
Affiliation(s)
- Ashish Runthala
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
- Shibasish Chowdhury
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
|
13
|
Bhattacharya S, Bhattacharya D. Evaluating the significance of contact maps in low-homology protein modeling using contact-assisted threading. Sci Rep 2020; 10:2908. [PMID: 32076047 PMCID: PMC7031282 DOI: 10.1038/s41598-020-59834-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/08/2019] [Accepted: 02/04/2020] [Indexed: 12/02/2022] Open
Abstract
The development of improved threading algorithms for remote homology modeling is a critical step forward in template-based protein structure prediction. We have recently demonstrated the utility of contact information to boost protein threading by developing a new contact-assisted threading method. However, the nature and extent to which the quality of a predicted contact map impacts the performance of contact-assisted threading remain elusive. Here, we systematically analyze and explore this interdependence by employing our newly developed contact-assisted threading method over a large-scale benchmark dataset using predicted contact maps from four complementary methods: direct coupling analysis (mfDCA), sparse inverse covariance estimation (PSICOV), a classical neural network-based meta approach (MetaPSICOV), and a state-of-the-art ultra-deep learning model (RaptorX). Experimental results demonstrate that contact-assisted threading using high-quality contacts with a Matthews Correlation Coefficient (MCC) ≥ 0.5 improves threading performance in nearly 30% of cases, while low-quality contacts with MCC < 0.35 degrade performance in 50% of cases. This holds true even in the CASP13 dataset, where threading using high-quality contacts (MCC ≥ 0.5) significantly improves performance in 22 of 29 instances. Collectively, our study uncovers the association between the quality of predicted contacts and their possible utility in boosting threading performance for low-homology protein modeling.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA
- Department of Biological Sciences, Auburn University, Auburn, AL, 36849, USA
|
14
|
Sarkar B, Ullah MA, Araf Y. A systematic and reverse vaccinology approach to design novel subunit vaccines against Dengue virus type-1 (DENV-1) and human Papillomavirus-16 (HPV-16). Informatics in Medicine Unlocked 2020. [DOI: 10.1016/j.imu.2020.100343] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 02/08/2023] Open
|
15
|
Abstract
Motivation: Template-based modeling, including homology modeling and protein threading, is a popular method for protein 3D structure prediction. However, alignment generation and template selection for protein sequences without close templates remain very challenging. Results: We present a new method called DeepThreader to improve protein threading, including both alignment generation and template selection, by making use of deep learning (DL) and residue co-variation information. Our method first employs DL to predict inter-residue distance distribution from residue co-variation and sequential information (e.g. sequence profile and predicted secondary structure), and then builds sequence-template alignment by integrating predicted distance information and sequential features through an ADMM algorithm. Experimental results suggest that predicted inter-residue distance is helpful to both protein alignment and template selection, especially for protein sequences without very close templates, and that our method outperforms the currently popular homology modeling method HHpred and threading method CNFpred by a large margin and greatly outperforms the latest contact-assisted protein threading method EigenTHREADER. Availability and implementation: http://raptorx.uchicago.edu/ Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jianwei Zhu
- Toyota Technological Institute, Chicago, IL, USA; Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Sheng Wang
- Toyota Technological Institute, Chicago, IL, USA
- Dongbo Bu
- Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jinbo Xu
- Toyota Technological Institute, Chicago, IL, USA
|
16
|
Holt MC, Ho CS, Morano MI, Barrett SD, Stein AJ. Improved homology modeling of the human & rat EP 4 prostanoid receptors. BMC Mol Cell Biol 2019; 20:37. [PMID: 31455205 PMCID: PMC6712885 DOI: 10.1186/s12860-019-0212-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2019] [Accepted: 07/11/2019] [Indexed: 12/02/2022] Open
Abstract
Background: The EP4 prostanoid receptor is one of four GPCRs that mediate the diverse actions of prostaglandin E2 (PGE2). Novel selective EP4 receptor agonists would help to further elucidate receptor sub-type function and promote development of therapeutics for bone healing, heart failure, and other receptor-associated conditions. The rat EP4 (rEP4) receptor has been used as a surrogate for the human EP4 (hEP4) receptor in multiple SAR studies. To better understand the validity of this traditional approach, homology models were generated by threading for both receptors using the RaptorX server. These models were fit to an implicit membrane using the PPM server and OPM database, with refinement of intra- and extracellular loops by Prime (Schrödinger). To understand the interaction between the receptors and known agonists, induced-fit docking experiments were performed using Glide and Prime (Schrödinger), with both endogenous agonists and receptor sub-type selective, small-molecule agonists. The docking scores and observed interactions were compared with radioligand displacement experiments and receptor (rat and human) activation assays monitoring cAMP. Results: Rank-ordering of in silico compound docking scores aligned well with in vitro activity assay EC50 and radioligand binding Ki. We observed variations between rat and human EP4 binding pockets that have implications for future small-molecule receptor-modulator design and SAR, specifically a S103G mutation within the rEP4 receptor. Additionally, these models helped identify key interactions between the EP4 receptor and ligands including PGE2 and several known sub-type selective agonists, while serving as a marked improvement over the previously reported models. Conclusions: This work has generated a set of novel homology models of the rEP4 and hEP4 receptors. The homology models provide an improvement upon the previously reported model, largely due to improved solvation. The hEP4 docking scores correlate best with the cAMP activation data, where both data sets rank-order Rivenprost > CAY10684 > PGE1 ≈ PGE2 > 11-deoxy-PGE1 ≈ 11-deoxy-PGE2 > 8-aza-11-deoxy-PGE1. This rank-ordering matches closely with the rEP4 receptor as well. Species-specific differences were noted for the weak agonists Sulprostone and Misoprostol, which appear to dock more readily within the human receptor than the rat receptor. Electronic supplementary material: The online version of this article (10.1186/s12860-019-0212-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Melissa C Holt
- Cayman Chemical Co, 1180 E. Ellsworth Rd, Ann Arbor, MI, 48108, USA
- Chi S Ho
- Cayman Chemical Co, 1180 E. Ellsworth Rd, Ann Arbor, MI, 48108, USA
- M Inés Morano
- Cayman Chemical Co, 1180 E. Ellsworth Rd, Ann Arbor, MI, 48108, USA
- Adam J Stein
- Cayman Chemical Co, 1180 E. Ellsworth Rd, Ann Arbor, MI, 48108, USA
|
17
|
Bhattacharya S, Bhattacharya D. Does inclusion of residue-residue contact information boost protein threading? Proteins 2019; 87:596-606. [PMID: 30882932 DOI: 10.1002/prot.25684] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/22/2018] [Revised: 02/20/2019] [Accepted: 03/13/2019] [Indexed: 12/26/2022]
Abstract
Template-based modeling is considered as one of the most successful approaches for protein structure prediction. However, reliably and accurately selecting optimal template proteins from a library of known protein structures having similar folds as the target protein and making correct alignments between the target sequence and the template structures, a template-based modeling technique known as threading, remains challenging, particularly for non- or distantly-homologous protein targets. With the recent advancement in protein residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of inclusion of residue-residue contact information in improving the accuracy and reliability of protein threading. We develop a new threading algorithm by incorporating various sequential and structural features, and subsequently integrate residue-residue contact information as an additional scoring term for threading template selection. We show that the inclusion of contact information attains statistically significantly better threading performance compared to a baseline threading algorithm that does not utilize contact information when everything else remains the same. Experimental results demonstrate that our contact based threading approach outperforms popular threading method MUSTER, contact-assisted ab initio folding method CONFOLD2, and recent state-of-the-art contact-assisted protein threading methods EigenTHREADER and map_align on several benchmarks. Our study illustrates that the inclusion of contact maps is a promising avenue in protein threading to ultimately help to improve the accuracy of protein structure prediction.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
|
18
|
Palomo-Ligas L, Gutiérrez-Gutiérrez F, Ochoa-Maganda VY, Cortés-Zárate R, Charles-Niño CL, Castillo-Romero A. Identification of a novel potassium channel (GiK) as a potential drug target in Giardia lamblia: Computational descriptions of binding sites. PeerJ 2019; 7:e6430. [PMID: 30834181 PMCID: PMC6397635 DOI: 10.7717/peerj.6430] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/13/2018] [Accepted: 01/10/2019] [Indexed: 12/12/2022] Open
Abstract
Background: The protozoan Giardia lamblia is the causal agent of giardiasis, one of the main diarrheal infections worldwide. Drug resistance to common antigiardial agents and incidence of treatment failures have increased in recent years. Therefore, the search for new molecular targets for drugs against Giardia infection is essential. In protozoa, ionic channels have roles in their life cycle, growth, and stress response. Thus, they are promising targets for drug design. The strategy of ligand-protein docking has demonstrated a great potential in the discovery of new targets and structure-based drug design studies. Methods: In this work, we identify and characterize a new potassium channel, GiK, in the genome of Giardia lamblia. Characterization was performed in silico. Because its crystallographic structure remains unresolved, homology modeling was used to construct the three-dimensional model for the pore domain of GiK. The docking virtual screening approach was employed to determine whether GiK is a good target for potassium channel blockers. Results: The GiK sequence showed 24–50% identity and 50–90% positivity with 21 different types of potassium channels. The quality assessment and validation parameters indicated the reliability of the modeled structure of GiK. We identified 110 potassium channel blockers exhibiting high affinity toward GiK. A total of 39 of these drugs bind in three specific regions. Discussion: The GiK pore signature sequence is related to the small conductance calcium-activated potassium channels (SKCa). The predicted binding of 110 potassium blockers to GiK makes this protein an attractive target for biological testing to evaluate its role in the life cycle of Giardia lamblia and a potential candidate for the design of novel antigiardial drugs.
Affiliation(s)
- Lissethe Palomo-Ligas
- Departamento de Fisiología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Filiberto Gutiérrez-Gutiérrez
- Departamento de Química, Centro Universitario de Ciencias Exactas e Ingenierías, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Verónica Yadira Ochoa-Maganda
- Departamento de Fisiología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Rafael Cortés-Zárate
- Departamento de Microbiología y Patología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Claudia Lisette Charles-Niño
- Departamento de Microbiología y Patología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
- Araceli Castillo-Romero
- Departamento de Microbiología y Patología, Centro Universitario de Ciencias de la Salud, Universidad de Guadalajara, Guadalajara, Jalisco, Mexico
|
19
|
Pandey RK, Prajapati VK. Exploring sand fly salivary proteins to design multiepitope subunit vaccine to fight against visceral leishmaniasis. J Cell Biochem 2019; 120:1141-1155. [PMID: 29377223 DOI: 10.1002/jcb.26719] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 10/31/2017] [Accepted: 01/24/2018] [Indexed: 01/24/2023]
Abstract
Visceral leishmaniasis (VL), caused by parasites of the Leishmania donovani complex, leads to the death of 20 000 to 40 000 people across 56 affected countries worldwide. To date, there is not a single available vaccine candidate to prevent VL infection, and treatment relies only upon expensive and toxic chemotherapeutic options. Consequently, an immunoinformatics approach was applied to design a multiepitope-based subunit vaccine to enhance humoral as well as cell-mediated immunity. The constructed vaccine candidate was further evaluated for allergenicity, antigenicity, and physicochemical parameters. Disulfide engineering was then performed to increase the stability of the vaccine construct. Molecular docking and molecular dynamics simulation studies were also performed to check the binding affinity and stability of the complex between toll-like receptor-4 and the vaccine construct. Finally, codon optimization and in silico cloning were performed to ensure the expression of the proposed vaccine construct in a microbial expression system.
Affiliation(s)
- Rajan Kumar Pandey
- Department of Biochemistry, School of Life Sciences, Central University of Rajasthan, Ajmer, India
- Vijay Kumar Prajapati
- Department of Biochemistry, School of Life Sciences, Central University of Rajasthan, Ajmer, India
|
20
|
Morales-Cordovilla JA, Sanchez V, Ratajczak M. Protein alignment based on higher order conditional random fields for template-based modeling. PLoS One 2018; 13:e0197912. [PMID: 29856860 PMCID: PMC5983487 DOI: 10.1371/journal.pone.0197912] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/20/2017] [Accepted: 05/10/2018] [Indexed: 11/19/2022] Open
Abstract
The query-template alignment of proteins is one of the most critical steps of template-based modeling methods used to predict the 3D structure of a query protein. This alignment can be interpreted as a temporal classification or structured prediction task and first order Conditional Random Fields have been proposed for protein alignment and proven to be rather successful. Some other popular structured prediction problems, such as speech or image classification, have gained from the use of higher order Conditional Random Fields due to the well known higher order correlations that exist between their labels and features. In this paper, we propose and describe the use of higher order Conditional Random Fields for query-template protein alignment. The experiments carried out on different public datasets validate our proposal, especially on distantly-related protein pairs which are the most difficult to align.
Affiliation(s)
- Victoria Sanchez
- Dept. of Teoría de la Señal Telemática y Comunicaciones, Universidad de Granada, Granada, Spain
- Martin Ratajczak
- Graz University of Technology, Signal Processing and Speech Communication Laboratory, Graz, Austria
|
21
|
Aiewsakun P, Simmonds P. The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification. Microbiome 2018; 6:38. [PMID: 29458427 PMCID: PMC5819261 DOI: 10.1186/s40168-018-0422-7] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 12/07/2017] [Accepted: 02/07/2018] [Indexed: 05/14/2023]
Abstract
Background: The International Committee on Taxonomy of Viruses (ICTV) classifies viruses into families, genera and species and provides a regulated system for their nomenclature that is universally used in virus descriptions. Virus taxonomic assignments have traditionally been based upon virus phenotypic properties such as host range, virion morphology and replication mechanisms, particularly at family level. However, gene sequence comparisons provide a clearer guide to their evolutionary relationships and provide the only information that may guide the incorporation of viruses detected in environmental (metagenomic) studies that lack any phenotypic data. Results: The current study sought to determine whether the existing virus taxonomy could be reproduced by examination of genetic relationships through the extraction of protein-coding gene signatures and genome organisational features. We found large-scale consistency between genetic relationships and taxonomic assignments for viruses of all genome configurations and genome sizes. The analysis pipeline that we have called 'Genome Relationships Applied to Virus Taxonomy' (GRAViTy) was highly effective at reproducing the current assignments of viruses at family level as well as inter-family groupings into orders. Its ability to correctly differentiate assigned viruses from unassigned viruses, and classify them into the correct taxonomic group, was evaluated by threefold cross-validation technique. This predicted family membership of eukaryotic viruses with close to 100% accuracy and specificity potentially enabling the algorithm to predict assignments for the vast corpus of metagenomic sequences consistently with ICTV taxonomy rules. In an evaluation run of GRAViTy, over one half (460/921) of (near)-complete genome sequences from several large published metagenomic eukaryotic virus datasets were assigned to 127 novel family-level groupings. If corroborated by other analysis methods, these would potentially more than double the number of eukaryotic virus families in the ICTV taxonomy. Conclusions: A rapid and objective means to explore metagenomic viral diversity and make informed recommendations for their assignments at each taxonomic layer is essential. GRAViTy provides one means to make rule-based assignments at family and order levels in a manner that preserves the integrity and underlying organisational principles of the current ICTV taxonomy framework. Such methods are increasingly required as the vast virosphere is explored.
Affiliation(s)
- Pakorn Aiewsakun
- Nuffield Department of Medicine, University of Oxford, Peter Medawar Building, South Parks Road, Oxford, OX1 3SY UK
- Peter Simmonds
- Nuffield Department of Medicine, University of Oxford, Peter Medawar Building, South Parks Road, Oxford, OX1 3SY UK
|
22
|
Hafsa NE, Berjanskii MV, Arndt D, Wishart DS. Rapid and reliable protein structure determination via chemical shift threading. Journal of Biomolecular NMR 2018; 70:33-51. [PMID: 29196969 DOI: 10.1007/s10858-017-0154-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/15/2017] [Accepted: 11/14/2017] [Indexed: 06/07/2023]
Abstract
Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor-intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high-quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often < 10 min/structure) and to significantly outperform other shift-based or threading-based structure determination methods (in terms of top template model accuracy), with an average TM-score performance of 0.68 (vs. 0.50-0.62 for other methods). Coupled with recent developments in chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca.
Affiliation(s)
- Noor E Hafsa
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- Mark V Berjanskii
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada
- David Arndt
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada
- David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada
|
23
|
Fontenla S, Rinaldi G, Smircich P, Tort JF. Conservation and diversification of small RNA pathways within flatworms. BMC Evol Biol 2017; 17:215. [PMID: 28893179 PMCID: PMC5594548 DOI: 10.1186/s12862-017-1061-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 06/07/2017] [Accepted: 09/05/2017] [Indexed: 02/04/2023] Open
Abstract
Background: Small non-coding RNAs, including miRNAs, and gene silencing mediated by RNA interference have been described in free-living and parasitic lineages of flatworms, but only a few key factors of the small RNA pathways have been exhaustively investigated, and in a limited number of species. The availability of flatworm draft genomes and predicted proteomes allowed us to perform an extended survey of the genes involved in small non-coding RNA pathways in this phylum. Results: Overall, the findings show that the small non-coding RNA pathways are conserved in all the analyzed flatworm lineages; however, notable peculiarities were identified. While Piwi genes are amplified in free-living worms, they are completely absent in all parasitic species. Remarkably, all flatworms share a specific Argonaute family (FL-Ago) that has been independently amplified in different lineages. Other key factors such as Dicer are also duplicated, with Dicer-2 showing structural differences between trematodes, cestodes and free-living flatworms. Similarly, a very divergent GW182 Argonaute-interacting protein was identified in all flatworm lineages. In contrast, genes involved in the amplification of the RNAi interfering signal were detected only in the ancestral free-living species Macrostomum lignano. Here we describe all the putative small RNA pathways present in both free-living and parasitic flatworm lineages. Conclusion: These findings highlight innovations specifically evolved in platyhelminths, presumably associated with novel mechanisms of gene expression regulation mediated by small RNA pathways that differ from what has been classically described in model organisms. Understanding these phylum-specific innovations and the differences between free-living and parasitic species might provide clues to adaptations to parasitism, and would be relevant for gene-silencing technology development for parasitic flatworms that infect hundreds of millions of people worldwide. Electronic supplementary material: The online version of this article (10.1186/s12862-017-1061-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Santiago Fontenla
- Departamento de Genética, Facultad de Medicina, Universidad de la República (UDELAR), Gral. Flores 2125, CP11800, Montevideo, MVD, Uruguay
- Gabriel Rinaldi
- Parasite Genomics, Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Pablo Smircich
- Departamento de Genética, Facultad de Medicina, Universidad de la República (UDELAR), Gral. Flores 2125, CP11800, Montevideo, MVD, Uruguay; Laboratorio de Interacciones Moleculares, Facultad de Ciencias, Universidad de la República (UdelaR), Montevideo, Uruguay
- Jose F Tort
- Departamento de Genética, Facultad de Medicina, Universidad de la República (UDELAR), Gral. Flores 2125, CP11800, Montevideo, MVD, Uruguay
|
24
|
Zhang QL, Zhang L, Zhao TX, Wang J, Zhu QH, Chen JY, Yuan ML. Gene sequence variations and expression patterns of mitochondrial genes are associated with the adaptive evolution of two Gynaephora species (Lepidoptera: Lymantriinae) living in different high-elevation environments. Gene 2017; 610:148-155. [DOI: 10.1016/j.gene.2017.02.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 05/04/2016] [Revised: 01/05/2017] [Accepted: 02/06/2017] [Indexed: 01/06/2023]
|
25
|
Brown DK, Tastan Bishop Ö. Role of Structural Bioinformatics in Drug Discovery by Computational SNP Analysis: Analyzing Variation at the Protein Level. Glob Heart 2017; 12:151-161. [PMID: 28302551 DOI: 10.1016/j.gheart.2017.01.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 01/12/2017] [Accepted: 01/13/2017] [Indexed: 10/20/2022] Open
Abstract
With the completion of the human genome project at the beginning of the 21st century, the biological sciences entered an unprecedented age of data generation, and made its first steps toward an era of personalized medicine. This abundance of sequence data has led to the proliferation of numerous sequence-based techniques for associating variation with disease, such as genome-wide association studies and candidate gene association studies. However, these statistical methods do not provide an understanding of the functional effects of variation. Structure-based drug discovery and design is increasingly incorporating structural bioinformatics techniques to model and analyze protein targets, perform large scale virtual screening to identify hit to lead compounds, and simulate molecular interactions. These techniques are fast, cost-effective, and complement existing experimental techniques such as high throughput sequencing. In this paper, we discuss the contributions of structural bioinformatics to drug discovery, focusing particularly on the analysis of nonsynonymous single nucleotide polymorphisms. We conclude by suggesting a protocol for future analyses of the structural effects of nonsynonymous single nucleotide polymorphisms on proteins and protein complexes.
Affiliation(s)
- David K Brown
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa
- Özlem Tastan Bishop
- Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, South Africa
|
26
|
Hoque MT, Yang Y, Mishra A, Zhou Y. sDFIRE: Sequence-specific statistical energy function for protein structure prediction by decoy selections. J Comput Chem 2016; 37:1119-24. [DOI: 10.1002/jcc.24298] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 10/17/2015] [Revised: 12/06/2015] [Accepted: 12/13/2015] [Indexed: 12/15/2022]
Affiliation(s)
- Md Tamjidul Hoque
- Computer Science, University of New Orleans, New Orleans, Louisiana 70148
| | - Yuedong Yang
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Queensland 4222, Australia
| | - Avdesh Mishra
- Computer Science, University of New Orleans, New Orleans, Louisiana 70148
| | - Yaoqi Zhou
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Queensland 4222, Australia
| |
|
27
|
Molecular Modeling of Myrosinase from Brassica oleracea: A Structural Investigation of Sinigrin Interaction. Genes (Basel) 2015; 6:1315-29. [PMID: 26703735 PMCID: PMC4690043 DOI: 10.3390/genes6041315] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2015] [Revised: 12/02/2015] [Accepted: 12/15/2015] [Indexed: 11/16/2022] Open
Abstract
Myrosinase, which is present in cruciferous plant species, plays an important role in the hydrolysis of glycosides such as glucosinolates and is involved in plant defense. Brassicaceae myrosinases are diverse although they share common ancestry, and structural knowledge about myrosinases from cabbage (Brassica oleracea) was needed. To address this, we constructed a three-dimensional model structure of myrosinase based on Sinapis alba structures using Iterative Threading ASSEmbly Refinement server (I-TASSER) webserver, and refined model coordinates were evaluated with ProQ and Verify3D. The resulting model was predicted with β/α fold, ten conserved N-glycosylation sites, and three disulfide bridges. In addition, this model shared features with the known Sinapis alba myrosinase structure. To obtain a better understanding of myrosinase–sinigrin interaction, the refined model was docked using Autodock Vina with crucial key amino acids. The key nucleophile residues GLN207 and GLU427 were found to interact with sinigrin to form a hydrogen bond. Further, 20-ns molecular dynamics simulation was performed to examine myrosinase–sinigrin complex stability, revealing that residue GLU207 maintained its hydrogen bond stability throughout the entire simulation and structural orientation was similar to that of the docked state. This conceptual model should be useful for understanding the structural features of myrosinase and their binding orientation with sinigrin.
|
28
|
An empirical energy function for structural assessment of protein transmembrane domains. Biochimie 2015; 115:155-61. [DOI: 10.1016/j.biochi.2015.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2015] [Accepted: 05/21/2015] [Indexed: 11/19/2022]
|
29
|
Kozma D, Tusnády GE. TMFoldRec: a statistical potential-based transmembrane protein fold recognition tool. BMC Bioinformatics 2015; 16:201. [PMID: 26123059 PMCID: PMC4486421 DOI: 10.1186/s12859-015-0638-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2015] [Accepted: 06/06/2015] [Indexed: 12/26/2022] Open
Abstract
Background: Transmembrane proteins (TMPs) are key components of signal transduction, cell-cell adhesion, and energy and material transport into and out of cells. For a deep understanding of these processes, structure determination of transmembrane proteins is indispensable. However, due to technical difficulties, only a few transmembrane protein structures have been determined experimentally. Large-scale genomic sequencing provides increasing amounts of sequence information on the proteins and whole proteomes of living organisms, posing a challenge for bioinformatics: how structural information can be gained from a sequence. Results: Here, we present a novel method, TMFoldRec, for fold prediction of membrane segments in transmembrane proteins. TMFoldRec, which is based on statistical potentials, was tested on a benchmark set containing 124 TMP chains from the PDBTM database. Using a 10-fold jackknife method, the native folds were correctly identified in 77% of the cases. This accuracy exceeds that of the state-of-the-art methods. In addition, a key feature of the TMFoldRec algorithm is the ability to estimate the reliability of the prediction and to decide, with an accuracy of 70%, whether the obtained lowest-energy structure is the native one. Conclusion: These results imply that the membrane-embedded parts of TMPs, rather than the soluble parts, dictate the TM structures. Moreover, predictions with reliability scores make our algorithm applicable for proteome-wide analyses. Availability: The program is available upon request for academic use. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0638-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Dániel Kozma
- "Momentum" Membrane Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, PO Box 7, H 1518, Budapest, Hungary.
| | - Gábor E Tusnády
- "Momentum" Membrane Protein Bioinformatics Research Group, Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, PO Box 7, H 1518, Budapest, Hungary.
| |
|
30
|
Brylinski M. Is the growth rate of Protein Data Bank sufficient to solve the protein structure prediction problem using template-based modeling? BIO-ALGORITHMS AND MED-SYSTEMS 2015. [DOI: 10.1515/bams-2014-0024] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
The Protein Data Bank (PDB) undergoes an exponential expansion in terms of the number of macromolecular structures deposited every year. A pivotal question is how this rapid growth of structural information improves the quality of three-dimensional models constructed by contemporary bioinformatics approaches. To address this problem, we performed a retrospective analysis of the structural coverage of a representative set of proteins using remote homology detected by COMPASS and HHpred. We show that the number of proteins whose structures can be confidently predicted increased during a 9-year period between 2005 and 2014 on account of the PDB growth alone. Nevertheless, this encouraging trend slowed down noticeably around the year 2008 and has yielded insignificant improvements ever since. At the current pace, it is unlikely that the protein structure prediction problem will be solved in the near future using existing template-based modeling techniques. Therefore, further advances in experimental structure determination, qualitatively better approaches in fold recognition, and more accurate template-free structure prediction methods are desperately needed.
|
31
|
Secondary and Tertiary Structure Prediction of Proteins: A Bioinformatic Approach. COMPLEX SYSTEM MODELLING AND CONTROL THROUGH INTELLIGENT SOFT COMPUTATIONS 2015. [DOI: 10.1007/978-3-319-12883-2_19] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
|
32
|
|
33
|
Molecular modelling approaches for cystic fibrosis transmembrane conductance regulator studies. Int J Biochem Cell Biol 2014; 52:39-46. [DOI: 10.1016/j.biocel.2014.04.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2014] [Revised: 04/01/2014] [Accepted: 04/04/2014] [Indexed: 12/30/2022]
|
34
|
Reduction of the number of major representative allergens: from clinical testing to 3-dimensional structures. Mediators Inflamm 2014; 2014:291618. [PMID: 24778467 PMCID: PMC3980986 DOI: 10.1155/2014/291618] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2013] [Accepted: 02/07/2014] [Indexed: 12/02/2022] Open
Abstract
Vast amounts of allergen sequence data have been accumulated, thus complicating the identification of specific allergenic proteins when performing diagnostic allergy tests and immunotherapy. This study aims to rank the importance/potency of the allergens so as to logically reduce the number of allergens and/or allergenic sources. Meta-analysis of 62 allergenic sources used for intradermal testing on 3,335 allergic patients demonstrated that in southern China, mite, sesame, spiny amaranth, Pseudomonas aeruginosa, and house dust account for 88.0% to 100% of the observed positive reactions to the 62 types of allergenic sources tested. The Kolmogorov-Smirnov test results of the website-obtained allergen data and allergen family featured peptides suggested that allergen research in laboratories worldwide has been conducted in parallel on many of the same species. The major allergens were reduced to 21 representative allergens, which were further divided into seven structural classes, each of which contains similar structural components. This study therefore has condensed numerous allergenic sources and major allergens into fewer major representative ones, thus allowing for the use of a smaller number of allergens when conducting comprehensive allergen testing and immunotherapy treatments.
|
35
|
Ahmed MH, Kellogg GE, Selley DE, Safo MK, Zhang Y. Predicting the molecular interactions of CRIP1a-cannabinoid 1 receptor with integrated molecular modeling approaches. Bioorg Med Chem Lett 2014; 24:1158-65. [PMID: 24461351 PMCID: PMC4353595 DOI: 10.1016/j.bmcl.2013.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2013] [Revised: 12/26/2013] [Accepted: 12/29/2013] [Indexed: 12/14/2022]
Abstract
Cannabinoid receptors are a family of G-protein coupled receptors that are involved in a wide variety of physiological processes and diseases. One of the key regulators that are unique to cannabinoid receptors is the cannabinoid receptor interacting proteins (CRIPs). Among them CRIP1a was found to decrease the constitutive activity of the cannabinoid type-1 receptor (CB1R). The aim of this study is to gain an understanding of the interaction between CRIP1a and CB1R through using different computational techniques. The generated model demonstrated several key putative interactions between CRIP1a and CB1R, including the critical involvement of Lys130 in CRIP1a.
Affiliation(s)
- Mostafa H Ahmed
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Glen E Kellogg
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA; Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Dana E Selley
- Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Martin K Safo
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA
| | - Yan Zhang
- Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA.
| |
|
36
|
Reconstructing protein structures by neural network pairwise interaction fields and iterative decoy set construction. Biomolecules 2014; 4:160-80. [PMID: 24970210 PMCID: PMC4030983 DOI: 10.3390/biom4010160] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2013] [Revised: 01/22/2014] [Accepted: 01/30/2014] [Indexed: 11/17/2022] Open
Abstract
Predicting the fold of a protein from its amino acid sequence is one of the grand problems in computational biology. While there has been progress towards a solution, especially when a protein can be modelled based on one or more known structures (templates), in the absence of templates, even the best predictions are generally much less reliable. In this paper, we present an approach for predicting the three-dimensional structure of a protein from the sequence alone, when templates of known structure are not available. This approach relies on a simple reconstruction procedure guided by a novel knowledge-based evaluation function implemented as a class of artificial neural networks that we have designed: Neural Network Pairwise Interaction Fields (NNPIF). This evaluation function takes into account the contextual information for each residue and is trained to identify native-like conformations from non-native-like ones by using large sets of decoys as a training set. The training set is generated and then iteratively expanded during successive folding simulations. As NNPIF are fast at evaluating conformations, thousands of models can be processed in a short amount of time, and clustering techniques can be adopted for model selection. Although the results we present here are very preliminary, we consider them to be promising, with predictions being generated at state-of-the-art levels in some of the cases.
|
37
|
|
38
|
Källberg M, Margaryan G, Wang S, Ma J, Xu J. RaptorX server: a resource for template-based protein structure modeling. Methods Mol Biol 2014; 1137:17-27. [PMID: 24573471 DOI: 10.1007/978-1-4939-0366-5_2] [Citation(s) in RCA: 187] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Assigning functional properties to a newly discovered protein is a key challenge in modern biology. To this end, computational modeling of the three-dimensional atomic arrangement of the amino acid chain is often crucial in determining the role of the protein in biological processes. We present a community-wide web-based protocol, RaptorX server (http://raptorx.uchicago.edu), for automated protein secondary structure prediction, template-based tertiary structure modeling, and probabilistic alignment sampling. Given a target sequence, RaptorX server is able to detect even remotely related template sequences by means of a novel nonlinear context-specific alignment potential and probabilistic consistency algorithm. Using the protocol presented here it is thus possible to obtain high-quality structural models for many target protein sequences when only distantly related protein domains have experimentally solved structures. At present, RaptorX server can perform secondary and tertiary structure prediction of a 200 amino acid target sequence in approximately 30 min.
|
39
|
Brylinski M. The utility of artificially evolved sequences in protein threading and fold recognition. J Theor Biol 2013; 328:77-88. [PMID: 23542050 DOI: 10.1016/j.jtbi.2013.03.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2012] [Revised: 01/24/2013] [Accepted: 03/18/2013] [Indexed: 12/23/2022]
Abstract
Template-based protein structure prediction plays an important role in functional genomics by providing structural models of gene products, which can be utilized by structure-based approaches to function inference. From a systems-level perspective, high structural coverage of the gene products in a given organism is critical. Despite continuous efforts towards the development of more sensitive threading approaches, confident structural models cannot be constructed for a considerable fraction of proteins due to difficulties in recognizing low-sequence-identity templates with a similar fold to the target. Here we introduce a new modeling stratagem, which employs a library of synthetic sequences to improve template ranking in fold recognition by sequence profile-based methods. We developed a new method for the optimization of generic protein-like amino acid sequences to stabilize the respective structures using a combined empirical scoring function, which is compatible with those commonly used in protein threading and fold recognition. We show that the artificially evolved sequences, whose average sequence identity to the wild-type sequences is as low as 13.8%, have significant capabilities to recognize the correct structures. Importantly, the quality of the corresponding threading alignments is comparable to those constructed using conventional wild-type approaches (the average TM-score is 0.48 and 0.54, respectively). Fold recognition that uses data fusion to combine ranks calculated for both wild-type and synthetic template libraries systematically improves the detection of structural analogs. Depending on the threading algorithm used, it yields on average 4-16% higher recognition rates than using the wild-type template library alone. Synthetic sequences artificially evolved for the template structures provide an orthogonal source of signal that could be exploited to detect those templates unrecognized by standard modeling techniques. It opens up new directions in the development of more sensitive threading methods with enhanced capabilities for targeting difficult, midnight-zone templates.
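The abstract above reports alignment quality as an average TM-score. As a minimal sketch of how that score is commonly computed, the snippet below applies the standard TM-score formula to per-residue distances from a single, already fixed superposition. The function name, the example distances, and the lower clamp on the scale factor are assumptions made for illustration; a full implementation would also search over superpositions to maximize the score, which is not attempted here.

```python
def tm_score(distances, target_length):
    """TM-score for aligned residue pairs under one fixed superposition.

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the target protein.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    # Length-dependent scale factor; short chains fall back to 0.5
    # (an assumed guard that avoids a negative or tiny d0).
    if target_length > 15:
        d0 = max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)
    else:
        d0 = 0.5
    # Each aligned pair contributes between 0 and 1; unaligned residues add 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Hypothetical cases: a perfect full-length match and a perfect half-length match.
full_match = tm_score([0.0] * 100, 100)
half_match = tm_score([0.0] * 50, 100)
```

Because the sum is normalized by the target length rather than by the number of aligned pairs, a short but perfect alignment cannot reach a score of 1.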
Affiliation(s)
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA.
| |
|
40
|
Kashefi AH, Meshkin A, Zargoosh M, Zahiri J, Taheri M, Ashtiani S. Scatter-search with support vector machine for prediction of relative solvent accessibility. EXCLI JOURNAL 2013; 12:52-63. [PMID: 26417216 PMCID: PMC4531788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/24/2012] [Accepted: 01/09/2013] [Indexed: 11/18/2022]
Abstract
Proteins have vital roles in living cells. Protein function is almost completely dependent on protein structure. The prediction of relative solvent accessibility gives helpful information for the prediction of the tertiary structure of a protein. In recent years, several relative solvent accessibility (RSA) prediction methods have been developed, including those that generate real values and those that predict discrete states. The proposed method consists of two main steps: the first selects a subset of quantitative features based on selected qualitative features, and the second trains a model with the selected quantitative features for RSA prediction. The results show that the proposed method improves average prediction accuracy and training time. Without human supervision, the proposed method can extract knowledge about which physicochemical features of amino acids are more important in the prediction of RSA, which is of great importance for biologists and their future research.
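The abstract distinguishes real-valued RSA from discrete exposure states. The sketch below shows the usual relationship between the two: RSA is a residue's accessible surface area divided by a per-residue maximum, and a threshold turns it into a two-state label. The maximum values in the table, the 25% threshold, and all function names are assumptions chosen for illustration, not values taken from the cited paper.

```python
# Illustrative maximum accessible surface areas in square angstroms.
# These numbers are assumed placeholders; substitute a published scale.
MAX_ASA = {"ALA": 129.0, "GLY": 104.0, "LEU": 201.0}


def relative_accessibility(residue, asa):
    """Real-valued RSA: observed ASA over the residue's maximum, capped at 1."""
    return min(asa / MAX_ASA[residue], 1.0)


def burial_state(rsa, threshold=0.25):
    """Two-state label; the 25% cutoff is an assumed, commonly used choice."""
    return "exposed" if rsa >= threshold else "buried"


# Hypothetical residues with observed ASA values.
samples = [("ALA", 20.0), ("GLY", 60.0), ("LEU", 10.0)]
states = [burial_state(relative_accessibility(name, asa)) for name, asa in samples]
```

A predictor of the kind described would estimate either the real value or the label directly from sequence features; this snippet only shows how the target quantities are defined once a structure is known.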
Affiliation(s)
- Amir Hosein Kashefi
- Young researchers Club, South Tehran Branch, Islamic Azad University, Tehran, Iran
| | - Alireza Meshkin
- Department of Computer Engineering, Islamic Azad University, Damavand Branch, Damavand, Iran (corresponding author)
| | - Mina Zargoosh
- Department of Computer Engineering, Islamic Azad University, Damavand Branch, Damavand, Iran
| | - Javad Zahiri
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran
| | - Mohsen Taheri
- Department of Computer Engineering, Islamic Azad University, Damavand Branch, Damavand, Iran
| | - Saman Ashtiani
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran
| |
|
41
|
eThread: a highly optimized machine learning-based approach to meta-threading and the modeling of protein tertiary structures. PLoS One 2012. [PMID: 23185577 PMCID: PMC3503980 DOI: 10.1371/journal.pone.0050200] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
Template-based modeling that employs various meta-threading techniques is currently the most accurate, and consequently the most commonly used, approach for protein structure prediction. Despite the evident progress in this field, accurate structure models cannot be constructed for a significant fraction of gene products, thus the development of new algorithms is required. Here, we describe the development, optimization and large-scale benchmarking of eThread, a highly accurate meta-threading procedure for the identification of structural templates and the construction of corresponding target-to-template alignments. eThread integrates ten state-of-the-art threading/fold recognition algorithms in a local environment and extensively uses various machine learning techniques to carry out fully automated template-based protein structure modeling. Tertiary structure prediction employs two protocols based on widely used modeling algorithms: Modeller and TASSER-Lite. As a part of eThread, we also developed eContact, which is a Bayesian classifier for the prediction of inter-residue contacts and eRank, which effectively ranks generated multiple protein models and provides reliable confidence estimates as structure quality assessment. Excluding closely related templates from the modeling process, eThread generates models, which are correct at the fold level, for >80% of the targets; 40–50% of the constructed models are of a very high quality, which would be considered accurate at the family level. Furthermore, in large-scale benchmarking, we compare the performance of eThread to several alternative methods commonly used in protein structure prediction. Finally, we estimate the upper bound for this type of approach and discuss the directions towards further improvements.
|
42
|
Kuhn E. Toward understanding life under subzero conditions: the significance of exploring psychrophilic "cold-shock" proteins. ASTROBIOLOGY 2012; 12:1078-86. [PMID: 23082745 DOI: 10.1089/ast.2012.0858] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2023]
Abstract
Understanding the behavior of proteins under freezing conditions is vital for detecting and locating extraterrestrial life in cold environments, such as those found on Mars and the icy moons of Jupiter and Saturn. This review highlights the importance of studying psychrophilic "cold-shock" proteins, a topic that has yet to be explored. A strategy for analyzing the psychrophilic RNA helicase protein CsdA (Psyc_1082) from Psychrobacter arcticus 273-4 as a key protein for life under freezing temperatures is proposed. The experimental model presented here was developed based on previous data from investigations of Escherichia coli, P. arcticus 273-4, and RNA helicases. P. arcticus 273-4 is considered a model for life in freezing environments. It is capable of growing in temperatures as cold as -10°C by using physiological strategies to survive not only in freezing temperatures but also under low-water-activity and limited-nutrient-availability conditions. The analyses of its genome, transcriptome, and proteome revealed specific adaptations that allow it to inhabit freezing environments by adopting a slow metabolic strategy rather than a cellular dormancy state. During growth at subzero temperatures, P. arcticus 273-4 genes related to energy metabolism and carbon substrate incorporation are downregulated, and genes for maintenance of membranes, cell walls, and nucleic acid motion are upregulated. At -6°C, P. arcticus 273-4 does not upregulate the expression of either RNA or protein chaperones; however, it upregulates the expression of its cold-shock induced DEAD-box RNA helicase protein A (CsdA - Psyc_1082). CsdA - Psyc_1082 was investigated as a key helper protein for sustaining life in subzero conditions. Proving CsdA - Psyc_1082 to be functional as a key protein for life under freezing temperatures may extend the known minimum growth temperature of a mesophilic cell and provide key information about the mechanisms that underlie cold-induced biological systems in icy worlds.
Affiliation(s)
- Emanuele Kuhn
- Division of Earth and Ecosystem Sciences, Desert Research Institute, Reno, Nevada 89512, USA.
| |
|
43
|
Abstract
A key challenge of modern biology is to uncover the functional role of the protein entities that compose cellular proteomes. To this end, the availability of reliable three-dimensional atomic models of proteins is often crucial. This protocol presents a community-wide web-based method using RaptorX (http://raptorx.uchicago.edu/) for protein secondary structure prediction, template-based tertiary structure modeling, alignment quality assessment and sophisticated probabilistic alignment sampling. RaptorX distinguishes itself from other servers by the quality of the alignment between a target sequence and one or multiple distantly related template proteins (especially those with sparse sequence profiles) and by a novel nonlinear scoring function and a probabilistic-consistency algorithm. Consequently, RaptorX delivers high-quality structural models for many targets with only remote templates. At present, it takes RaptorX ~35 min to finish processing a sequence of 200 amino acids. Since its official release in August 2011, RaptorX has processed ~6,000 sequences submitted by ~1,600 users from around the world.
|
44
|
Zhao F, Xu J. A position-specific distance-dependent statistical potential for protein structure and functional study. Structure 2012; 20:1118-26. [PMID: 22608968 PMCID: PMC3372698 DOI: 10.1016/j.str.2012.04.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2012] [Revised: 04/09/2012] [Accepted: 04/10/2012] [Indexed: 10/28/2022]
Abstract
Although studied extensively, designing a highly accurate protein energy potential is still challenging. Many knowledge-based statistical potentials are derived from the inverse of the Boltzmann law and consist of two major components: the observed atomic interacting probability and the reference state. These potentials differ mainly in the reference state and use a similar simple counting method to estimate the observed probability, which is usually assumed to correlate only with atom types. This article takes a rather different view of the observed probability and parameterizes it by the protein sequence profile context of the atoms and the radius of gyration, in addition to atom types. Experiments confirm that our position-specific statistical potential outperforms the currently popular ones in several decoy discrimination tests. Our results imply that, in addition to the reference state, the observed probability also differentiates energy potentials, and that evolutionary information greatly boosts the performance of energy potentials.
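The two components named in the abstract, an observed probability and a reference state, can be illustrated with the generic inverse-Boltzmann form E = -kT ln(P_obs / P_ref). The sketch below uses the simple counting estimate that the abstract describes as conventional, not the position-specific estimate the authors propose. The counts, the pseudocount, the kT value, and the function name are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

KT = 0.593  # kcal/mol near 298 K; assumed only to give the scores a unit


def inverse_boltzmann(observed, reference, pseudocount=1.0):
    """Return {bin: energy} with E = -kT * ln(P_obs / P_ref).

    observed and reference map a distance bin to a raw count.
    The pseudocount (an assumption) keeps empty bins from producing log(0).
    """
    bins = set(observed) | set(reference)
    n_obs = sum(observed.values()) + pseudocount * len(bins)
    n_ref = sum(reference.values()) + pseudocount * len(bins)
    energies = {}
    for b in bins:
        p_obs = (observed.get(b, 0) + pseudocount) / n_obs
        p_ref = (reference.get(b, 0) + pseudocount) / n_ref
        energies[b] = -KT * math.log(p_obs / p_ref)
    return energies


# Hypothetical counts for one atom-pair type, keyed by distance bin in angstroms.
observed = Counter({4: 30, 6: 50, 8: 20})
reference = Counter({4: 10, 6: 40, 8: 50})
potential = inverse_boltzmann(observed, reference)
```

Bins that occur more often in native structures than the reference state predicts receive negative (favorable) energies, which is why the choice of reference state, and of how the observed probability is estimated, changes the resulting potential.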
Affiliation(s)
- Feng Zhao
- Toyota Technological Institute at Chicago, Chicago IL, USA 60637
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago IL, USA 60637
| |
|
45
|
Joo K, Lee SJ, Lee J. Sann: Solvent accessibility prediction of proteins by nearest neighbor method. Proteins 2012; 80:1791-7. [DOI: 10.1002/prot.24074] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2011] [Revised: 02/08/2012] [Accepted: 02/23/2012] [Indexed: 11/06/2022]
|
46
|
Cheng J, Li J, Wang Z, Eickholt J, Deng X. The MULTICOM toolbox for protein structure prediction. BMC Bioinformatics 2012; 13:65. [PMID: 22545707 PMCID: PMC3495398 DOI: 10.1186/1471-2105-13-65] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2012] [Accepted: 04/30/2012] [Indexed: 12/31/2022] Open
Abstract
Background: As genome sequencing is becoming routine in biomedical research, the total number of protein sequences is increasing exponentially, recently reaching over 108 million. However, only a tiny portion of these proteins (i.e. ~75,000 or < 0.07%) have solved tertiary structures determined by experimental techniques. The gap between protein sequence and structure continues to enlarge rapidly as the throughput of genome sequencing techniques is much higher than that of protein structure determination techniques. Computational software tools for predicting protein structure and structural features from protein sequences are crucial to make use of this vast repository of protein resources. Results: To meet the need, we have developed a comprehensive MULTICOM toolbox consisting of a set of protein structure and structural feature prediction tools. These tools include secondary structure prediction, solvent accessibility prediction, disorder region prediction, domain boundary prediction, contact map prediction, disulfide bond prediction, beta-sheet topology prediction, fold recognition, multiple template combination and alignment, template-based tertiary structure modeling, protein model quality assessment, and mutation stability prediction. Conclusions: These tools have been rigorously tested by many users in the last several years and/or during the last three rounds of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7-9) from 2006 to 2010, achieving state-of-the-art or near performance. In order to facilitate bioinformatics research and technological development in the field, we have made the MULTICOM toolbox freely available as web services and/or software packages for academic use and scientific research. It is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.
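The coverage figure quoted in the abstract can be checked with a line of arithmetic. The snippet below recomputes the fraction of sequences with solved structures from the two counts given in the text; the variable names are illustrative.

```python
# Counts as stated in the abstract.
total_sequences = 108_000_000
solved_structures = 75_000

# Percentage of known sequences that have an experimentally solved structure.
coverage_percent = 100 * solved_structures / total_sequences
print(f"{coverage_percent:.3f}%")  # prints 0.069%
```

The result, about 0.069%, agrees with the abstract's statement of less than 0.07%.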
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, University of Missouri-Columbia, Columbia, MO 65211, USA.
| |
|
47
|
Zhou H, Skolnick J. Template-based protein structure modeling using TASSER(VMT). Proteins 2011; 80:352-61. [PMID: 22105797 DOI: 10.1002/prot.23183] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2011] [Revised: 08/25/2011] [Accepted: 09/04/2011] [Indexed: 12/29/2022]
Abstract
Template-based protein structure modeling is commonly used for protein structure prediction. Based on the observation that multiple template-based methods often perform better than single template-based methods, we further explore the use of a variable number of multiple templates for a given target in the latest variant of TASSER, TASSER(VMT). We first develop an algorithm that improves the target-template alignment for a given template. The improved alignment, called the SP(3) alternative alignment, is generated by a parametric alignment method coupled with short TASSER refinement on models selected using knowledge-based scores. The refined top model is then structurally aligned to the template to produce the SP(3) alternative alignment. Templates identified using SP(3) threading are combined with the SP(3) alternative and HHsearch alignments to provide target alignments to each template. These template models are then grouped into sets containing a variable number of template/alignment combinations. For each set, we run short TASSER simulations to build full-length models. Then, the models from all sets of templates are pooled, and the top 20-50 models are selected using the FTCOM ranking method. These models are then subjected to a single longer TASSER refinement run for final prediction. We benchmarked our method by comparison with our previously developed approach, pro-sp3-TASSER, on a set with 874 easy and 318 hard targets. The average GDT-TS score improvements for the first model are 3.5 and 4.3% for easy and hard targets, respectively. When tested on the 112 CASP9 targets, our method improves the average GDT-TS scores as compared to pro-sp3-TASSER by 8.2 and 9.3% for the 80 easy and 32 hard targets, respectively. It also shows slightly better results than the top-ranked CASP9 Zhang-Server, QUARK and HHpredA methods. The program is available for download at http://cssb.biology.gatech.edu/.
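The benchmark results above are expressed as GDT-TS scores. As a rough sketch of what that measure captures, the snippet below averages the fraction of residues that fall within 1, 2, 4, and 8 angstroms of their native positions. It assumes a single fixed superposition and hypothetical distances; the full GDT procedure searches for the best superposition at each cutoff, which this example does not do.

```python
def gdt_ts(distances):
    """Approximate GDT-TS (0 to 100) from per-residue model-to-native distances.

    distances: distances in angstroms for every residue of the target,
    measured under one fixed superposition (a simplifying assumption).
    """
    if not distances:
        raise ValueError("distances must not be empty")
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean of the four fractions.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical five-residue model: one residue falls outside every cutoff.
score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0])
```

For these example distances the four fractions are 0.2, 0.4, 0.6, and 0.8, so the score comes out at about 50.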
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318
48
Vishnepolsky B, Pirtskhalava M. CONTSOR--a new knowledge-based fold recognition potential, based on side chain orientation and contacts between residue terminal groups. Protein Sci 2011; 21:134-41. [PMID: 22057923 DOI: 10.1002/pro.763] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/26/2011] [Revised: 10/18/2011] [Accepted: 10/31/2011] [Indexed: 11/09/2022]
Abstract
Recognizing structural similarity without significant sequence identity (fold recognition) is an effective method for protein structure prediction. Previously, we developed a fold recognition potential called SORDIS, which incorporated side chain orientation in relation to hydrophobic core centers, distance of the residues from the protein globule center, and secondary structure terms. However, this potential does not include terms based on close contacts between residues. In this paper, we present a new fold recognition potential, CONTSOR, which combines the SORDIS terms with a term based on contacts between amino acid terminal groups. The performance of this potential was evaluated on the SABmark benchmark for alignment accuracy and on the SABmark and Lindahl benchmarks for fold recognition. The results show that CONTSOR has the best performance among the compared potentials on the SABmark benchmark, both for alignment accuracy and fold recognition, and one of the best performances on the Lindahl benchmark. The CONTSOR software package is available for download at http://www.lifescience.org.ge/downloads/contsor.zip.
Affiliation(s)
- Boris Vishnepolsky
- Life Science Research Centre, Laboratory of Bioinformatics, 14 Gotua Street, Tbilisi, Georgia.
49
Peng J, Xu J. RaptorX: exploiting structure information for protein alignment by statistical inference. Proteins 2011; 79 Suppl 10:161-71. [PMID: 21987485 DOI: 10.1002/prot.23175] [Citation(s) in RCA: 250] [Impact Index Per Article: 17.9] [Received: 04/06/2011] [Revised: 07/25/2011] [Accepted: 08/19/2011] [Indexed: 12/13/2022]
Abstract
This work presents RaptorX, a statistical method for template-based protein modeling that improves alignment accuracy by exploiting structural information in a single or multiple templates. RaptorX consists of three major components: single-template threading, alignment quality prediction, and multiple-template threading. This work summarizes the methods used by RaptorX and presents its CASP9 result analysis, aiming to identify major bottlenecks with RaptorX and template-based modeling and hopefully directions for further study. Our results show that template structural information helps a lot with both single-template and multiple-template protein threading especially when closely-related templates are unavailable, and there is still large room for improvement in both alignment and template selection. The RaptorX web server is available at http://raptorx.uchicago.edu.
Affiliation(s)
- Jian Peng
- Toyota Technological Institute at Chicago, 6045 S. Kenwood Avenue, Chicago, IL 60637, USA
50
Wang Z, Zhao F, Peng J, Xu J. Protein 8-class secondary structure prediction using conditional neural fields. Proteomics 2011; 11:3786-92. [PMID: 21805636 DOI: 10.1002/pmic.201100196] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.4] [Received: 04/15/2011] [Revised: 06/16/2011] [Accepted: 07/01/2011] [Indexed: 11/10/2022]
Abstract
Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA.
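The Q8 accuracy reported above is the fraction of residues whose predicted 8-class label matches the observed one. The following minimal sketch shows that calculation and, for comparison, the corresponding Q3 value. It is not the authors' code: the function names, the toy label strings, and the 8-to-3 class reduction (H, G, I to helix; E, B to strand; everything else to coil) are assumptions, and other reductions are also in use.

```python
# One common 8-to-3 reduction of DSSP-style labels (an assumption here;
# other conventions exist): helices -> H, strands -> E, the rest -> C.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted label equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and equal in length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

def reduce_to_three(labels: str) -> str:
    """Map 8-class labels to 3 classes; unlisted labels become coil (C)."""
    return "".join(EIGHT_TO_THREE.get(label, "C") for label in labels)

# Toy example with hypothetical labels for a 12-residue protein.
observed = "LHHHHTTEEEEL"
predicted = "LHHHHTSEEELL"

q8 = per_residue_accuracy(predicted, observed)
q3 = per_residue_accuracy(reduce_to_three(predicted), reduce_to_three(observed))
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

In this toy case the T/S confusion counts as an error for Q8 but disappears after the reduction to three classes, which is one reason the 8-class task is harder.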
Affiliation(s)
- Zhiyong Wang
- Toyota Technological Institute at Chicago, 6045 S Kenwood, Chicago, IL 60637, USA