1
Wang T, Wang L, Zhang X, Shen C, Zhang O, Wang J, Wu J, Jin R, Zhou D, Chen S, Liu L, Wang X, Hsieh CY, Chen G, Pan P, Kang Y, Hou T. Comprehensive assessment of protein loop modeling programs on large-scale datasets: prediction accuracy and efficiency. Brief Bioinform 2023; 25:bbad486. [PMID: 38171930 PMCID: PMC10764206 DOI: 10.1093/bib/bbad486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
Protein loops play a critical role in the dynamics of proteins and are essential for numerous biological functions, and various computational approaches to loop modeling have been proposed over the past decades. However, a comprehensive understanding of the strengths and weaknesses of each method is lacking. In this work, we constructed two high-quality datasets (i.e. the General dataset and the CASP dataset) and systematically evaluated the accuracy and efficiency of 13 commonly used loop modeling approaches from the perspective of loop lengths, protein classes and residue types. The results indicate that the knowledge-based method FREAD generally outperforms the other tested programs in most cases, but encountered challenges when predicting loops longer than 15 and 30 residues on the CASP and General datasets, respectively. The ab initio method Rosetta NGK demonstrated exceptional modeling accuracy for short loops with four to eight residues and achieved the highest success rate on the CASP dataset. The well-known AlphaFold2 and RoseTTAFold require more resources for better performance, but they exhibit promise for predicting loops longer than 16 and 30 residues in the CASP and General datasets. These observations can provide valuable insights for selecting suitable methods for specific loop modeling tasks and contribute to future advancements in the field.
Affiliation(s)
- Tianyue Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Langcheng Wang
  - Department of Pathology, New York University Medical Center, 550 First Avenue, New York, NY 10016, USA
- Xujun Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Odin Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jialu Wu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ruofan Jin
  - College of Life Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Donghao Zhou
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China
- Shicheng Chen
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Liwei Liu
  - Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Shenzhen 518129, Guangdong, China
- Xiaorui Wang
  - State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macao, China
- Chang-Yu Hsieh
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Guangyong Chen
  - Zhejiang Lab, Zhejiang University, Hangzhou 311121, Zhejiang, China
- Peichen Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
2
Liu J, Amaral LAN, Keten S. A new approach for extracting information from protein dynamics. Proteins 2023; 91:183-195. [PMID: 36094321 PMCID: PMC9844508 DOI: 10.1002/prot.26421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 08/25/2022] [Accepted: 09/06/2022] [Indexed: 01/19/2023]
Abstract
Increased ability to predict protein structures is moving research focus towards understanding protein dynamics. A promising approach is to represent protein dynamics through networks and take advantage of well-developed methods from network science. Most studies build protein dynamics networks from correlation measures, an approach that only works under very specific conditions, instead of the more robust inverse approach. Thus, we apply the inverse approach to the dynamics of protein dihedral angles, a system of internal coordinates, to avoid structural alignment. Using the well-characterized adhesion protein, FimH, we show that our method identifies networks that are physically interpretable, robust, and relevant to the allosteric pathway sites. We further use our approach to detect dynamical differences, despite structural similarity, for Siglec-8 in the immune system, and the SARS-CoV-2 spike protein. Our study demonstrates that using the inverse approach to extract a network from protein dynamics yields important biophysical insights.
Affiliation(s)
- Jenny Liu
  - Department of Mechanical Engineering, Northwestern University
- Luís A. N. Amaral
  - Department of Chemical and Biological Engineering, Northwestern University
- Sinan Keten
  - Department of Mechanical Engineering, Northwestern University
3
Liu J, Amaral LAN, Keten S. A new approach for extracting information from protein dynamics. arXiv 2022; arXiv:2203.08387v1. [PMID: 35313540 PMCID: PMC8936122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Affiliation(s)
- Jenny Liu
  - Department of Mechanical Engineering, Northwestern University
- Luís A N Amaral
  - Department of Chemical and Biological Engineering, Northwestern University
- Sinan Keten
  - Department of Mechanical Engineering, Northwestern University
4
O'Donoghue SI. Grand Challenges in Bioinformatics Data Visualization. Front Bioinform 2021; 1:669186. [PMID: 36303723 PMCID: PMC9581027 DOI: 10.3389/fbinf.2021.669186] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/18/2021] [Accepted: 04/30/2021] [Indexed: 01/17/2023] Open
Affiliation(s)
- Seán I. O'Donoghue
  - Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, NSW, Australia
  - CSIRO Data61, Eveleigh, NSW, Australia
  - *Correspondence: Seán I. O'Donoghue,
5
Hospital A, Battistini F, Soliva R, Gelpí JL, Orozco M. Surviving the deluge of biosimulation data. Wiley Interdiscip Rev Comput Mol Sci 2019. [DOI: 10.1002/wcms.1449] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 02/06/2023]
Affiliation(s)
- Adam Hospital
  - Institut de Recerca Biomèdica, IRB Barcelona, The Barcelona Institute of Science and Technology, Joint IRB‐BSC Program in Computational Biology, Barcelona, Spain
- Federica Battistini
  - Institut de Recerca Biomèdica, IRB Barcelona, The Barcelona Institute of Science and Technology, Joint IRB‐BSC Program in Computational Biology, Barcelona, Spain
- Josep Lluis Gelpí
  - Barcelona Supercomputing Center, Joint IRB‐BSC Program in Computational Biology, Barcelona, Spain
  - Departament de Bioquímica i Biomedicina, Universitat de Barcelona, Barcelona, Spain
- Modesto Orozco
  - Institut de Recerca Biomèdica, IRB Barcelona, The Barcelona Institute of Science and Technology, Joint IRB‐BSC Program in Computational Biology, Barcelona, Spain
  - Departament de Bioquímica i Biomedicina, Universitat de Barcelona, Barcelona, Spain
6
O'Donoghue SI, Baldi BF, Clark SJ, Darling AE, Hogan JM, Kaur S, Maier-Hein L, McCarthy DJ, Moore WJ, Stenau E, Swedlow JR, Vuong J, Procter JB. Visualization of Biomedical Data. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013424] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Indexed: 11/09/2022]
Abstract
The rapid increase in volume and complexity of biomedical data requires changes in research, communication, and clinical practices. This includes learning how to effectively integrate automated analysis with high–data density visualizations that clearly express complex phenomena. In this review, we summarize key principles and resources from data visualization research that help address this difficult challenge. We then survey how visualization is being used in a selection of emerging biomedical research areas, including three-dimensional genomics, single-cell RNA sequencing (RNA-seq), the protein structure universe, phosphoproteomics, augmented reality–assisted surgery, and metagenomics. While specific research areas need highly tailored visualizations, there are common challenges that can be addressed with general methods and strategies. Also common, however, are poor visualization practices. We outline ongoing initiatives aimed at improving visualization practices in biomedical research via better tools, peer-to-peer learning, and interdisciplinary collaboration with computer scientists, science communicators, and graphic designers. These changes are revolutionizing how we see and think about our data.
Affiliation(s)
- Seán I. O'Donoghue
  - Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Eveleigh NSW 2015, Australia
  - Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW), Kensington NSW 2033, Australia
- Benedetta Frida Baldi
  - Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
- Susan J. Clark
  - Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
- Aaron E. Darling
  - The ithree Institute, University of Technology Sydney, Ultimo NSW 2007, Australia
- James M. Hogan
  - School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane QLD 4000, Australia
- Sandeep Kaur
  - School of Computer Science and Engineering, University of New South Wales (UNSW), Kensington NSW 2033, Australia
- Lena Maier-Hein
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Davis J. McCarthy
  - European Bioinformatics Institute (EBI), European Molecular Biology Laboratory (EMBL), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
  - St. Vincent's Institute of Medical Research, Fitzroy VIC 3065, Australia
- William J. Moore
  - School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
- Esther Stenau
  - Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Jason R. Swedlow
  - School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
- Jenny Vuong
  - Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Eveleigh NSW 2015, Australia
- James B. Procter
  - School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
7
Childers MC, Towse CL, Daggett V. Molecular dynamics-derived rotamer libraries for d-amino acids within homochiral and heterochiral polypeptides. Protein Eng Des Sel 2018; 31:191-204. [PMID: 29992252 PMCID: PMC6205366 DOI: 10.1093/protein/gzy016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/05/2018] [Accepted: 06/15/2018] [Indexed: 01/06/2023] Open
Abstract
Computational resources have contributed to the design and engineering of novel proteins by integrating genomic, structural and dynamic aspects of proteins. Non-canonical amino acids, such as d-amino acids, expand the available sequence space for designing and engineering proteins; however, the rotamer libraries for d-amino acids are usually constructed as the mirror images of l-amino acid rotamer libraries, an assumption that has not been tested. To this end, we have performed molecular dynamics (MD) simulations of model host-guest peptide systems containing d-amino acids. Our simulations systematically address the applicability of the mirror image convention as well as the effects of neighboring residue chirality. Rotamer libraries derived from these systems provide realistic rotamer distributions suitable for use in both rational and computational design workflows. Our simulations also address the impact of chirality on the intrinsic conformational preferences of amino acids, providing fundamental insights into the relationship between chirality and biomolecular dynamics. While d-amino acids are rare in naturally occurring proteins, they are used in designed proteins to stabilize a desired conformation, increase bioavailability or confer favorable biochemical and physical attributes. Here, we present d-amino acid rotamer libraries derived from MD simulations of alanine-based host-guest pentapeptides and show how certain residues can deviate from mirror image symmetry. Our simulations directly model d-amino acids as guest residues within the chiral l-Ala and d-Ala pentapeptide series to explicitly incorporate any contributions resulting from the chiralities of neighboring residues.
Affiliation(s)
- Clare-Louise Towse
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Valerie Daggett
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
8
Childers MC, Towse CL, Daggett V. The effect of chirality and steric hindrance on intrinsic backbone conformational propensities: tools for protein design. Protein Eng Des Sel 2016; 29:271-80. [PMID: 27284086 DOI: 10.1093/protein/gzw023] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 05/10/2016] [Accepted: 05/11/2016] [Indexed: 01/30/2023] Open
Abstract
The conformational propensities of amino acids are an amalgamation of sequence effects, environmental effects and underlying intrinsic behavior. Many have attempted to investigate neighboring residue effects to aid in our understanding of protein folding and improve structure prediction efforts, especially with respect to difficult to characterize states, such as disordered or unfolded states. Host-guest peptide series are a useful tool in examining the propensities of the amino acids free from the surrounding protein structure. Here, we compare the distributions of the backbone dihedral angles (φ/ψ) of the 20 proteogenic amino acids in two different sequence contexts using the AAXAA and GGXGG host-guest pentapeptide series. We further examine their intrinsic behaviors across three environmental contexts: water at 298 K, water at 498 K, and 8 M urea at 298 K. The GGXGG systems provide the intrinsic amino acid propensities devoid of any conformational context. The alanine residues in the AAXAA series enforce backbone chirality, thereby providing a model of the intrinsic behavior of amino acids in a protein chain. Our results show modest differences in φ/ψ distributions due to the steric constraints of the Ala side chains, the magnitudes of which are dependent on the denaturing conditions. One of the strongest factors modulating φ/ψ distributions was the protonation of titratable side chains, and the largest differences observed were in the amino acid propensities for the rarely sampled αL region.
Affiliation(s)
- Clare-Louise Towse
  - Department of Bioengineering, University of Washington, Seattle, WA 98195-5013, USA
- Valerie Daggett
  - Department of Bioengineering, University of Washington, Seattle, WA 98195-5013, USA