Reference Citation Analysis

For: Ma J, Wang S, Wang Z, Xu J. Protein contact prediction by integrating joint evolutionary coupling analysis and supervised learning. Bioinformatics 2015;31:3506-13. [PMID: 26275894 DOI: 10.1093/bioinformatics/btv472] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.0] [Received: 12/18/2014] [Accepted: 08/08/2015] [Indexed: 02/07/2023]
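
The Impact Index Per Article values in this list appear to be consistent with the total RCA citation count divided by the number of calendar years from publication through 2024 inclusive, rounded half-up to one decimal place. For this article, 80 citations over the ten years 2015-2024 gives 8.0, and a 2021 article with 1 citation is listed as 0.3 (0.25 rounded half-up). Both the formula and the 2024 reference year are inferences from the listed figures, not a documented RCA definition; the sketch below only reproduces that arithmetic.

```python
from decimal import Decimal, ROUND_HALF_UP


def impact_index_per_article(citations: int, pub_year: int, ref_year: int = 2024) -> Decimal:
    """Hypothetical reconstruction of the RCA 'Impact Index Per Article'.

    Assumption (inferred from the listed values, not from RCA documentation):
    citations / number of calendar years from pub_year to ref_year inclusive,
    rounded half-up to one decimal place.
    """
    years = ref_year - pub_year + 1
    if years < 1:
        raise ValueError("pub_year must not be later than ref_year")
    ratio = Decimal(citations) / Decimal(years)
    # Decimal with ROUND_HALF_UP avoids float round-half-to-even (round(0.25, 1) == 0.2).
    return ratio.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(impact_index_per_article(80, 2015))  # Ma et al., Bioinformatics 2015: prints 8.0
    print(impact_index_per_article(63, 2021))  # Pearce and Zhang, J Biol Chem 2021: prints 15.8
```

Decimal arithmetic is used because binary floats round exact halves to even, which would turn 0.25 into 0.2 rather than the 0.3 shown on the page.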

Cited by Other Article(s)

Zheng W. Predicting hotspots for disease-causing single nucleotide variants using sequences-based coevolution, network analysis, and machine learning. PLoS One 2024;19:e0302504. [PMID: 38743747 PMCID: PMC11093321 DOI: 10.1371/journal.pone.0302504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 04/05/2024] [Indexed: 05/16/2024]

Wang X, Li A, Li X, Cui H. Empowering Protein Engineering through Recombination of Beneficial Substitutions. Chemistry 2024;30:e202303889. [PMID: 38288640 DOI: 10.1002/chem.202303889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Indexed: 02/24/2024]

Xu M, Abdullah NA, Md Sabri AQ. A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data. Comput Biol Chem 2024;108:107997. [PMID: 38154318 DOI: 10.1016/j.compbiolchem.2023.107997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 11/03/2023] [Accepted: 12/03/2023] [Indexed: 12/30/2023]

Zhao C, Wang S. AttCON: With better MSAs and attention mechanism for accurate protein contact map prediction. Comput Biol Med 2024;169:107822. [PMID: 38091726 DOI: 10.1016/j.compbiomed.2023.107822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/19/2023] [Accepted: 12/04/2023] [Indexed: 02/08/2024]

Montezano D, Bernstein R, Copeland MM, Slusky JSG. General features of transmembrane beta barrels from a large database. Proc Natl Acad Sci U S A 2023;120:e2220762120. [PMID: 37432995 PMCID: PMC10629564 DOI: 10.1073/pnas.2220762120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Accepted: 06/03/2023] [Indexed: 07/13/2023]

Meyer L, Crocoll C, Halkier BA, Mirza OA, Xu D. Identification of key amino acid residues in AtUMAMIT29 for transport of glucosinolates. Front Plant Sci 2023;14:1219783. [PMID: 37528977 PMCID: PMC10388549 DOI: 10.3389/fpls.2023.1219783] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/09/2023] [Accepted: 06/08/2023] [Indexed: 08/03/2023]

The SspB adaptor drives structural changes in the AAA+ ClpXP protease during ssrA-tagged substrate delivery. Proc Natl Acad Sci U S A 2023;120:e2219044120. [PMID: 36730206 PMCID: PMC9963277 DOI: 10.1073/pnas.2219044120] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 02/03/2023]

Newman KE, Tindall SN, Mader SL, Khalid S, Thomas GH, Van Der Woude MW. A novel fold for acyltransferase-3 (AT3) proteins provides a framework for transmembrane acyl-group transfer. eLife 2023;12:e81547. [PMID: 36630168 PMCID: PMC9833829 DOI: 10.7554/elife.81547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 12/04/2022] [Indexed: 01/12/2023]

Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]

Weyer R, Hellmann MJ, Hamer-Timmermann SN, Singh R, Moerschbacher BM. Customized chitooligosaccharide production-controlling their length via engineering of rhizobial chitin synthases and the choice of expression system. Front Bioeng Biotechnol 2022;10:1073447. [PMID: 36588959 PMCID: PMC9795070 DOI: 10.3389/fbioe.2022.1073447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Accepted: 11/28/2022] [Indexed: 12/15/2022]

Newton MH, Zaman R, Mataeimoghadam F, Rahman J, Sattar A. Constraint Guided Beta-Sheet Refinement for Protein Structure Prediction. Comput Biol Chem 2022;101:107773. [DOI: 10.1016/j.compbiolchem.2022.107773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Revised: 09/15/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022]

An J, Weng X. Collectively encoding protein properties enriches protein language models. BMC Bioinformatics 2022;23:467. [DOI: 10.1186/s12859-022-05031-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 10/31/2022] [Indexed: 11/10/2022]

Gill ML. The rise of the machines in chemistry. Magn Reson Chem 2022;60:1044-1051. [PMID: 35976263 DOI: 10.1002/mrc.5304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Revised: 08/07/2022] [Accepted: 08/09/2022] [Indexed: 06/15/2023]

Omoboyede V, Ibrahim O, Umar HI, Bello T, Adedeji AA, Khalid A, Fayojegbe ES, Ayomide AB, Chukwuemeka PO. Designing a vaccine-based therapy against Epstein-Barr virus-associated tumors using immunoinformatics approach. Comput Biol Med 2022;150:106128. [PMID: 36179514 DOI: 10.1016/j.compbiomed.2022.106128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Revised: 08/05/2022] [Accepted: 09/18/2022] [Indexed: 11/26/2022]

Affiliation(s)

Victor Omoboyede Department of Biochemistry, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria; Computer Aided Therapeutics Laboratory (CATL) Group, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria; Computer Aided Therapeutics and Drug Design (CATDD) Group, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria.
Ochapa Ibrahim Computer Aided Therapeutics and Drug Design (CATDD) Group, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria; Faculty of Pharmaceutical Sciences, Ahmadu Bello University, Zaria, Kaduna State, Nigeria.
Haruna Isiyaku Umar Department of Biochemistry, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria; Computer Aided Therapeutics and Drug Design (CATDD) Group, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria.
Taye Bello Department of Medical Rehabilitation, College of Health Sciences, Obafemi Awolowo University, Nigeria.
Ayodeji Adeola Adedeji Department of Biochemistry, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria.
Aqsa Khalid Research Center for Modelling and Simulation (RCMS), National University of Science and Technology (NUST), Islamabad, Pakistan.
Emmanuel Sunday Fayojegbe Department of Microbiology, Osun State University, Osogbo, Nigeria.
Adunola Blessing Ayomide Computer Aided Therapeutics Laboratory (CATL) Group, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria; Department of Biotechnology, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria.
Prosper Obed Chukwuemeka Computer Aided Therapeutics Laboratory (CATL) Group, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria; Computer Aided Therapeutics and Drug Design (CATDD) Group, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria; Department of Biotechnology, School of Sciences (SOS), Federal University of Technology Akure, P.M.B 704, Akure, Nigeria.

Yue R, Dutta A. Computational systems biology in disease modeling and control, review and perspectives. NPJ Syst Biol Appl 2022;8:37. [PMID: 36192551 PMCID: PMC9528884 DOI: 10.1038/s41540-022-00247-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/30/2022] [Accepted: 09/05/2022] [Indexed: 02/02/2023]

Behkamal B, Naghibzadeh M, Pagnani A, Saberi MR, Al Nasr K. LPTD: a novel linear programming-based topology determination method for cryo-EM maps. Bioinformatics 2022;38:2734-2741. [PMID: 35561171 PMCID: PMC9306757 DOI: 10.1093/bioinformatics/btac170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2021] [Revised: 03/01/2022] [Accepted: 03/18/2022] [Indexed: 02/03/2023]

Abstract

SUMMARY

Topology determination is one of the most important intermediate steps toward building the atomic structure of a protein from its medium-resolution cryo-electron microscopy (cryo-EM) map. The main goal of topology determination is to identify the correct matches (i.e. assignment and direction) between the secondary structure elements (SSEs) (α-helices and β-sheets) detected in a protein sequence and those detected in the cryo-EM density map. Despite many recent advances in molecular biology technologies, the problem remains challenging. To address it, this article proposes a linear programming-based topology determination (LPTD) method that solves the secondary structure topology problem in three-dimensional geometrical space. By modeling the protein sequence through the extraction of highly reliable features and a distance-based scoring function, the secondary structure matching problem is transformed into a complete weighted bipartite graph matching problem. An algorithm based on linear programming is then developed as a decision-making strategy to extract the true (native) topology from among all possible topologies. The proposed automatic framework is verified on 12 experimental and 15 simulated α-β proteins. The results show that LPTD is highly efficient and fast: for 77% of the cases in the dataset, the native topology was ranked first in <2 s. In addition, the method can successfully handle large, complex proteins with as many as 65 SSEs. Such a large number of SSEs has never been solved with current tools/methods.

AVAILABILITY AND IMPLEMENTATION

The LPTD package (source code and data) is publicly available at https://github.com/B-Behkamal/LPTD. In addition, two test samples and instructions for using the graphical user interface are provided in the shared readme file.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
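
LPTD scores every match between a sequence SSE and a density-map SSE, including its direction, and then searches the resulting complete weighted bipartite graph for the best-ranked topologies. The toy sketch below is not the LPTD algorithm: it takes a hypothetical precomputed cost table (`cost[i][j]` holds forward and reverse costs) and ranks all injective assignments by exhaustive enumeration. For the single best topology this gives the same optimum as an assignment linear program, because the bipartite matching polytope has integral vertices, but enumeration is only practical for a handful of SSEs.

```python
from itertools import permutations, product


def rank_topologies(cost, top_k=3):
    """Rank SSE topologies by total matching cost (lower is better).

    cost[i][j] is a hypothetical pair (forward_cost, reverse_cost) for matching
    sequence SSE i to density-map SSE j in each direction. Every sequence SSE is
    matched to a distinct map SSE, so this enumerates all injective assignments.
    """
    n_seq, n_map = len(cost), len(cost[0])
    if n_seq > n_map:
        raise ValueError("need at least as many map SSEs as sequence SSEs")
    ranked = []
    for perm in permutations(range(n_map), n_seq):
        for dirs in product((0, 1), repeat=n_seq):  # 0 = forward, 1 = reverse
            total = sum(cost[i][perm[i]][dirs[i]] for i in range(n_seq))
            ranked.append((total, tuple(zip(perm, dirs))))
    ranked.sort()  # lowest total cost first; ties broken by the assignment tuple
    return ranked[:top_k]


if __name__ == "__main__":
    toy_cost = [
        [(0.5, 2.0), (3.0, 3.5), (4.0, 1.0)],
        [(2.5, 3.0), (0.2, 1.8), (3.3, 3.1)],
        [(4.0, 0.7), (2.9, 2.2), (5.0, 4.4)],
    ]
    for total, topology in rank_topologies(toy_cost):
        print(round(total, 2), topology)
```

In this toy table the first-ranked topology maps sequence SSEs 0, 1, 2 to map SSEs 2 (reversed), 1 (forward) and 0 (reversed), with total cost 1.9; the cost of 3! × 2³ = 48 candidates already hints at why a real method needs an LP or another solver instead of enumeration.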

Newton MAH, Rahman J, Zaman R, Sattar A. Enhancing Protein Contact Map Prediction Accuracy via Ensembles of Inter-Residue Distance Predictors. Comput Biol Chem 2022;99:107700. [DOI: 10.1016/j.compbiolchem.2022.107700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/03/2022]

Santra S, Jana M. Predicting the evolution of number of native contacts of a small protein by using deep learning approach. Comput Biol Chem 2022;97:107625. [DOI: 10.1016/j.compbiolchem.2022.107625] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/14/2021] [Revised: 01/07/2022] [Accepted: 01/09/2022] [Indexed: 11/28/2022]

Gaur A, Jindal Y, Singh V, Tiwari R, Kumar D, Kaushik D, Singh J, Narwal S, Jaiswal S, Iquebal MA, Angadi UB, Singh G, Rai A, Singh GP, Sheoran S. GWAS to Identify Novel QTNs for WSCs Accumulation in Wheat Peduncle Under Different Water Regimes. Front Plant Sci 2022;13:825687. [PMID: 35310635 PMCID: PMC8928439 DOI: 10.3389/fpls.2022.825687] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2021] [Accepted: 01/27/2022] [Indexed: 05/27/2023]

Abstract

Water-soluble carbohydrates (WSCs) play a vital role in water stress avoidance and in buffering wheat grain yield. However, the genetic architecture of stem WSC accumulation is only partially understood, and few candidate genes are known. This study uses a compressed mixed linear model-based genome-wide association study (GWAS) and heuristic post-GWAS analyses to identify causative quantitative trait nucleotides (QTNs) and candidate genes for stem WSC content at 15 days after anthesis under different water regimes (irrigated, rainfed, and drought). Glucose, fructose, sucrose, fructans, total non-structural carbohydrates (the sum of individual sugars), and total WSCs (anthrone-based) were quantified in the peduncle of 301 bread wheat genotypes under multiple environments (E01-E08) covering different water regimes, and these data, together with 14,571 SNPs from the "35K Axiom Wheat Breeders" Array, were used for the analysis. As a result, 570 significant nucleotide trait associations were identified on all chromosomes except 4D, of which 163 were considered stable. A total of 112 quantitative trait nucleotide regions (QNRs) were identified, of which 47 were presumably novel. QNRs qWSC-3B.2 and qWSC-7A.2 were identified as hotspots. Post-GWAS integration of multiple data resources prioritized 208 putative candidate genes delimited into 64 QNRs, which can be critical for understanding the genetic architecture of stem WSC accumulation in wheat under optimum and water-stressed environments. At least 19 stable QTNs were found to be associated with 24 prioritized candidate genes. Clusters of fructan metabolic genes were reported in QNRs qWSC-4A.2 and qWSC-7A.2. These genes can be used to bring together an optimum combination of fructan metabolic genes to improve the accumulation and remobilization of stem WSCs and water stress tolerance. These results will further strengthen wheat breeding programs targeting sustainable wheat production under limited water conditions.

Tran NH, Xu J, Li M. A tale of solving two computational challenges in protein science: neoantigen prediction and protein structure prediction. Brief Bioinform 2022;23:bbab493. [PMID: 34891158 PMCID: PMC8769896 DOI: 10.1093/bib/bbab493] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 10/11/2021] [Accepted: 10/26/2021] [Indexed: 12/30/2022]

Peng CX, Zhou XG, Zhang GJ. De novo Protein Structure Prediction by Coupling Contact With Distance Profile. IEEE/ACM Trans Comput Biol Bioinform 2022;19:395-406. [PMID: 32750861 DOI: 10.1109/tcbb.2020.3000758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/11/2023]

Rahbar MR, Jahangiri A, Khalili S, Zarei M, Mehrabani-Zeinabad K, Khalesi B, Pourzardosht N, Hessami A, Nezafat N, Sadraei S, Negahdaripour M. Hotspots for mutations in the SARS-CoV-2 spike glycoprotein: a correspondence analysis. Sci Rep 2021;11:23622. [PMID: 34880279 PMCID: PMC8654821 DOI: 10.1038/s41598-021-01655-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/24/2021] [Accepted: 11/01/2021] [Indexed: 12/19/2022]

Li Y, Zhang C, Zheng W, Zhou X, Bell EW, Yu DJ, Zhang Y. Protein inter-residue contact and distance prediction by coupling complementary coevolution features with deep residual networks in CASP14. Proteins 2021;89:1911-1921. [PMID: 34382712 PMCID: PMC8616805 DOI: 10.1002/prot.26211] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 04/21/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 01/12/2023]

Hou M, Peng C, Zhou X, Zhang B, Zhang G. Multi contact-based folding method for de novo protein structure prediction. Brief Bioinform 2021;23:6445108. [PMID: 34849573 DOI: 10.1093/bib/bbab463] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2021] [Revised: 09/21/2021] [Accepted: 10/10/2021] [Indexed: 11/12/2022]

Behkamal B, Naghibzadeh M, Saberi MR, Tehranizadeh ZA, Pagnani A, Al Nasr K. Three-Dimensional Graph Matching to Identify Secondary Structure Correspondence of Medium-Resolution Cryo-EM Density Maps. Biomolecules 2021;11:1773. [PMID: 34944417 PMCID: PMC8698881 DOI: 10.3390/biom11121773] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2021] [Revised: 11/18/2021] [Accepted: 11/20/2021] [Indexed: 01/15/2023]

Abstract

Cryo-electron microscopy (cryo-EM) is a structural technique that has played a significant role in protein structure determination in recent years. Compared with the traditional methods of X-ray crystallography and NMR spectroscopy, cryo-EM is capable of producing images of much larger protein complexes. However, in some cases cryo-EM reconstructions are limited to medium resolution (~4-10 Å). In this resolution range, a cryo-EM density map can hardly be used to determine protein structure directly at atomic resolution, or even at the level of the amino acid residue backbone. At such a resolution, only the position and orientation of secondary structure elements (SSEs) such as α-helices and β-sheets are observable. Consequently, finding the mapping between the secondary structures of the modeled structure (SSEs-A) and those of the cryo-EM map (SSEs-C) is one of the primary concerns in cryo-EM modeling. To address this issue, this study proposes a novel automatic computational method to identify SSE correspondence in three-dimensional (3D) space. First, by modeling the target sequence and extracting highly reliable features from a generated 3D model and the map, the SSE matching problem is formulated as a 3D vector matching problem. This 3D vector matching problem is then transformed into a 3D graph matching problem. Finally, a similarity-based voting algorithm combined with the principle of least conflict (PLC) is developed to obtain the SSE correspondence. To evaluate the accuracy of the method, a test set of 25 experimental and simulated maps with a maximum of 65 SSEs was selected. Comparative studies are also conducted to demonstrate the superiority of the proposed method over some state-of-the-art techniques. The results demonstrate that the method is efficient and robust, and works well in the presence of errors in the predicted secondary structures of the cryo-EM images.
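
The abstract obtains SSE correspondences by similarity-based voting followed by the principle of least conflict. As a rough, hypothetical illustration (not the authors' algorithm), each SSE can be reduced to a 3D centroid. Two candidate correspondences then vote for each other when the model-side and map-side inter-centroid distances agree within a tolerance, and a greedy pass keeps the most-voted correspondences that do not conflict by reusing an SSE. The centroid representation, the tolerance `tol`, and the greedy selection rule are all assumptions made for this sketch.

```python
import math
from itertools import combinations


def vote_correspondences(model_sses, map_sses, tol=1.0):
    """Hypothetical voting scheme for SSE correspondence.

    model_sses, map_sses: lists of 3D centroids (x, y, z), one per SSE.
    Returns a dict {model_index: map_index}.
    """
    candidates = [(i, j) for i in range(len(model_sses)) for j in range(len(map_sses))]
    votes = {c: 0 for c in candidates}
    for (i, j), (k, l) in combinations(candidates, 2):
        if i == k or j == l:
            continue  # these two candidates conflict: they reuse an SSE
        d_model = math.dist(model_sses[i], model_sses[k])
        d_map = math.dist(map_sses[j], map_sses[l])
        if abs(d_model - d_map) <= tol:  # geometrically consistent pair
            votes[(i, j)] += 1
            votes[(k, l)] += 1
    # Greedy least-conflict pass: accept the most-voted candidates that reuse no SSE.
    assignment, used_map = {}, set()
    for (i, j), _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if i not in assignment and j not in used_map:
            assignment[i] = j
            used_map.add(j)
    return assignment


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 20.0, 0.0)]
    density_map = [(5.0, 25.0, 5.0), (5.0, 5.0, 5.0), (15.0, 5.0, 5.0)]  # translated, shuffled
    print(vote_correspondences(model, density_map))
```

Because inter-centroid distances do not change under rotation or translation, the true correspondences in this toy example (a shuffled, translated copy) collect two votes each while every spurious candidate collects at most one, so the greedy pass recovers {0: 1, 1: 2, 2: 0}.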

Wei H, Zhao Z, Luo R. Machine-Learned Molecular Surface and Its Application to Implicit Solvent Simulations. J Chem Theory Comput 2021;17:6214-6224. [PMID: 34516109 DOI: 10.1021/acs.jctc.1c00492] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]

Abstract

Implicit solvent models, such as Poisson-Boltzmann models, play important roles in computational studies of biomolecules. A vital step in almost all implicit solvent models is determining the solvent-solute interface, and the solvent excluded surface (SES) is the most widely used interface definition in these models. However, classical algorithms for computing the SES are geometry-based, so they are neither suitable for parallel implementation nor convenient for obtaining surface derivatives. To address these limitations, we explored a machine learning strategy to obtain a level set formulation of the SES. The training process was conducted in three steps, eventually leading to a model with over 95% agreement with the classical SES. Visualization of the tested molecular surfaces shows that the machine-learned SES overlaps with the classical SES in almost all situations. Further analyses show that the machine-learned SES is highly stable under rotation of the tested molecules. Our timing analysis shows that the machine-learned SES is roughly 2.5 times as efficient as the classical SES routine implemented in Amber/PBSA on a tested central processing unit (CPU) platform. We expect further performance gains on massively parallel platforms such as graphics processing units (GPUs), given how easily the machine-learned SES can be converted to a parallel procedure. We also implemented the machine-learned SES in the Amber/PBSA program to study its performance in reaction field energy calculations. The analysis shows that the two sets of reaction field energies are highly consistent, with an average deviation of 1%. Given its level set formulation, we expect the machine-learned SES to be applied in molecular simulations that require either surface derivatives or high efficiency on parallel computing platforms.
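
A level set formulation represents the surface as the zero contour of a function φ(x) that is negative inside and positive outside. Inside/outside agreement with a reference surface can then be measured point by point on a grid, and surface derivatives follow from ∇φ. The toy sketch below illustrates only those two ideas, under loud assumptions: the reference surface is a plain union of atomic spheres (a van der Waals-style surface, not the SES), a log-sum-exp soft minimum stands in for the learned model, and no machine learning is involved. Atom positions and radii are made up.

```python
import math


def sphere_union_phi(x, atoms):
    """Level-set value for a union of spheres: negative inside, zero on the surface."""
    return min(math.dist(x, center) - radius for center, radius in atoms)


def soft_union_phi(x, atoms, k=10.0):
    """Smooth stand-in for a learned level set: log-sum-exp soft minimum.

    Never larger than sphere_union_phi, and lower by at most log(len(atoms)) / k.
    """
    values = [math.dist(x, center) - radius for center, radius in atoms]
    m = min(values)
    return m - math.log(sum(math.exp(-k * (v - m)) for v in values)) / k


def grid_agreement(phi_a, phi_b, atoms, lo=-4.0, hi=4.0, n=21):
    """Fraction of grid points both level sets classify the same way (inside if phi < 0)."""
    step = (hi - lo) / (n - 1)
    coords = [lo + i * step for i in range(n)]
    same = total = 0
    for x in coords:
        for y in coords:
            for z in coords:
                p = (x, y, z)
                same += (phi_a(p, atoms) < 0) == (phi_b(p, atoms) < 0)
                total += 1
    return same / total


def gradient(phi, p, atoms, h=1e-5):
    """Central finite-difference gradient of phi at point p (a surface derivative)."""
    g = []
    for axis in range(3):
        plus, minus = list(p), list(p)
        plus[axis] += h
        minus[axis] -= h
        g.append((phi(tuple(plus), atoms) - phi(tuple(minus), atoms)) / (2 * h))
    return g


if __name__ == "__main__":
    toy_atoms = [((-1.0, 0.0, 0.0), 1.5), ((1.0, 0.0, 0.0), 1.5)]
    print(grid_agreement(sphere_union_phi, soft_union_phi, toy_atoms))
    print(gradient(soft_union_phi, (3.0, 0.0, 0.0), toy_atoms))
```

The soft version disagrees with the sharp one only in a thin shell just outside the surface, so the agreement stays close to 1, while remaining differentiable everywhere, including where the two spheres meet.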

Laine E, Eismann S, Elofsson A, Grudinin S. Protein sequence-to-structure learning: Is this the end(-to-end revolution)? Proteins 2021;89:1770-1786. [PMID: 34519095 DOI: 10.1002/prot.26235] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/05/2021] [Revised: 08/16/2021] [Accepted: 09/03/2021] [Indexed: 01/08/2023]

Hong Z, Liu J, Chen Y. An interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction. Biophys Chem 2021;278:106666. [PMID: 34418678 DOI: 10.1016/j.bpc.2021.106666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/25/2021] [Revised: 08/09/2021] [Accepted: 08/09/2021] [Indexed: 12/29/2022]

Abstract

Protein-protein interactions play an important role in biological processes. A more fine-grained analysis, at the residue and atom levels, would help us better understand the mechanisms of inter-protein interaction and support drug design. Ongoing studies in the field include developing efficient computational methods that reduce trial and error and assist experimental researchers in determining complex structures. Trimeric protein interfaces, especially homotrimeric ones, have rarely been studied. In this paper, we propose an interpretable machine learning method for predicting homo-trimeric protein interface residue pairs. Structure, sequence, and physicochemical information are integrated as the feature input used to train the model. A graph model is used to represent intra-protein spatial information. Matrix factorization captures the interactions between different features. A kernel function is designed to automatically acquire the neighborhood information of the target residue pairs. The accuracy reaches 54.5% on an independent test set. Sequence and structure alignments demonstrate the model's self-learning ability. Our model indicates the biological significance of the relationship between sequence and structure, and could help reduce trial and error in areas such as protein complex determination and protein-protein docking. SIGNIFICANCE: Protein complex structures are important for understanding protein function and for designing functional proteins. As data have increased, several computational tools have been developed for protein complex residue contact prediction, one of the most important steps in complex structure prediction. However, for homo-trimeric proteins, sequence-based deep learning predictors are infeasible for homologous sequences, and the black-box nature of these algorithms prevents us from understanding each step of their operation.
We therefore propose an interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction, and the predictor shows good performance. Our work provides a computational aid for determining the interface residue pairs of homo-trimeric proteins, which will be further verified by wet-lab experiments, and supports downstream work such as protein-protein docking, protein complex structure prediction, and drug design.
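
The abstract says matrix factorization is used to capture interactions between features. One standard way to do that is a factorization machine, in which the weight of each pairwise feature interaction is the inner product of two learned latent vectors. The sketch below assumes that reading; it is not the authors' model, and the feature values and weights are made up. It also checks the usual O(k·n) rewrite of the pairwise term against the naive double sum.

```python
def fm_score(x, w0, w, V):
    """Factorization-machine score for one feature vector x.

    score = w0 + sum_i w[i] x[i] + sum_{i<j} <V[i], V[j]> x[i] x[j],
    where V[i] is the k-dimensional latent vector of feature i.
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # O(k*n) identity: sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_score_naive(x, w0, w, V):
    """Same score with the explicit O(k*n^2) double loop, for checking."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return total


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5]                          # hypothetical residue-pair features
    w = [0.2, -0.1, 0.3]                         # hypothetical linear weights
    V = [[0.5, 1.0], [0.2, -0.3], [1.0, 0.4]]    # hypothetical latent factors (k = 2)
    print(fm_score(x, 0.1, w, V), fm_score_naive(x, 0.1, w, V))
```

Because each learned weight is attached to a named feature, this kind of model stays easier to inspect than a deep network, which fits the interpretability goal stated in the abstract.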

Pearce R, Zhang Y. Toward the solution of the protein structure prediction problem. J Biol Chem 2021;297:100870. [PMID: 34119522 PMCID: PMC8254035 DOI: 10.1016/j.jbc.2021.100870] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Received: 04/20/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021] [Indexed: 11/20/2022]

Reza MS, Zhang H, Hossain MT, Jin L, Feng S, Wei Y. COMTOP: Protein Residue-Residue Contact Prediction through Mixed Integer Linear Optimization. Membranes (Basel) 2021;11:503. [PMID: 34209399 PMCID: PMC8305966 DOI: 10.3390/membranes11070503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/25/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]

Abstract

Protein contact prediction helps reconstruct the tertiary structure, which largely determines a protein’s function; contact prediction from sequence is therefore an important problem. There has recently been exciting progress on this problem, but many existing methods still have low prediction accuracy. In this paper, we present a new mixed integer linear programming (MILP)-based consensus method: a Consensus scheme based On a Mixed integer linear opTimization method for prOtein contact Prediction (COMTOP). The MILP-based consensus method combines the strengths of seven selected protein contact prediction methods (CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV) by optimizing the number of correctly predicted contacts, achieving better prediction accuracy. The proposed hybrid protein residue–residue contact prediction scheme was tested on four independent test sets. For 239 highly non-redundant proteins, the method showed prediction accuracies of 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively. When tested on the CASP13 and CASP14 test sets, the proposed method obtained accuracies of 75.91% and 77.49% for top-L/5 predictions, respectively. COMTOP was further tested on 57 non-redundant α-helical transmembrane proteins and achieved prediction accuracies of 64.34% and 73.91% for top-L/2 and top-L/5 predictions, respectively. For all test datasets, COMTOP's accuracy improvement over the seven individual methods grew as the number of predicted contacts increased. For example, COMTOP's advantage was much larger for large numbers of contact predictions (such as top-5L and top-3L) than for small numbers such as top-L/2 and top-L/5.
The results and analysis demonstrate that COMTOP can significantly improve on the performance of the individual methods and is therefore more robust across different types of test sets. COMTOP also showed better or comparable predictions when compared with state-of-the-art predictors.
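
The top-L/5 through top-5L figures are precision values at a cutoff tied to the sequence length L: residue pairs are ranked by predicted contact score, the top ⌊L·f⌋ pairs are kept for f ∈ {1/5, 1/2, 1, 2, 3, 5}, and the fraction of those pairs that are true contacts is reported. The abstract does not state the minimum sequence separation or other filters used. The `min_sep=6` default below is therefore an assumption (a common convention), and the scores in the example are hypothetical.

```python
from fractions import Fraction


def top_precision(scores, true_contacts, L, fraction, min_sep=6):
    """Precision of the top floor(L * fraction) predicted contacts.

    scores: dict mapping residue pairs (i, j) with i < j to predicted contact scores.
    true_contacts: set of residue pairs (i, j) in contact in the native structure.
    fraction: e.g. Fraction(1, 5) for top-L/5 or 2 for top-2L (exact arithmetic).
    min_sep: minimum sequence separation j - i (assumed convention, not from the source).
    """
    n_top = max(1, int(Fraction(fraction) * L))
    eligible = [(s, pair) for pair, s in scores.items() if pair[1] - pair[0] >= min_sep]
    eligible.sort(key=lambda item: (-item[0], item[1]))  # highest score first, ties by pair
    top = eligible[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)


if __name__ == "__main__":
    toy_scores = {(0, 9): 0.9, (1, 8): 0.8, (2, 9): 0.7, (0, 7): 0.6,
                  (1, 9): 0.5, (0, 2): 0.95, (3, 9): 0.1}
    toy_true = {(0, 9), (2, 9), (0, 7), (3, 9), (0, 2)}
    print(top_precision(toy_scores, toy_true, L=10, fraction=Fraction(1, 2)))
```

Taking more predictions (a larger f) usually lowers precision, because lower-confidence pairs enter the list. This is consistent with the pattern of the reported accuracies, which fall from top-L/5 to top-5L.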

Affiliation(s)

Md. Selim Reza School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
Huiling Zhang School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
Md. Tofazzal Hossain School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
Langxi Jin Department of Computer Science and Technology, School of Computer Science and Technology, Harbin University of Science and Technology, 52 Xuefu Road, Nangang District, Harbin 150080, China.
Shengzhong Feng Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.
Yanjie Wei School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China.

Pérez-Vargas J, Teppa E, Amirache F, Boson B, Pereira de Oliveira R, Combet C, Böckmann A, Fusil F, Freitas N, Carbone A, Cosset FL. A fusion peptide in preS1 and the human protein disulfide isomerase ERp57 are involved in hepatitis B virus membrane fusion process. eLife 2021;10:64507. [PMID: 34190687 PMCID: PMC8282342 DOI: 10.7554/elife.64507] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2020] [Accepted: 06/29/2021] [Indexed: 12/13/2022] Open

Affiliation(s)

Jimena Pérez-Vargas CIRI - Centre International de Recherche en Infectiologie, Univ Lyon, Université Claude Bernard Lyon 1, Inserm, U1111, CNRS, UMR5308, ENS Lyon, Lyon, France
Elin Teppa Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, Paris, France; Sorbonne Université, Institut des Sciences du Calcul et des Données (ISCD), Paris, France
Fouzia Amirache CIRI - Centre International de Recherche en Infectiologie, Univ Lyon, Université Claude Bernard Lyon 1, Inserm, U1111, CNRS, UMR5308, ENS Lyon, Lyon, France
Bertrand Boson CIRI - Centre International de Recherche en Infectiologie, Univ Lyon, Université Claude Bernard Lyon 1, Inserm, U1111, CNRS, UMR5308, ENS Lyon, Lyon, France
Rémi Pereira de Oliveira CIRI - Centre International de Recherche en Infectiologie, Univ Lyon, Université Claude Bernard Lyon 1, Inserm, U1111, CNRS, UMR5308, ENS Lyon, Lyon, France
Christophe Combet Cancer Research Center of Lyon (CRCL), UMR Inserm 1052 - CNRS 5286 - Université Lyon 1 - Centre Léon Bérard, Lyon, France
Anja Böckmann Molecular Microbiology and Structural Biochemistry, UMR5086 CNRS-Université Lyon 1, Lyon, France
Floriane Fusil CIRI - Centre International de Recherche en Infectiologie, Univ Lyon, Université Claude Bernard Lyon 1, Inserm, U1111, CNRS, UMR5308, ENS Lyon, Lyon, France
Natalia Freitas CIRI - Centre International de Recherche en Infectiologie, Univ Lyon, Université Claude Bernard Lyon 1, Inserm, U1111, CNRS, UMR5308, ENS Lyon, Lyon, France
Alessandra Carbone Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, Paris, France
François-Loïc Cosset CIRI - Centre International de Recherche en Infectiologie, Univ Lyon, Université Claude Bernard Lyon 1, Inserm, U1111, CNRS, UMR5308, ENS Lyon, Lyon, France

Billings WM, Morris CJ, Della Corte D. The whole is greater than its parts: ensembling improves protein contact prediction. Sci Rep 2021;11:8039. [PMID: 33850214 PMCID: PMC8044223 DOI: 10.1038/s41598-021-87524-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2021] [Accepted: 03/29/2021] [Indexed: 11/30/2022] Open

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks. PLoS Comput Biol 2021;17:e1008865. [PMID: 33770072 PMCID: PMC8026059 DOI: 10.1371/journal.pcbi.1008865] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2020] [Revised: 04/07/2021] [Accepted: 03/10/2021] [Indexed: 12/24/2022] Open

Abstract

The topology of protein folds can be specified by inter-residue contact-maps, and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from whole-genome and metagenome databases, thereby minimizing information loss during contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from the CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in top-L long-range contact precision. On the 31 free-modeling (FM) targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that simply re-training the TripletRes model with more proteins can lead to further improvement, with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a new, efficient approach to extending the power of deep convolutional networks to high-accuracy medium- and long-range protein contact-map prediction from primary sequences, which is critical for constructing the 3D structures of proteins that lack homologous templates in the PDB library.
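The long-range top-L/5 precision quoted above can be computed roughly as in the Python sketch below. It assumes the common CASP conventions: a native contact is a residue pair whose Cβ atoms (Cα for glycine) lie within 8 Å, and long-range pairs are separated by at least 24 residues. The function names are illustrative only, and TripletRes's own evaluation scripts may differ in details such as tie handling.

```python
import numpy as np

def contacts_from_coords(cb_coords, cutoff=8.0):
    """Boolean (L, L) map: True where two C-beta atoms are closer than cutoff (angstroms)."""
    cb = np.asarray(cb_coords, dtype=float)
    dist = np.linalg.norm(cb[:, None, :] - cb[None, :, :], axis=-1)
    return dist < cutoff

def long_range_precision(pred_scores, true_contacts, frac=5, min_sep=24):
    """Fraction of the top-L/frac predicted pairs with j - i >= min_sep that are native contacts."""
    pred = np.asarray(pred_scores, dtype=float)
    L = pred.shape[0]
    k = max(1, L // frac)
    # Candidate pairs: upper triangle with sequence separation >= min_sep.
    i_idx, j_idx = np.triu_indices(L, k=min_sep)
    if i_idx.size == 0:
        return float("nan")
    # Rank candidates by predicted score, highest first; keep at most k of them.
    order = np.argsort(-pred[i_idx, j_idx], kind="stable")[:k]
    hits = np.asarray(true_contacts)[i_idx[order], j_idx[order]]
    return float(hits.sum()) / len(order)
```

Setting `frac=1` gives top-L precision, and lowering `min_sep` to 12 and adding an upper bound of 23 would give the medium-range variant.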

Ab initio protein folding has been a major unsolved problem in computational biology for more than half a century. Recent community-wide Critical Assessment of Structure Prediction (CASP) experiments have witnessed exciting progress in ab initio structure prediction, driven mainly by improvements in contact-map prediction, since predicted contacts can be used as constraints to guide ab initio folding simulations. In this work, we propose a new open-source deep-learning architecture, TripletRes, built on residual convolutional neural networks for high-accuracy contact prediction. Large-scale benchmark and blind test results demonstrate that the proposed method performs competitively with other top approaches in predicting the medium- and long-range contact-maps that are critical for guiding protein folding simulations. Detailed data analyses showed that the major advantage of TripletRes lies in its unique protocol for fusing multiple evolutionary feature matrices, which are extracted directly from whole-genome and metagenome databases, thereby minimizing information loss during contact model training.

Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. PATTERNS (NEW YORK, N.Y.) 2020;1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 100] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Kamerzell TJ, Middaugh CR. Prediction Machines: Applied Machine Learning for Therapeutic Protein Design and Development. J Pharm Sci 2020;110:665-681. [PMID: 33278409 DOI: 10.1016/j.xphs.2020.11.034] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Revised: 11/27/2020] [Accepted: 11/27/2020] [Indexed: 12/11/2022]

Chasing coevolutionary signals in intrinsically disordered proteins complexes. Sci Rep 2020;10:17962. [PMID: 33087759 PMCID: PMC7578644 DOI: 10.1038/s41598-020-74791-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2020] [Accepted: 08/27/2020] [Indexed: 11/30/2022] Open

Liu J, Zhou XG, Zhang Y, Zhang GJ. CGLFold: a contact-assisted de novo protein structure prediction using global exploration and loop perturbation sampling algorithm. Bioinformatics 2020;36:2443-2450. [PMID: 31860059 DOI: 10.1093/bioinformatics/btz943] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2019] [Revised: 12/10/2019] [Accepted: 12/18/2019] [Indexed: 12/27/2022] Open

Forsberg BO, Aibara S, Howard RJ, Mortezaei N, Lindahl E. Arrangement and symmetry of the fungal E3BP-containing core of the pyruvate dehydrogenase complex. Nat Commun 2020;11:4667. [PMID: 32938938 PMCID: PMC7494870 DOI: 10.1038/s41467-020-18401-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2020] [Accepted: 08/20/2020] [Indexed: 11/21/2022] Open

Augestad EH, Castelli M, Clementi N, Ströh LJ, Krey T, Burioni R, Mancini N, Bukh J, Prentoe J. Global and local envelope protein dynamics of hepatitis C virus determine broad antibody sensitivity. SCIENCE ADVANCES 2020;6:eabb5938. [PMID: 32923643 PMCID: PMC7449684 DOI: 10.1126/sciadv.abb5938] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/05/2020] [Accepted: 07/13/2020] [Indexed: 05/03/2023]

Affiliation(s)

Elias H. Augestad Copenhagen Hepatitis C Program (CO-HEP), Department of Infectious Diseases, Hvidovre Hospital, and Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
Matteo Castelli Laboratory of Microbiology and Virology, Università “Vita-Salute” San Raffaele, Milano, 20132, Italy
Nicola Clementi Laboratory of Microbiology and Virology, Università “Vita-Salute” San Raffaele, Milano, 20132, Italy
Luisa J. Ströh Institute of Virology, Hannover Medical School, Carl-Neuberg-Str. 1, Hannover 30625, Germany
Thomas Krey Institute of Virology, Hannover Medical School, Carl-Neuberg-Str. 1, Hannover 30625, Germany; German Center for Infection Research (DZIF), partner sites Hannover-Braunschweig and Hamburg-Lübeck-Borstel-Riems, Germany; Center of Structural and Cell Biology in Medicine, Institute of Biochemistry, University of Luebeck, Ratzeburger Allee 160, 23562 Luebeck, Germany; Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Carl-Neuberg-Str. 1, 30625 Hannover, Germany; Centre for Structural Systems Biology (CSSB), Notkestraße 85, 22607 Hamburg, Germany
Roberto Burioni Laboratory of Microbiology and Virology, Università “Vita-Salute” San Raffaele, Milano, 20132, Italy
Nicasio Mancini Laboratory of Microbiology and Virology, Università “Vita-Salute” San Raffaele, Milano, 20132, Italy
Jens Bukh Copenhagen Hepatitis C Program (CO-HEP), Department of Infectious Diseases, Hvidovre Hospital, and Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
Jannick Prentoe Copenhagen Hepatitis C Program (CO-HEP), Department of Infectious Diseases, Hvidovre Hospital, and Department of Immunology and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark. Corresponding author.

Ge R, Feng G, Jing X, Zhang R, Wang P, Wu Q. EnACP: An Ensemble Learning Model for Identification of Anticancer Peptides. Front Genet 2020;11:760. [PMID: 32903636 PMCID: PMC7438906 DOI: 10.3389/fgene.2020.00760] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2020] [Accepted: 06/26/2020] [Indexed: 12/13/2022] Open

Li Y, Hu J, Zhang C, Yu DJ, Zhang Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics 2020;35:4647-4655. [PMID: 31070716 DOI: 10.1093/bioinformatics/btz291] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2018] [Revised: 03/18/2019] [Accepted: 04/17/2019] [Indexed: 12/20/2022] Open

Zhang GJ, Ma LF, Wang XQ, Zhou XG. Secondary Structure and Contact Guided Differential Evolution for Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:1068-1081. [PMID: 30295627 DOI: 10.1109/tcbb.2018.2873691] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Feng J, Shukla D. FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution. J Phys Chem B 2020;124:3605-3615. [PMID: 32283936 DOI: 10.1021/acs.jpcb.9b11869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Liu S, Xiang X, Gao X, Liu H. Neighborhood Preference of Amino Acids in Protein Structures and its Applications in Protein Structure Assessment. Sci Rep 2020;10:4371. [PMID: 32152349 PMCID: PMC7062742 DOI: 10.1038/s41598-020-61205-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Accepted: 02/24/2020] [Indexed: 12/02/2022] Open

Gutmann B, Royan S, Schallenberg-Rüdinger M, Lenz H, Castleden IR, McDowell R, Vacher MA, Tonti-Filippini J, Bond CS, Knoop V, Small ID. The Expansion and Diversification of Pentatricopeptide Repeat RNA-Editing Factors in Plants. MOLECULAR PLANT 2020;13:215-230. [PMID: 31760160 DOI: 10.1016/j.molp.2019.11.002] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/13/2019] [Revised: 10/10/2019] [Accepted: 11/11/2019] [Indexed: 05/08/2023]

Affiliation(s)

Bernard Gutmann Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, WA, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, WA, Australia
Santana Royan Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, WA, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, WA, Australia
Mareike Schallenberg-Rüdinger IZMB - Institut für Zelluläre und Molekulare Botanik, Abteilung Molekulare Evolution, Universität Bonn, Kirschallee 1, 53115 Bonn, Germany
Henning Lenz IZMB - Institut für Zelluläre und Molekulare Botanik, Abteilung Molekulare Evolution, Universität Bonn, Kirschallee 1, 53115 Bonn, Germany
Ian R Castleden Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, WA, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, WA, Australia
Rose McDowell Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, WA, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, WA, Australia
Michael A Vacher Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, WA, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, WA, Australia
Julian Tonti-Filippini Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, WA, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, WA, Australia
Charles S Bond School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, WA, Australia
Volker Knoop IZMB - Institut für Zelluläre und Molekulare Botanik, Abteilung Molekulare Evolution, Universität Bonn, Kirschallee 1, 53115 Bonn, Germany
Ian D Small Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, Perth 6009, WA, Australia; School of Molecular Sciences, The University of Western Australia, Crawley, Perth 6009, WA, Australia

Machine learning for protein folding and dynamics. Curr Opin Struct Biol 2020;60:77-84. [DOI: 10.1016/j.sbi.2019.12.005] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2019] [Revised: 11/21/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022]

Fukuda H, Tomii K. DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment. BMC Bioinformatics 2020;21:10. [PMID: 31918654 PMCID: PMC6953294 DOI: 10.1186/s12859-019-3190-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Accepted: 11/04/2019] [Indexed: 12/30/2022] Open

Abstract

Background

Recently developed methods of protein contact prediction, a crucially important step in protein structure prediction, depend heavily on deep neural networks (DNNs) and multiple sequence alignments (MSAs) of target proteins. Protein sequences are accumulating so rapidly that abundant sequences for constructing the MSA of a target protein are often readily obtainable. Nevertheless, the number of sequences that can be included in an MSA for contact prediction varies widely, and both extremes cause problems: abundant sequences might degrade prediction results, while targets with only a limited number of sequences still leave room for improving predictions from shallow MSAs. To address these persistent issues, we set out to develop a novel end-to-end DNN framework for contact prediction.

Results

We developed neural network models to improve the precision of contact prediction from both deep and shallow MSAs. Results show that higher prediction accuracy was achieved by assigning weights to the sequences in a deep MSA. For shallow MSAs, adding a few sequence-based features increased the prediction accuracy of long-range contacts in our model. Building on these models, we extended our model to a multi-task model that achieves higher accuracy by also predicting secondary structures and solvent-accessible surface areas. We also demonstrated that ensemble averaging of our models can raise accuracy. Testing on past CASP target protein domains showed that our final model is superior or equivalent to existing meta-predictors.
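DeepECA learns its sequence weights inside the network. For contrast, the Python sketch below shows the fixed heuristic commonly used by coevolution-based (DCA-style) methods: each sequence is down-weighted by the number of near-duplicates it has, often with an 80% identity threshold, and the sum of the weights gives the effective number of sequences, a common measure of whether an MSA is deep or shallow. This is a simplified illustration and not DeepECA's method; treating gaps as an ordinary symbol and the function names are assumptions.

```python
import numpy as np

def sequence_weights(msa, identity_threshold=0.8):
    """Classic MSA reweighting (not DeepECA's learned weights).

    Each sequence gets weight 1 / (number of sequences, itself included,
    whose fractional identity to it is >= identity_threshold).
    Gaps are treated as an ordinary symbol in this simplified version.
    """
    seqs = np.array([list(s) for s in msa])       # (N, L) array of characters
    L = seqs.shape[1]
    # Fraction of identical positions for every pair of sequences.
    identity = (seqs[:, None, :] == seqs[None, :, :]).sum(axis=-1) / L
    neighbors = (identity >= identity_threshold).sum(axis=1)
    return 1.0 / neighbors

def effective_sequence_count(msa, identity_threshold=0.8):
    """Effective number of sequences (Neff): the sum of the sequence weights."""
    return float(sequence_weights(msa, identity_threshold).sum())
```

For example, in the alignment `["ACDE", "ACDE", "ACDF", "WWWW"]` the two identical sequences share a weight of 0.5 each, while the others keep weight 1, giving an effective count of 3.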

Conclusions

The end-to-end learning framework we built can use information derived from either deep or shallow MSAs for contact prediction. Recently, an increasing number of protein sequences, including metagenomic sequences, have become accessible, and they might degrade contact prediction results. Under such circumstances, our model can provide a means to reduce noise automatically. Tertiary structure prediction based on the contacts and secondary structures predicted by our model, starting from a target protein's MSA, yields more accurate three-dimensional models than those obtained from existing evolutionary coupling analysis (ECA) methods. DeepECA is available from https://github.com/tomiilab/DeepECA.

Rao R, Bhattacharya N, Thomas N, Duan Y, Chen X, Canny J, Abbeel P, Song YS. Evaluating Protein Transfer Learning with TAPE. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 2019;32:9689-9701. [PMID: 33390682 PMCID: PMC7774645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]

Shrestha R, Fajardo E, Gil N, Fidelis K, Kryshtafovych A, Monastyrskyy B, Fiser A. Assessing the accuracy of contact predictions in CASP13. Proteins 2019;87:1058-1068. [PMID: 31587357 PMCID: PMC6851495 DOI: 10.1002/prot.25819] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Revised: 09/17/2019] [Accepted: 09/17/2019] [Indexed: 01/07/2023]

Zhang H, Zhang Q, Ju F, Zhu J, Gao Y, Xie Z, Deng M, Sun S, Zheng WM, Bu D. Predicting protein inter-residue contacts using composite likelihood maximization and deep learning. BMC Bioinformatics 2019;20:537. [PMID: 31664895 PMCID: PMC6821021 DOI: 10.1186/s12859-019-3051-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2019] [Accepted: 08/22/2019] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Accurate prediction of the inter-residue contacts of a protein is important for calculating its tertiary structure. Analysis of co-evolutionary events among residues has proven effective in inferring inter-residue contacts. The Markov random field (MRF) technique, although widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of an MRF is accurate but time-consuming to calculate, whereas approximations to it, such as the pseudo-likelihood, are efficient to calculate but inaccurate. Thus, achieving accuracy and efficiency simultaneously remains a challenge.

RESULTS

In this study, we present such an approach, called clmDCA, for contact prediction. Unlike plmDCA, which uses the pseudo-likelihood (the product of the conditional probabilities of individual residues), our approach uses the composite likelihood (the product of the conditional probabilities of all residue pairs). The composite likelihood has been proven theoretically to be a better approximation to the actual likelihood function than the pseudo-likelihood, yet it is still efficient to maximize, which keeps clmDCA efficient. We present comprehensive experiments on popular benchmark datasets, including the PSICOV and CASP-11 datasets, showing that (i) clmDCA alone outperforms existing MRF-based approaches in prediction accuracy, and (ii) when equipped with a deep learning technique for refinement, the prediction accuracy of clmDCA improves further and significantly, suggesting that clmDCA is well suited to a subsequent refinement procedure. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset.
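To make the difference between the two objectives concrete, the Python sketch below evaluates both on a toy Potts model, the usual form of MRF in this setting, with fields `h` and symmetric couplings `J`. It is an illustration under stated assumptions, not clmDCA's implementation: it only computes the per-sequence negative log pseudo-likelihood (one conditional per residue) and negative log composite likelihood (one conditional per residue pair), and it omits the optimization and regularization a real method needs. The helper `random_potts` is hypothetical and exists only to produce test parameters.

```python
import itertools
import numpy as np

def random_potts(L, q, seed=0):
    """Hypothetical helper: random fields h (L, q) and symmetric couplings J (L, L, q, q)."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(L, q))
    J = np.zeros((L, L, q, q))
    for i, j in itertools.combinations(range(L), 2):
        J[i, j] = rng.normal(size=(q, q))
        J[j, i] = J[i, j].T          # symmetry: J[j, i, b, a] == J[i, j, a, b]
    return h, J

def site_conditional(x, i, h, J):
    """P(x_i = a | all other residues), for every state a."""
    logits = h[i].copy()
    for k in range(len(x)):
        if k != i:
            logits += J[i, k, :, x[k]]
    logits -= logits.max()           # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def pair_conditional(x, i, j, h, J):
    """P(x_i = a, x_j = b | all other residues), as a (q, q) matrix."""
    logits = h[i][:, None] + h[j][None, :] + J[i, j]
    for k in range(len(x)):
        if k != i and k != j:
            logits += J[i, k, :, x[k]][:, None] + J[j, k, :, x[k]][None, :]
    logits -= logits.max()
    p = np.exp(logits)
    return p / p.sum()

def neg_log_pseudo_likelihood(x, h, J):
    """-sum_i log P(x_i | x_-i): one single-site conditional per residue."""
    return -sum(np.log(site_conditional(x, i, h, J)[x[i]]) for i in range(len(x)))

def neg_log_composite_likelihood(x, h, J):
    """-sum_{i<j} log P(x_i, x_j | x_-ij): one pair conditional per residue pair."""
    return -sum(np.log(pair_conditional(x, i, j, h, J)[x[i], x[j]])
                for i, j in itertools.combinations(range(len(x)), 2))
```

Summing either function over the sequences of an MSA gives an objective to minimize over `h` and `J`. Each pair term normalizes over q² joint states instead of q, and there are L(L-1)/2 pairs instead of L sites, which shows where the extra cost of the composite likelihood comes from.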

CONCLUSIONS

The composite likelihood maximization algorithm can efficiently estimate the parameters of Markov random fields and can improve the prediction accuracy of protein inter-residue contacts.
