Reference Citation Analysis

For: Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 2011;28:184-90. [PMID: 22101153 DOI: 10.1093/bioinformatics/btr638] [Citation(s) in RCA: 535] [Impact Index Per Article: 38.2] [Indexed: 11/14/2022]


Number

Cited by Other Article(s)

251

Wuyun Q, Zheng W, Peng Z, Yang J. A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 2019;19:219-230. [PMID: 27802931 DOI: 10.1093/bib/bbw106] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 07/19/2016] [Indexed: 11/14/2022]

252

Téletchéa S, Santuz H, Léonard S, Etchebest C. Repository of Enriched Structures of Proteins Involved in the Red Blood Cell Environment (RESPIRE). PLoS One 2019;14:e0211043. [PMID: 30794542 PMCID: PMC6386447 DOI: 10.1371/journal.pone.0211043] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 04/27/2018] [Accepted: 01/07/2019] [Indexed: 12/25/2022]

253

Figliuzzi M, Barrat-Charlaix P, Weigt M. How Pairwise Coevolutionary Models Capture the Collective Residue Variability in Proteins? Mol Biol Evol 2019;35:1018-1027. [PMID: 29351669 DOI: 10.1093/molbev/msy007] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Indexed: 12/15/2022]

254

Ji S, Oruç T, Mead L, Rehman MF, Thomas CM, Butterworth S, Winn PJ. DeepCDpred: Inter-residue distance and contact prediction for improved prediction of protein structure. PLoS One 2019;14:e0205214. [PMID: 30620738 PMCID: PMC6324825 DOI: 10.1371/journal.pone.0205214] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/19/2018] [Accepted: 12/13/2018] [Indexed: 11/28/2022]

255

Malinverni D, Barducci A. Coevolutionary Analysis of Protein Sequences for Molecular Modeling. Methods Mol Biol 2019;2022:379-397. [PMID: 31396912 DOI: 10.1007/978-1-4939-9608-7_16] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]

256

Dehghani T, Naghibzadeh M, Eghdami M. BetaDL: A protein beta-sheet predictor utilizing a deep learning model and independent set solution. Comput Biol Med 2019;104:241-249. [PMID: 30530227 DOI: 10.1016/j.compbiomed.2018.11.021] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/27/2018] [Revised: 11/23/2018] [Accepted: 11/27/2018] [Indexed: 10/27/2022]

257

MacCarthy E, Perry D, Kc DB. Advances in Protein Super-Secondary Structure Prediction and Application to Protein Structure Prediction. Methods Mol Biol 2019;1958:15-45. [PMID: 30945212 DOI: 10.1007/978-1-4939-9161-7_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]

258

Wan C. Background on Biology of Ageing and Bioinformatics. Advanced Information and Knowledge Processing 2019:25-43. [DOI: 10.1007/978-3-319-97919-9_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025]

259

Schmidt M, Hamacher K. hoDCA: higher order direct-coupling analysis. BMC Bioinformatics 2018;19:546. [PMID: 30594145 PMCID: PMC6311078 DOI: 10.1186/s12859-018-2583-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 03/18/2018] [Accepted: 12/12/2018] [Indexed: 11/30/2022]

260

Koehl P, Orland H, Delarue M. Numerical Encodings of Amino Acids in Multivariate Gaussian Modeling of Protein Multiple Sequence Alignments. Molecules 2018;24:E104. [PMID: 30597916 PMCID: PMC6337344 DOI: 10.3390/molecules24010104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/02/2018] [Revised: 12/21/2018] [Accepted: 12/24/2018] [Indexed: 11/17/2022]

261

Huang YJ, Brock KP, Ishida Y, Swapna GVT, Inouye M, Marks DS, Sander C, Montelione GT. Combining Evolutionary Covariance and NMR Data for Protein Structure Determination. Methods Enzymol 2018;614:363-392. [PMID: 30611430 PMCID: PMC6640129 DOI: 10.1016/bs.mie.2018.11.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]

Affiliation(s)

Yuanpeng Janet Huang: Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
Kelly P Brock: Department of Systems Biology, Harvard Medical School, Boston, MA, United States
Yojiro Ishida: Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
Gurla V T Swapna: Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
Masayori Inouye: Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
Debora S Marks: Department of Systems Biology, Harvard Medical School, Boston, MA, United States
Chris Sander: Department of Cell Biology, Harvard Medical School and cBio Center, Dana-Farber Cancer Institute, Boston, MA, United States
Gaetano T Montelione: Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States


262

Neuwald AF, Altschul SF. Statistical investigations of protein residue direct couplings. PLoS Comput Biol 2018;14:e1006237. [PMID: 30596639 PMCID: PMC6329532 DOI: 10.1371/journal.pcbi.1006237] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/24/2018] [Revised: 01/11/2019] [Accepted: 11/23/2018] [Indexed: 12/12/2022]

263

Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018;14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018] [Indexed: 11/18/2022]

Abstract

The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures.

Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the predicted thermal motions using Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed using 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
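The core idea in this abstract, using coevolving residue pairs as the contacts of a Gaussian network model, can be sketched in a few lines. What follows is a minimal illustration and not the authors' implementation: the function names `invert` and `gnm_fluctuations` are hypothetical, the contact list is supplied by hand rather than inferred from a Potts model, and the result is a relative fluctuation profile that is proportional to B-factors only up to an unspecified scale factor. The sketch also assumes the contact network is connected.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [M | I].
    aug = [list(row) + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: is the contact network connected?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def gnm_fluctuations(n_residues, contacts):
    """Relative mean-square fluctuation per residue from a contact list.

    contacts: iterable of (i, j) residue index pairs (0-based); in a
    sequence-based setting these would come from coevolution analysis.
    """
    n = n_residues
    # Kirchhoff matrix: -1 for each contact, node degree on the diagonal.
    kirchhoff = [[0.0] * n for _ in range(n)]
    for i, j in contacts:
        if i == j or kirchhoff[i][j] != 0.0:
            continue  # skip self-pairs and repeated contacts
        kirchhoff[i][j] = kirchhoff[j][i] = -1.0
        kirchhoff[i][i] += 1.0
        kirchhoff[j][j] += 1.0
    # For a connected network the pseudo-inverse equals inv(K + J/n) - J/n,
    # where J is the all-ones matrix; its diagonal gives the fluctuations.
    shifted = [[kirchhoff[r][c] + 1.0 / n for c in range(n)] for r in range(n)]
    inverse = invert(shifted)
    return [inverse[i][i] - 1.0 / n for i in range(n)]
```

For a three-residue chain, `gnm_fluctuations(3, [(0, 1), (1, 2)])` assigns larger fluctuations to the two end residues than to the middle one, which matches the usual intuition that chain termini are more mobile.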


264

Cheung NJ, Yu W. De novo protein structure prediction using ultra-fast molecular dynamics simulation. PLoS One 2018;13:e0205819. [PMID: 30458007 PMCID: PMC6245515 DOI: 10.1371/journal.pone.0205819] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 06/02/2018] [Accepted: 10/02/2018] [Indexed: 11/19/2022]

265

Ding W, Mao W, Shao D, Zhang W, Gong H. DeepConPred2: An Improved Method for the Prediction of Protein Residue Contacts. Comput Struct Biotechnol J 2018;16:503-510. [PMID: 30505403 PMCID: PMC6247404 DOI: 10.1016/j.csbj.2018.10.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/14/2018] [Revised: 10/16/2018] [Accepted: 10/18/2018] [Indexed: 12/18/2022]

266

Vorberg S, Seemayer S, Söding J. Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction. PLoS Comput Biol 2018;14:e1006526. [PMID: 30395601 PMCID: PMC6237422 DOI: 10.1371/journal.pcbi.1006526] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 06/27/2018] [Revised: 11/15/2018] [Accepted: 09/24/2018] [Indexed: 12/01/2022]

Abstract

Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny.

Knowledge about the three-dimensional structure of proteins is key to understanding their function and role in biological processes and diseases. The experimental structure determination techniques, such as X-ray crystallography or electron cryo-microscopy, are labour intensive, time-consuming and expensive. Therefore, complementary computational methods to predict a protein’s structure have become indispensable. Over the last years, immense progress has been made in predicting protein structures from their amino acid sequence by utilizing highly accurate predictions of spatial contacts between amino acid residues as constraints in folding simulations. However, contact prediction methods require large numbers of homologous protein sequences in order to discriminate between signal and noise. A major obstacle preventing progress on the statistical methodology is our limited understanding of the different components of noise that are known to affect the predictions. We provide two tools, CCMpredPy and CCMgen, that can be used to learn highly accurate statistical models for contact prediction and to simulate protein evolution according to the statistical constraints between positions of residues as specified by these models, respectively. We showcase their usefulness by quantifying the relative contribution of noise arising from entropy and phylogeny on the predicted contacts, which will facilitate the improvement of the statistical methodology.
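The average product correction (APC) mentioned in this abstract is commonly written as S'(i,j) = S(i,j) - mean_i * mean_j / mean_all, where the means are taken over the raw coupling scores. The sketch below is a generic illustration of that formula, not code from CCMpredPy; the function name `average_product_correction` is hypothetical, and means are computed over off-diagonal entries only, which is an assumption of this example. It does not attempt the entropy bias correction, because the abstract does not state its proportionality constant.

```python
def average_product_correction(scores):
    """Apply APC to a symmetric matrix of raw coupling scores.

    scores: square list of lists; diagonal entries are ignored.
    Returns a new matrix with a zero diagonal.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two alignment columns")
    # Mean score of each column with all other columns.
    row_mean = [sum(scores[i][k] for k in range(n) if k != i) / (n - 1)
                for i in range(n)]
    # Mean over all off-diagonal entries.
    overall = sum(row_mean) / n
    if overall == 0.0:
        raise ValueError("overall mean score is zero; APC is undefined")
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                corrected[i][j] = scores[i][j] - row_mean[i] * row_mean[j] / overall
    return corrected
```

A matrix with the same score for every pair is corrected to zero everywhere, while a single pair that stands out from a uniform background keeps a positive corrected score. This is the background-removal behaviour the correction is meant to provide.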


267

Bitbol AF. Inferring interaction partners from protein sequences using mutual information. PLoS Comput Biol 2018;14:e1006401. [PMID: 30422978 PMCID: PMC6258550 DOI: 10.1371/journal.pcbi.1006401] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/23/2018] [Revised: 11/27/2018] [Accepted: 10/27/2018] [Indexed: 11/30/2022]

268

Schelling M, Hopf TA, Rost B. Evolutionary couplings and sequence variation effect predict protein binding sites. Proteins 2018;86:1064-1074. [DOI: 10.1002/prot.25585] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 12/03/2017] [Revised: 06/14/2018] [Accepted: 07/04/2018] [Indexed: 01/16/2023]

269

Dhar A, Davidsen K, Matsen FA, Minin VN. Predicting B cell receptor substitution profiles using public repertoire data. PLoS Comput Biol 2018;14:e1006388. [PMID: 30332400 PMCID: PMC6205660 DOI: 10.1371/journal.pcbi.1006388] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/20/2018] [Revised: 10/29/2018] [Accepted: 07/22/2018] [Indexed: 12/31/2022]

Abstract

B cells develop high affinity receptors during the course of affinity maturation, a cyclic process of mutation and selection. At the end of affinity maturation, a number of cells sharing the same ancestor (i.e. in the same “clonal family”) are released from the germinal center; their amino acid frequency profile reflects the allowed and disallowed substitutions at each position. These clonal-family-specific frequency profiles, called “substitution profiles”, are useful for studying the course of affinity maturation as well as for antibody engineering purposes. However, most often only a single sequence is recovered from each clonal family in a sequencing experiment, making it impossible to construct a clonal-family-specific substitution profile. Given the public release of many high-quality large B cell receptor datasets, one may ask whether it is possible to use such data in a prediction model for clonal-family-specific substitution profiles. In this paper, we present the method “Substitution Profiles Using Related Families” (SPURF), a penalized tensor regression framework that integrates information from a rich assemblage of datasets to predict the clonal-family-specific substitution profile for any single input sequence. Using this framework, we show that substitution profiles from similar clonal families can be leveraged together with simulated substitution profiles and germline gene sequence information to improve prediction. We fit this model on a large public dataset and validate the robustness of our approach on two external datasets. Furthermore, we provide a command-line tool in an open-source software package (https://github.com/krdav/SPURF) implementing these ideas and providing easy prediction using our pre-fit models.

Antibody engineering can be greatly informed by knowledge about the underlying affinity maturation process. As such this can be probed by sequencing, but unfortunately, in practice often only one member of the clonal family is sequenced, making it difficult to determine a set of possible amino acid mutations that would retain the original antibody antigen binding affinity. We overcome this data sparsity by developing a statistical learning approach that leverages vast information about amino acid preferences available in public immune system repertoire data. We use a penalized regression approach to devise a flexible statistical model that integrates multiple sources of information into a coherent prediction framework and validate our prediction algorithm using subsampling and held out data.


270

Rouse SL, Matthews SJ, Dueholm MS. Ecology and Biogenesis of Functional Amyloids in Pseudomonas. J Mol Biol 2018;430:3685-3695. [PMID: 29753779 PMCID: PMC6173800 DOI: 10.1016/j.jmb.2018.05.004] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 03/09/2018] [Revised: 05/03/2018] [Accepted: 05/04/2018] [Indexed: 12/02/2022]

271

How is structural divergence related to evolutionary information? Mol Phylogenet Evol 2018;127:859-866. [DOI: 10.1016/j.ympev.2018.06.033] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 08/18/2017] [Revised: 06/01/2018] [Accepted: 06/19/2018] [Indexed: 12/15/2022]

272

Jones DT, Kandathil SM. High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics 2018;34:3308-3315. [PMID: 29718112 PMCID: PMC6157083 DOI: 10.1093/bioinformatics/bty341] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 10/12/2017] [Revised: 03/06/2018] [Accepted: 04/25/2018] [Indexed: 12/22/2022]

Abstract

Motivation

In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation.

Results

Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions.

Availability and implementation

DeepCov is freely available at https://github.com/psipred/DeepCov.

Supplementary information

Supplementary data are available at Bioinformatics online.
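The "amino-acid pair frequency or covariance data derived directly from sequence alignments" referred to in this abstract can be illustrated with the textbook definition cov(a, b) = f_ij(a, b) - f_i(a) f_j(b) for a pair of alignment columns i and j. The sketch below is a simplified illustration and not DeepCov's actual feature code: the name `pair_covariance` and the 21-symbol alphabet are assumptions of this example, and it applies no sequence weighting or pseudocounts.

```python
from collections import Counter

# Assumed alphabet: 20 standard amino acids plus the gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def pair_covariance(alignment, i, j):
    """Covariance of symbol pairs between columns i and j of an alignment.

    alignment: list of equal-length strings (one aligned sequence each).
    Returns a dict mapping (a, b) to f_ij(a, b) - f_i(a) * f_j(b).
    Symbols outside ALPHABET count toward the total but get no entry.
    """
    n = len(alignment)
    if n == 0:
        raise ValueError("alignment is empty")
    single_i = Counter(seq[i] for seq in alignment)
    single_j = Counter(seq[j] for seq in alignment)
    pair = Counter((seq[i], seq[j]) for seq in alignment)
    return {
        (a, b): pair[(a, b)] / n - (single_i[a] / n) * (single_j[b] / n)
        for a in ALPHABET
        for b in ALPHABET
    }
```

For the toy alignment `["AC", "AC", "GT", "GT"]`, the pair (A, C) in columns 0 and 1 has covariance 0.25 and the pair (A, T) has covariance -0.25. For `["AC", "AT", "GC", "GT"]`, where the two columns vary independently, every covariance is zero.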


273

Nerli S, Sgourakis NG. CS-ROSETTA. Methods Enzymol 2018;614:321-362. [PMID: 30611429 DOI: 10.1016/bs.mie.2018.07.005] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 01/15/2023]

274

Wu H, Cao C, Xia X, Lu Q. Unified Deep Learning Architecture for Modeling Biology Sequence. IEEE/ACM Trans Comput Biol Bioinform 2018;15:1445-1452. [PMID: 28991751 DOI: 10.1109/tcbb.2017.2760832] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]

275

Jakubec D, Kratochvíl M, Vymĕtal J, Vondrášek J. Widespread evolutionary crosstalk among protein domains in the context of multi-domain proteins. PLoS One 2018;13:e0203085. [PMID: 30169546 PMCID: PMC6118372 DOI: 10.1371/journal.pone.0203085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2018] [Accepted: 08/14/2018] [Indexed: 11/20/2022]

276

Kc DB. Recent advances in sequence-based protein structure prediction. Brief Bioinform 2018;18:1021-1032. [PMID: 27562963 DOI: 10.1093/bib/bbw070] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/24/2016] [Indexed: 11/13/2022]

277

Khan S, Guo TW, Misra S. A coevolution-guided model for the rotor of the bacterial flagellar motor. Sci Rep 2018;8:11754. [PMID: 30082903 PMCID: PMC6079021 DOI: 10.1038/s41598-018-30293-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/15/2018] [Accepted: 07/19/2018] [Indexed: 01/17/2023]

278

Kassem MM, Christoffersen LB, Cavalli A, Lindorff-Larsen K. Enhancing coevolution-based contact prediction by imposing structural self-consistency of the contacts. Sci Rep 2018;8:11112. [PMID: 30042380 PMCID: PMC6057941 DOI: 10.1038/s41598-018-29357-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/26/2018] [Accepted: 07/10/2018] [Indexed: 11/29/2022]

279

Delarue M, Koehl P. Combined approaches from physics, statistics, and computer science for ab initio protein structure prediction: ex unitate vires (unity is strength)? F1000Res 2018;7. [PMID: 30079234 PMCID: PMC6058471 DOI: 10.12688/f1000research.14870.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 07/19/2018] [Indexed: 11/20/2022]

280

de Oliveira SHP, Shi J, Deane CM. Comparing co-evolution methods and their application to template-free protein structure prediction. Bioinformatics 2018;33:373-381. [PMID: 28171606 PMCID: PMC5860252 DOI: 10.1093/bioinformatics/btw618] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/28/2016] [Revised: 09/19/2016] [Accepted: 09/22/2016] [Indexed: 02/01/2023]

281

Baldi P. Deep Learning in Biomedical Data Science. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013343] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Indexed: 11/09/2022]

282

Holland J, Pan Q, Grigoryan G. Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials. PLoS One 2018;13:e0199585. [PMID: 29953468 PMCID: PMC6023208 DOI: 10.1371/journal.pone.0199585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/21/2017] [Accepted: 06/11/2018] [Indexed: 11/18/2022]

283

Nerli S, McShan AC, Sgourakis NG. Chemical shift-based methods in NMR structure determination. Prog Nucl Magn Reson Spectrosc 2018;106-107:1-25. [PMID: 31047599 PMCID: PMC6788782 DOI: 10.1016/j.pnmrs.2018.03.002] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 01/25/2018] [Revised: 03/09/2018] [Accepted: 03/09/2018] [Indexed: 05/08/2023]

Abstract

Chemical shifts are highly sensitive probes harnessed by NMR spectroscopists and structural biologists as conformational parameters to characterize a range of biological molecules. Traditionally, assignment of chemical shifts has been a labor-intensive process requiring numerous samples and a suite of multidimensional experiments. Over the past two decades, the development of complementary computational approaches has bolstered the analysis, interpretation and utilization of chemical shifts for elucidation of high resolution protein and nucleic acid structures. Here, we review the development and application of chemical shift-based methods for structure determination with a focus on ab initio fragment assembly, comparative modeling, oligomeric systems, and automated assignment methods. Throughout our discussion, we point out practical uses, as well as advantages and caveats, of using chemical shifts in structure modeling. We additionally highlight (i) hybrid methods that employ chemical shifts with other types of NMR restraints (residual dipolar couplings, paramagnetic relaxation enhancements and pseudocontact shifts) that allow for improved accuracy and resolution of generated 3D structures, (ii) the utilization of chemical shifts to model the structures of sparsely populated excited states, and (iii) modeling of sidechain conformations. Finally, we briefly discuss the advantages of contemporary methods that employ sparse NMR data recorded using site-specific isotope labeling schemes for chemical shift-driven structure determination of larger molecules. With this review, we aim to emphasize the accessibility and versatility of chemical shifts for structure determination of challenging biological systems, and to point out emerging areas of development that lead us towards the next generation of tools.


284

Szurmant H, Weigt M. Inter-residue, inter-protein and inter-family coevolution: bridging the scales. Curr Opin Struct Biol 2018;50:26-32. [PMID: 29101847 PMCID: PMC5940578 DOI: 10.1016/j.sbi.2017.10.014] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 09/05/2017] [Revised: 10/12/2017] [Accepted: 10/13/2017] [Indexed: 10/18/2022]

285

Dutta S, Eckmann JP, Libchaber A, Tlusty T. Green function of correlated genes in a minimal mechanical model of protein evolution. Proc Natl Acad Sci U S A 2018;115:E4559-E4568. [PMID: 29712824 PMCID: PMC5960285 DOI: 10.1073/pnas.1716215115] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022]

286

Simkovic F, Thomas JMH, Rigden DJ. ConKit: a python interface to contact predictions. Bioinformatics 2018;33:2209-2211. [PMID: 28369168 PMCID: PMC5870551 DOI: 10.1093/bioinformatics/btx148] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/30/2016] [Accepted: 03/14/2017] [Indexed: 11/19/2022] Open

287

Fonseca-Júnior NJ, Afonso MQ, Oliveira LC, Bleicher L. PFstats: A Network-Based Open Tool for Protein Family Analysis. J Comput Biol 2018;25:480-486. [DOI: 10.1089/cmb.2017.0181] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022] Open

288

Raimondi D, Orlando G, Moreau Y, Vranken WF. Ultra-fast global homology detection with Discrete Cosine Transform and Dynamic Time Warping. Bioinformatics 2018;34:3118-3125. [DOI: 10.1093/bioinformatics/bty309] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/06/2017] [Accepted: 04/18/2018] [Indexed: 11/14/2022] Open

289

Mao W, Wang T, Zhang W, Gong H. Identification of residue pairing in interacting β-strands from a predicted residue contact map. BMC Bioinformatics 2018;19:146. [PMID: 29673311 PMCID: PMC5907701 DOI: 10.1186/s12859-018-2150-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 12/04/2017] [Accepted: 04/09/2018] [Indexed: 12/04/2022] Open

Abstract

Background

Despite the rapid progress of protein residue contact prediction, predicted residue contact maps frequently contain many errors. However, information on residue pairing in β strands can be extracted from a noisy contact map because β-β interactions produce characteristic contact patterns. This information may benefit the tertiary structure prediction of mainly β proteins. In this work, we propose a novel ridge-detection-based β-β contact predictor to identify residue pairing in β strands from any predicted residue contact map.

Results

Our algorithm, RDb₂C, adopts ridge detection, a well-developed technique in computer image processing, to capture consecutive residue contacts, and then utilizes a novel multi-stage random forest framework to integrate the ridge information and additional features for prediction. Starting from the predicted contact map of CCMpred, RDb₂C remarkably outperforms all state-of-the-art methods on two conventional test sets of β proteins (BetaSheet916 and BetaSheet1452), and achieves F1-scores of ~62% and ~76% at the residue level and strand level, respectively. Taking the prediction of the more advanced RaptorX-Contact as input, RDb₂C achieves impressively higher performance, with F1-scores reaching ~76% and ~86% at the residue level and strand level, respectively. In a test of structural modeling using the top 1 L predicted contacts as constraints, for 61 mainly β proteins, the average TM-score reaches 0.442 when using the raw RaptorX-Contact prediction, but increases to 0.506 when using the improved prediction by RDb₂C.
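The evaluation quantities mentioned in this abstract (selecting the top L predicted contacts for a protein of length L, and scoring a predicted contact set by F1) can be illustrated with a minimal sketch. This is not the RDb₂C code: the function names, the `min_sep` sequence-separation parameter, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def top_l_contacts(prob_map, min_sep=6):
    """Return the L highest-scoring residue pairs (i < j) from an L x L map.

    min_sep is a hypothetical minimum sequence separation |i - j|;
    the abstract does not state which value, if any, was used.
    """
    prob = np.asarray(prob_map, dtype=float)
    length = prob.shape[0]
    # Upper-triangle index pairs with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    # Sort candidate pairs by descending score and keep the first L.
    order = np.argsort(-prob[i, j], kind="stable")[:length]
    return set(zip(i[order].tolist(), j[order].tolist()))

def f1_score(predicted, true):
    """F1 = harmonic mean of precision and recall over sets of pairs."""
    tp = len(predicted & true)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)

# Toy example: a 4-residue "protein" with made-up scores.
toy = np.full((4, 4), 0.1)
toy[0, 3], toy[1, 3], toy[0, 2], toy[1, 2] = 0.9, 0.8, 0.7, 0.6
picked = top_l_contacts(toy, min_sep=1)
native = {(0, 3), (1, 3), (2, 3)}
score = f1_score(picked, native)
```

In the toy case two of the four selected pairs are native contacts, so precision is 1/2, recall is 2/3, and F1 is 4/7.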

Conclusion

Our method can significantly improve the prediction of β-β contacts from any predicted residue contact map. Prediction results of our algorithm could be directly applied to facilitate the practical structure prediction of mainly β proteins.

Availability

All source data and code are available at http://166.111.152.91/Downloads.html or on GitHub at https://github.com/wzmao/RDb2C.

Electronic supplementary material

The online version of this article (10.1186/s12859-018-2150-1) contains supplementary material, which is available to authorized users.

290

Gil N, Fiser A. Identifying functionally informative evolutionary sequence profiles. Bioinformatics 2018;34:1278-1286. [PMID: 29211823 PMCID: PMC5905606 DOI: 10.1093/bioinformatics/btx779] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/28/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023] Open

291

de Oliveira SHP, Law EC, Shi J, Deane CM. Sequential search leads to faster, more efficient fragment-based de novo protein structure prediction. Bioinformatics 2018;34:1132-1140. [PMID: 29136098 PMCID: PMC6030820 DOI: 10.1093/bioinformatics/btx722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/17/2017] [Revised: 09/22/2017] [Accepted: 11/04/2017] [Indexed: 01/12/2023] Open

292

Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018;1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]

293

He B, Mortuza SM, Wang Y, Shen HB, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics 2018;33:2296-2306. [PMID: 28369334 DOI: 10.1093/bioinformatics/btx164] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 10/14/2016] [Accepted: 03/21/2017] [Indexed: 12/12/2022] Open

Abstract

Motivation

Recent CASP experiments have witnessed exciting progress on folding large-size non-homologous proteins with the assistance of co-evolution based contact predictions. The success is, however, anecdotal, because the contact prediction methods require a high volume of sequence homologs that are not available for most of the non-homologous protein targets. Development of efficient methods that can generate balanced and reliable contact maps for different types of protein targets is essential to enhance the success rate of ab initio protein structure prediction.

Results

We developed a new pipeline, NeBcon, which uses the naïve Bayes classifier (NBC) theorem to combine eight state-of-the-art contact methods that are built from co-evolution and machine learning approaches. The posterior probabilities of the NBC model are then trained with intrinsic structural features through neural network learning for the final contact map prediction. NeBcon was tested on 98 non-redundant proteins, where it improves the accuracy of the best co-evolution based meta-server predictor by 22%; the magnitude of the improvement increases to 45% for the hard targets that lack sequence and structural homologs in the databases. Detailed data analysis showed that the major contribution to the improvement is due to the optimized NBC combination of the complementary information from both co-evolution and machine learning predictions. The neural network training also helps to improve the coupling of the NBC posterior probability and the intrinsic structural features, which were found to be particularly important for proteins that do not have a sufficient number of homologous sequences to derive reliable co-evolution profiles.
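The naïve Bayes combination step described above can be sketched as follows. This is not the NeBcon implementation: the equal-width score binning, the pseudocount, the function names, and the toy data are assumptions chosen only to show how per-predictor likelihoods are multiplied under the conditional-independence assumption to give a posterior contact probability.

```python
import numpy as np

def fit_naive_bayes(scores, labels, n_bins=5, pseudocount=1.0):
    """Estimate P(score bin | class) per predictor and the class prior.

    scores: (n_pairs, n_predictors) array of confidences in [0, 1].
    labels: (n_pairs,) array with 1 for contact, 0 for non-contact.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    bins = np.minimum((scores * n_bins).astype(int), n_bins - 1)
    n_pred = scores.shape[1]
    # likelihood[c, k, b] = P(predictor k falls in bin b | class c)
    likelihood = np.full((2, n_pred, n_bins), pseudocount)
    for c in (0, 1):
        for k in range(n_pred):
            likelihood[c, k] += np.bincount(bins[labels == c, k], minlength=n_bins)
    likelihood /= likelihood.sum(axis=2, keepdims=True)
    prior = np.array([(labels == 0).mean(), (labels == 1).mean()])
    return likelihood, prior

def posterior_contact(scores, likelihood, prior):
    """Posterior P(contact | all predictor scores) for each residue pair."""
    scores = np.asarray(scores, dtype=float)
    n_bins = likelihood.shape[2]
    bins = np.minimum((scores * n_bins).astype(int), n_bins - 1)
    k_idx = np.arange(scores.shape[1])
    # Sum log-likelihoods over predictors (the independence assumption).
    log_post = np.log(prior)[:, None] + np.stack(
        [np.log(likelihood[c][k_idx, bins]).sum(axis=1) for c in (0, 1)]
    )
    log_post -= log_post.max(axis=0)  # stabilise before exponentiating
    post = np.exp(log_post)
    return post[1] / post.sum(axis=0)

# Toy training set: two hypothetical predictors, six residue pairs.
train_scores = [[0.9, 0.8], [0.95, 0.7], [0.1, 0.2],
                [0.2, 0.1], [0.85, 0.9], [0.05, 0.3]]
train_labels = [1, 1, 0, 0, 1, 0]
lik, pri = fit_naive_bayes(train_scores, train_labels)
p_high, p_low = posterior_contact([[0.9, 0.9], [0.1, 0.1]], lik, pri)
```

In the pipeline described by the abstract, posteriors of this kind are then fed, together with structural features, into a neural network; that second stage is not sketched here.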

Availability and Implementation

On-line server and standalone package of the program are available at http://zhanglab.ccmb.med.umich.edu/NeBcon/ .

Contact

zhng@umich.edu.

Supplementary information

Supplementary data are available at Bioinformatics online.

294

Douam F, Fusil F, Enguehard M, Dib L, Nadalin F, Schwaller L, Hrebikova G, Mancip J, Mailly L, Montserret R, Ding Q, Maisse C, Carlot E, Xu K, Verhoeyen E, Baumert TF, Ploss A, Carbone A, Cosset FL, Lavillette D. A protein coevolution method uncovers critical features of the Hepatitis C Virus fusion mechanism. PLoS Pathog 2018;14:e1006908. [PMID: 29505618 PMCID: PMC5854445 DOI: 10.1371/journal.ppat.1006908] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/02/2017] [Revised: 03/15/2018] [Accepted: 01/26/2018] [Indexed: 12/15/2022] Open

Abstract

Amino-acid coevolution refers to compensatory mutational patterns that preserve the function of a protein. Viral envelope glycoproteins, which mediate entry of enveloped viruses into their host cells, are shaped by coevolution signals that confer to viruses the plasticity to evade neutralizing antibodies without altering viral entry mechanisms. The functions and structures of the two envelope glycoproteins of the Hepatitis C Virus (HCV), E1 and E2, are poorly described. In particular, how these two proteins mediate the HCV fusion process between the viral and the cell membrane remains elusive. Here, as a proof of concept, we aimed to take advantage of an original, recently developed coevolution method to shed light on the HCV fusion mechanism. When first applied to the well-characterized Dengue Virus (DENV) envelope glycoproteins, coevolution analysis was able to predict important structural features and rearrangements of these viral protein complexes. When applied to HCV E1E2, computational coevolution analysis predicted that E1 and E2 refold interdependently during fusion through rearrangements of the E2 Back Layer (BL). Consistently, a soluble BL-derived polypeptide inhibited HCV infection of hepatoma cell lines, primary human hepatocytes and humanized liver mice. We showed that this polypeptide specifically inhibited HCV fusogenic rearrangements, hence supporting the critical role of this domain during HCV fusion. By combining coevolution analysis and in vitro assays, we also uncovered functionally significant coevolving signals between E1 and E2 BL/Stem regions that govern HCV fusion, demonstrating the accuracy of our coevolution predictions. Altogether, our work sheds light on important structural features of the HCV fusion mechanism and contributes to advancing our functional understanding of this process. This study also provides an important proof of concept that coevolution can be employed to explore viral protein-mediated processes, and can guide the development of innovative translational strategies against challenging human-tropic viruses.

Several virus-mediated molecular processes remain poorly described, which hampers the development of potent anti-viral therapies. Hence, new experimental strategies need to be undertaken to improve and accelerate our understanding of these processes. Here, as a proof of concept, we employ amino-acid coevolution as a tool to gain insights into the structural rearrangements of Hepatitis C Virus (HCV) envelope glycoproteins E1 and E2 during virus fusion with the cell membrane, and provide a basis for the inhibition of this process. Our coevolution analysis predicted that a specific domain of E2, the Back Layer (BL), is involved in significant conformational changes with E1 during the fusion of the HCV membrane with the cellular membrane. Consistently, a recombinant, soluble form of the BL was able to inhibit E1E2 fusogenic rearrangements and HCV infection. Moreover, predicted coevolution networks involving E1 and BL residues, as well as E1 and BL-adjacent residues, were found to modulate virus fusion. Our data show that coevolution analysis is a powerful and underused approach that can provide significant insights into the functions and structural rearrangements of viral proteins. Importantly, this approach can also provide a structural and molecular basis for the design of effective anti-viral drugs, and opens new perspectives to rapidly identify effective antiviral strategies against emerging and re-emerging viral pathogens.

Affiliation(s)

Florian Douam: CIRI–International Center for Infectiology Research, Team EVIR, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, Ecole Normale Supérieure de Lyon, Univ Lyon, Lyon, France; CNRS UMR5557 Microbial ecology, Université Claude Bernard Lyon 1, INRA, UMR1418, Villeurbanne, France; Department of Molecular Biology, Princeton University, Princeton NJ, United States of America
Floriane Fusil: CIRI–International Center for Infectiology Research, Team EVIR, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, Ecole Normale Supérieure de Lyon, Univ Lyon, Lyon, France
Margot Enguehard: CNRS UMR5557 Microbial ecology, Université Claude Bernard Lyon 1, INRA, UMR1418, Villeurbanne, France; University of Lyon, Université Claude Bernard Lyon1, INRA, EPHE, IVPC, Viral Infections and Comparative Pathology, UMR754, Lyon, France; Institut Hospitalo-Universitaire, Pôle Hépato-digestif, Hôpitaux Universitaires de Strasbourg, Strasbourg, France
Linda Dib: Molecular Phylogenetics and Speciation, Département d’écologie et évolution, Université de Lausanne, Lausanne, Switzerland
Francesca Nadalin: Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France
Loïc Schwaller: Mathematical Institute, Leiden University, Leiden, The Netherlands
Gabriela Hrebikova: Department of Molecular Biology, Princeton University, Princeton NJ, United States of America
Jimmy Mancip: CIRI–International Center for Infectiology Research, Team EVIR, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, Ecole Normale Supérieure de Lyon, Univ Lyon, Lyon, France
Laurent Mailly: Inserm, U1110, Institut de Recherche sur les Maladies Virales et Hépatiques, Strasbourg, France; Université de Strasbourg, Strasbourg, France
Roland Montserret: Institut de Biologie et Chimie des Protéines, Bases Moléculaires et Structurales des Systèmes Infectieux, Labex Ecofect, UMR 5086 CNRS, Université de Lyon, Lyon, France
Qiang Ding: Department of Molecular Biology, Princeton University, Princeton NJ, United States of America
Carine Maisse: University of Lyon, Université Claude Bernard Lyon1, INRA, EPHE, IVPC, Viral Infections and Comparative Pathology, UMR754, Lyon, France
Emilie Carlot: CAS Key Laboratory of Molecular Virology and Immunology, Unit of interspecies transmission of arboviruses and antivirals, Institut Pasteur of Shanghai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Ke Xu: CAS Key Laboratory of Molecular Virology and Immunology, Unit of interspecies transmission of arboviruses and antivirals, Institut Pasteur of Shanghai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
Els Verhoeyen: CIRI–International Center for Infectiology Research, Team EVIR, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, Ecole Normale Supérieure de Lyon, Univ Lyon, Lyon, France
Thomas F. Baumert: Institut Hospitalo-Universitaire, Pôle Hépato-digestif, Hôpitaux Universitaires de Strasbourg, Strasbourg, France; Inserm, U1110, Institut de Recherche sur les Maladies Virales et Hépatiques, Strasbourg, France; Université de Strasbourg, Strasbourg, France
Alexander Ploss: Department of Molecular Biology, Princeton University, Princeton NJ, United States of America
Alessandra Carbone (corresponding author): Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative, Paris, France; Institut Universitaire de France, Paris, France
François-Loïc Cosset (corresponding author): CIRI–International Center for Infectiology Research, Team EVIR, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, Ecole Normale Supérieure de Lyon, Univ Lyon, Lyon, France
Dimitri Lavillette (corresponding author): CIRI–International Center for Infectiology Research, Team EVIR, Inserm, U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, Ecole Normale Supérieure de Lyon, Univ Lyon, Lyon, France; CNRS UMR5557 Microbial ecology, Université Claude Bernard Lyon 1, INRA, UMR1418, Villeurbanne, France; University of Lyon, Université Claude Bernard Lyon1, INRA, EPHE, IVPC, Viral Infections and Comparative Pathology, UMR754, Lyon, France; CAS Key Laboratory of Molecular Virology and Immunology, Unit of interspecies transmission of arboviruses and antivirals, Institut Pasteur of Shanghai, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China

295

Cocco S, Feinauer C, Figliuzzi M, Monasson R, Weigt M. Inverse statistical physics of protein sequences: a key issues review. Rep Prog Phys 2018;81:032601. [PMID: 29120346 DOI: 10.1088/1361-6633/aa9965] [Citation(s) in RCA: 126] [Impact Index Per Article: 18.0] [Indexed: 06/07/2023]

296

Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 2018;86 Suppl 1:51-66. [PMID: 29071738 PMCID: PMC5820169 DOI: 10.1002/prot.25407] [Citation(s) in RCA: 126] [Impact Index Per Article: 18.0] [Received: 07/14/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]

297

Garrido-Martín D, Pazos F. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues. BMC Bioinformatics 2018;19:67. [PMID: 29482506 PMCID: PMC5827975 DOI: 10.1186/s12859-018-2084-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/10/2022] Open

298

Le Q, Sievers F, Higgins DG. Protein multiple sequence alignment benchmarking through secondary structure prediction. Bioinformatics 2018;33:1331-1337. [PMID: 28093407 PMCID: PMC5408826 DOI: 10.1093/bioinformatics/btw840] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/22/2016] [Accepted: 01/10/2017] [Indexed: 12/26/2022] Open

Abstract

Motivation

Multiple sequence alignment (MSA) is commonly used to analyze sets of homologous protein or DNA sequences. This has led to the development of many methods and packages for MSA over the past 30 years. Being able to compare different methods has been problematic and has relied on gold standard benchmark datasets of ‘true’ alignments or on MSA simulations. A number of protein benchmark datasets have been produced which rely on a combination of manual alignment and/or automated superposition of protein structures. These are either restricted to very small MSAs with few sequences or require manual alignment, which can be subjective. In both cases, it remains very difficult to properly test MSAs of more than a few dozen sequences. PREFAB and HomFam both rely on using a small subset of sequences of known structure and do not fairly test the quality of a full MSA.

Results

In this paper we describe QuanTest, a fully automated and highly scalable test system for protein MSAs which is based on using secondary structure prediction accuracy (SSPA) to measure alignment quality. This is based on the assumption that better MSAs will give more accurate secondary structure predictions when we include sequences of known structure. SSPA, however, measures the quality of an entire alignment, not just the accuracy on a handful of selected sequences. It can be scaled to alignments of any size, but here we demonstrate its use on alignments of either 200 or 1000 sequences. This allows the testing of slow, accurate programs as well as faster, less accurate ones. We show that the scores from QuanTest are highly correlated with existing benchmark scores. We also validate the method by comparing a wide range of MSA alignment options and by introducing different levels of misalignment into MSAs and examining the effects on the scores.
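The idea of scoring an alignment by secondary structure prediction accuracy can be illustrated with a per-residue agreement score. The sketch below is not the QuanTest code, and the abstract does not specify how SSPA is computed; it assumes a simple three-state (H/E/C) match fraction averaged over reference sequences of known structure, and the function names are hypothetical.

```python
def q3_accuracy(predicted: str, reference: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference strings must have equal length")
    if not reference:
        raise ValueError("empty secondary structure string")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

def mean_sspa(pairs):
    """Average accuracy over (predicted, reference) pairs, one per reference sequence."""
    scores = [q3_accuracy(p, r) for p, r in pairs]
    return sum(scores) / len(scores)

# Toy example with made-up state strings.
single = q3_accuracy("HHHEEECCC", "HHHEEECCH")
average = mean_sspa([("HHH", "HHH"), ("EEC", "EEE")])
```

Under the abstract's assumption, a better alignment fed to the structure predictor would yield a higher average of this kind, so the average can be compared across alignment programs.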

Availability and Implementation

QuanTest is available from http://www.bioinf.ucd.ie/download/QuanTest.tgz

Supplementary information

Supplementary data are available at Bioinformatics online.

299

Barrat-Charlaix P, Weigt M. [From sequence variability to structural and functional prediction: modeling of homologous protein families]. Biol Aujourdhui 2018;211:239-244. [PMID: 29412135 DOI: 10.1051/jbio/2017030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2017] [Indexed: 06/08/2023]

300

Cortés Cabrera Á, Sánchez-Murcia PA, Gago F. Making sense of the past: hyperstability of ancestral thioredoxins explained by free energy simulations. Phys Chem Chem Phys 2018;19:23239-23246. [PMID: 28825743 DOI: 10.1039/c7cp03659k] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]