1
Turina P, Petrosino M, Enriquez Sandoval CA, Novak L, Pasquo A, Alexov E, Alladin MA, Ascher DB, Babbi G, Bakolitsa C, Casadio R, Cheng J, Fariselli P, Folkman L, Kamandula A, Katsonis P, Li M, Li D, Lichtarge O, Mahmud S, Martelli PL, Pal D, Panday SK, Pires DEV, Portelli S, Pucci F, Rodrigues CHM, Rooman M, Savojardo C, Schwersensky M, Shen Y, Strokach AV, Sun Y, Woo J, Radivojac P, Brenner SE, Chiaraluce R, Consalvi V, Capriotti E. Assessing the predicted impact of single amino acid substitutions in MAPK proteins for CAGI6 challenges. Hum Genet 2025; 144:265-280. [PMID: 39976676 PMCID: PMC11975483 DOI: 10.1007/s00439-024-02724-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Accepted: 12/27/2024] [Indexed: 03/05/2025]
Abstract
New thermodynamic and functional studies have been recently conducted to evaluate the impact of amino acid substitutions on the Mitogen Activated Protein Kinases 1 and 3 (MAPK1/3). The Critical Assessment of Genome Interpretation (CAGI) data provider, at Sapienza University of Rome, measured the unfolding free energy and the enzymatic activity of a set of variants (MAPK challenge dataset). Thermodynamic measurements for the denaturant-induced equilibrium unfolding of the phosphorylated and unphosphorylated forms of the MAPKs were obtained by monitoring the far-UV circular dichroism and intrinsic fluorescence changes as a function of denaturant concentration. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant ($\Delta\Delta G^{\mathrm{H_2O}}$). The enzymatic activity of the phosphorylated MAPKs variants was also measured using Chelation-Enhanced Fluorescence to monitor the phosphorylation of a peptide substrate. The MAPK challenge dataset, composed of a total of 23 single amino acid substitutions (11 and 12 for MAPK1 and MAPK3, respectively), was used to assess the effectiveness of the computational methods in predicting the $\Delta\Delta G^{\mathrm{H_2O}}$ values, associated with the variants, and categorize them as destabilizing and not destabilizing. The data on the enzymatic activity of the MAPKs mutants were used to assess the performance of the methods for predicting the functional impact of the variants. For the sixth edition of CAGI, thirteen independent research groups from four continents (Asia, Australia, Europe and North America) submitted > 80 sets of predictions, obtained from different approaches. In this manuscript, we summarized the results of our assessment to highlight the possible limitations of the available algorithms.
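As an illustration of the quantity described in this abstract, the sketch below computes the change in unfolding free energy between a variant and the wild type and labels the variant as destabilizing or not. The sign convention (variant minus wild type, so negative values mean a less stable variant), the 1 kcal/mol cutoff, the function names, and the numeric values are all assumptions made for the example; none of them are taken from the paper.

```python
# Minimal sketch: ddG of unfolding and a binary "destabilizing" label.
# Assumptions (not from the paper): ddG = dG(variant) - dG(wild type),
# energies in kcal/mol, and a hypothetical cutoff of -1.0 kcal/mol.


def ddg_h2o(dg_variant: float, dg_wild_type: float) -> float:
    """Change in unfolding free energy at zero denaturant concentration."""
    return dg_variant - dg_wild_type


def is_destabilizing(ddg: float, cutoff: float = -1.0) -> bool:
    """True when the variant loses at least |cutoff| kcal/mol of stability."""
    return ddg <= cutoff


if __name__ == "__main__":
    # Hypothetical unfolding free energies, for illustration only.
    wild_type_dg = 7.5
    for name, dg in [("variant_a", 5.0), ("variant_b", 7.8)]:
        ddg = ddg_h2o(dg, wild_type_dg)
        print(name, round(ddg, 2), is_destabilizing(ddg))
```

With the opposite sign convention (wild type minus variant), the comparison in `is_destabilizing` would need to be flipped.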
Affiliation(s)
- Paola Turina
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Maria Petrosino
  - Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza University of Roma, 00185, Rome, Italy
- Leonore Novak
  - Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza University of Roma, 00185, Rome, Italy
- Alessandra Pasquo
  - Diagnostics and Metrology Laboratory FSN-TECFIS-DIM, ENEA CR Frascati, 00044, Frascati, Italy
- Emil Alexov
  - Department of Physics and Astronomy, Clemson University, Clemson, SC, 29634, USA
- Muttaqi Ahmad Alladin
  - Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, 560012, India
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, University of Queensland, St Lucia, QLD, 4072, Australia
- Giulia Babbi
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Constantina Bakolitsa
  - Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Rita Casadio
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, 10126, Torino, Italy
- Lukas Folkman
  - Institute for Integrated and Intelligent Systems, Griffith University, Southport, QLD, 4222, Australia
- Akash Kamandula
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Minghui Li
  - School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, 215123, Jiangsu, China
- Dong Li
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA
- Sajid Mahmud
  - Department of Electrical Engineering and Computer Science, NextGen Precision Health Institute, University of Missouri, Columbia, MO, 65211, USA
- Pier Luigi Martelli
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Debnath Pal
  - Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, 560012, India
- Douglas E V Pires
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, 3053, Australia
- Stephanie Portelli
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, University of Queensland, St Lucia, QLD, 4072, Australia
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Carlos H M Rodrigues
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Castrense Savojardo
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Martin Schwersensky
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050, Brussels, Belgium
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- Alexey V Strokach
  - Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, 77843, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA
- Steven E Brenner
  - Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
  - Biophysics Graduate Group, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Center for Computational Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Roberta Chiaraluce
  - Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza University of Roma, 00185, Rome, Italy
- Valerio Consalvi
  - Department of Biochemical Sciences "A. Rossi Fanelli", Sapienza University of Roma, 00185, Rome, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
  - Computational Genomics Platform, IRCCS University Hospital of Bologna, 40138, Bologna, Italy
2
Turina P, Dal Cortivo G, Enriquez Sandoval CA, Alexov E, Ascher DB, Babbi G, Bakolitsa C, Casadio R, Fariselli P, Folkman L, Kamandula A, Katsonis P, Li D, Lichtarge O, Martelli PL, Panday SK, Pires DEV, Portelli S, Pucci F, Rodrigues CHM, Rooman M, Savojardo C, Schwersensky M, Shen Y, Strokach AV, Sun Y, Woo J, Radivojac P, Brenner SE, Dell'Orco D, Capriotti E. Assessing the predicted impact of single amino acid substitutions in calmodulin for CAGI6 challenges. Hum Genet 2025; 144:113-125. [PMID: 39714488 PMCID: PMC11975486 DOI: 10.1007/s00439-024-02720-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Accepted: 12/02/2024] [Indexed: 12/24/2024]
Abstract
Recent thermodynamic and functional studies have been conducted to evaluate the impact of amino acid substitutions on Calmodulin (CaM). The Critical Assessment of Genome Interpretation (CAGI) data provider at University of Verona (Italy) measured the melting temperature (Tm) and the percentage of unfolding (%unfold) of a set of CaM variants (CaM challenge dataset). Thermodynamic measurements for the equilibrium unfolding of CaM were obtained by monitoring far-UV Circular Dichroism as a function of temperature. These measurements were used to determine the Tm and the percentage of protein remaining unfolded at the highest temperature. The CaM challenge dataset, comprising a total of 15 single amino acid substitutions, was used to evaluate the effectiveness of computational methods in predicting the Tm and unfolding percentages associated with the variants, and categorizing them as destabilizing or not. For the sixth edition of CAGI, nine independent research groups from four continents (Asia, Australia, Europe, and North America) submitted over 52 sets of predictions, derived from various approaches. In this manuscript, we summarize the results of our assessment to highlight the potential limitations of current algorithms and provide insights into the future development of more accurate prediction tools. By evaluating the thermodynamic stability of CaM variants, this study aims to enhance our understanding of the relationship between amino acid substitutions and protein stability, ultimately contributing to more accurate predictions of the effects of genetic variants.
Affiliation(s)
- Paola Turina
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Giuditta Dal Cortivo
  - Department of Neurosciences, Biomedicine, and Movement Sciences, Section of Biological Chemistry, University of Verona, 37134, Verona, Italy
- Emil Alexov
  - Department of Physics and Astronomy, Clemson University, Clemson, SC, 29634, USA
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, University of Queensland, St Lucia, QLD, 4072, Australia
- Giulia Babbi
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Constantina Bakolitsa
  - Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
- Rita Casadio
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Turin, Italy
- Lukas Folkman
  - Institute for Integrated and Intelligent Systems, Griffith University, Southport, QLD, Australia
- Akash Kamandula
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Dong Li
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pier Luigi Martelli
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Douglas E V Pires
  - School of Computing and Information Systems, The University of Melbourne, Melbourne, VIC, 3053, Australia
- Stephanie Portelli
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
  - School of Chemistry and Molecular Biosciences, Australian Centre for Ecogenomics, University of Queensland, St Lucia, QLD, 4072, Australia
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium
- Carlos H M Rodrigues
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, 3004, Australia
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium
- Castrense Savojardo
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
- Martin Schwersensky
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 50 Roosevelt Ave, 1050, Brussels, Belgium
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Alexey V Strokach
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Yuanfei Sun
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, 02115, USA
- Steven E Brenner
  - Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
  - Biophysics Graduate Group, University of California, Berkeley, CA, 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Daniele Dell'Orco
  - Department of Neurosciences, Biomedicine, and Movement Sciences, Section of Biological Chemistry, University of Verona, 37134, Verona, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126, Bologna, Italy
  - Computational Genomics Platform, IRCCS University Hospital of Bologna, 40138, Bologna, Italy
3
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY,
Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, 
Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. 
[PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024]
Abstract
BACKGROUND The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
4
Kurniawan J, Ishida T. Comparing Supervised Learning and Rigorous Approach for Predicting Protein Stability upon Point Mutations in Difficult Targets. J Chem Inf Model 2023; 63:6778-6788. [PMID: 37897811 DOI: 10.1021/acs.jcim.3c00750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/30/2023]
Abstract
Accurate prediction of protein stability upon a point mutation has important applications in drug discovery and personalized medicine. It remains a challenging issue in computational biology. Existing computational prediction methods, which range from mechanistic to supervised learning approaches, have experienced limited progress over the last few decades. This stagnation is largely due to their heavy reliance on both the quantity and quality of the training data. This is evident in recent state-of-the-art methods that continue to yield substantial errors on two challenging blind test sets: frataxin and p53, with average root-mean-square errors exceeding 3 and 1.5 kcal/mol, respectively, which is still above the theoretical 1 kcal/mol prediction barrier. Rigorous approaches, on the other hand, offer greater potential for accuracy without relying on training data but are computationally demanding and require both wild-type and mutant structure information. Although they showed high accuracy for conserving mutations, their performance is still limited for charge-changing mutation cases. This might be due to the lack of an available mutant structure, often represented by a simplified capped peptide. The recent advances in protein structure prediction methods now make it possible to obtain structures comparable to experimental ones, including complete mutant structure information. In this work, we compare the performance of supervised learning-based methods and rigorous approaches for predicting protein stability on point mutations in difficult targets: frataxin and p53. The rigorous alchemical method significantly surpasses state-of-the-art techniques in terms of both the root-mean-squared error and Pearson correlation coefficient in these two challenging blind test sets. 
Additionally, we propose an improved alchemical method that employs the pmx double-system/single-box approach to accurately predict the folding free energy change upon both conserving and charge-changing mutations. The enhanced protocol can accurately predict both types of mutations, thereby outperforming existing state-of-the-art methods in overall performance.
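The two evaluation metrics named in this abstract, root-mean-square error and the Pearson correlation coefficient, can be sketched as follows. This is a generic illustration rather than the authors' evaluation code: the function names and the example values are hypothetical, and the inputs are assumed to be paired lists of predicted and experimental free energy changes in kcal/mol.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between paired predictions and measurements."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    predicted = [-0.5, -1.8, 0.2, -3.1]
    observed = [-0.9, -2.5, 0.4, -2.2]
    print(rmse(predicted, observed), pearson(predicted, observed))
```

Reporting both metrics is useful because a method can rank variants well (high correlation) while still being far off in absolute terms (high error), or the reverse.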
Affiliation(s)
- Jason Kurniawan
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan
- Takashi Ishida
  - Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo 152-8550, Japan
5
David A, Sternberg MJE. Protein structure-based evaluation of missense variants: Resources, challenges and future directions. Curr Opin Struct Biol 2023; 80:102600. [PMID: 37126977 DOI: 10.1016/j.sbi.2023.102600] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/30/2023] [Revised: 03/30/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
Affiliation(s)
- Alessia David
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK
6
Oliveira MLG, Castelli EC, Veiga‐Castelli LC, Pereira ALE, Marcorin L, Carratto TMT, Souza AS, Andrade HS, Simões AL, Donadi EA, Courtin D, Sabbagh A, Giuliatti S, Mendes‐Junior CT. Genetic diversity of the LILRB1 and LILRB2 coding regions in an admixed Brazilian population sample. HLA 2022; 100:325-348. [DOI: 10.1111/tan.14725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Revised: 06/02/2022] [Accepted: 06/24/2022] [Indexed: 11/27/2022]
Affiliation(s)
- Erick C. Castelli
  - Pathology Department, School of Medicine, São Paulo State University (UNESP), Botucatu, State of São Paulo, Brazil
  - Molecular Genetics and Bioinformatics Laboratory, School of Medicine, São Paulo State University (UNESP), Botucatu, State of São Paulo, Brazil
- Luciana C. Veiga‐Castelli
  - Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Alison Luis E. Pereira
  - Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Letícia Marcorin
  - Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Thássia M. T. Carratto
  - Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Andreia S. Souza
  - Molecular Genetics and Bioinformatics Laboratory, School of Medicine, São Paulo State University (UNESP), Botucatu, State of São Paulo, Brazil
- Heloisa S. Andrade
  - Molecular Genetics and Bioinformatics Laboratory, School of Medicine, São Paulo State University (UNESP), Botucatu, State of São Paulo, Brazil
- Aguinaldo L. Simões
  - Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Eduardo A. Donadi
  - Departamento de Clínica Médica, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Silvana Giuliatti
  - Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Celso Teixeira Mendes‐Junior
  - Departamento de Química, Laboratório de Pesquisas Forenses e Genômicas, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
7
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods Mol Biol 2022; 2449:169-185. [PMID: 35507262 DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research in the field of computational methods based on machine learning and knowledge-based potentials for ΔG and ΔΔG prediction upon variations, we now realize that all the approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein wrong? Both machine learning and knowledge-based computational methods are rigorous, and the theory behind them is solid. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be considered with care. In the following, we will show how to cope with the problem of understanding which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Affiliation(s)
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
8
A Deep-Learning Sequence-Based Method to Predict Protein Stability Changes Upon Genetic Variations. Genes (Basel) 2021; 12:genes12060911. [PMID: 34204764 PMCID: PMC8231498 DOI: 10.3390/genes12060911] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/14/2021] [Revised: 06/08/2021] [Accepted: 06/09/2021] [Indexed: 01/17/2023]
Abstract
Several studies have linked disruptions of protein stability and its normal functions to disease. Therefore, during the last few decades, many tools have been developed to predict the free energy changes upon protein residue variations. Most of these methods require both sequence and structure information to obtain reliable predictions. However, the lower number of protein structures available with respect to their sequences, due to experimental issues, drastically limits the application of these tools. In addition, current methodologies ignore the antisymmetric property characterizing the thermodynamics of the protein stability: a variation from wild-type to a mutated form of the protein structure ($X_W \to X_M$) and its reverse process ($X_M \to X_W$) must have opposite values of the free energy difference ($\Delta\Delta G_{WM} = -\Delta\Delta G_{MW}$). Here we propose ACDC-NN-Seq, a deep neural network system that exploits the sequence information and is able to incorporate into its architecture the antisymmetry property. To our knowledge, this is the first convolutional neural network to predict protein stability changes relying solely on the protein sequence. We show that ACDC-NN-Seq compares favorably with the existing sequence-based methods.
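The antisymmetry property stated in this abstract can be illustrated with a small sketch. It is not the ACDC-NN-Seq architecture, which builds the property into the network itself. Instead, it shows a generic post-hoc construction that is assumed here for illustration: any predictor can be made antisymmetric by averaging its forward prediction with the negated reverse prediction. The toy predictor, its residue scores, and its bias term are all hypothetical.

```python
from typing import Callable

# A predictor maps (wild-type residue, mutant residue) to a ddG estimate.
Predictor = Callable[[str, str], float]

# Hypothetical per-residue scores, used only to build a toy predictor.
TOY_SCORE = {"A": 0.0, "V": 1.0, "G": -0.5}


def biased_predictor(wild: str, mutant: str) -> float:
    """Toy predictor with a constant bias, so it violates antisymmetry."""
    return TOY_SCORE[mutant] - TOY_SCORE[wild] + 0.3


def antisymmetrize(predict: Predictor) -> Predictor:
    """Wrap a predictor so that f(W, M) == -f(M, W) holds by construction."""
    def wrapped(wild: str, mutant: str) -> float:
        return 0.5 * (predict(wild, mutant) - predict(mutant, wild))
    return wrapped


if __name__ == "__main__":
    fixed = antisymmetrize(biased_predictor)
    # Forward and reverse raw predictions do not cancel; the wrapped ones do.
    print(biased_predictor("A", "V") + biased_predictor("V", "A"))
    print(fixed("A", "V") + fixed("V", "A"))
```

The wrapper removes any component of the prediction that is the same in both directions, which is exactly the part that violates the thermodynamic constraint.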
9
Petrosino M, Novak L, Pasquo A, Chiaraluce R, Turina P, Capriotti E, Consalvi V. Analysis and Interpretation of the Impact of Missense Variants in Cancer. Int J Mol Sci 2021; 22:ijms22115416. [PMID: 34063805 PMCID: PMC8196604 DOI: 10.3390/ijms22115416] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 03/30/2021] [Revised: 05/03/2021] [Accepted: 05/17/2021] [Indexed: 01/10/2023]
Abstract
Large scale genome sequencing allowed the identification of a massive number of genetic variations, whose impact on human health is still unknown. In this review we analyze, by an in silico-based strategy, the impact of missense variants on cancer-related genes, whose effect on protein stability and function was experimentally determined. We collected a set of 164 variants from 11 proteins to analyze the impact of missense mutations at structural and functional levels, and to assess the performance of state-of-the-art methods (FoldX and Meta-SNP) for predicting protein stability change and pathogenicity. The result of our analysis shows that a combination of experimental data on protein stability and in silico pathogenicity predictions allowed the identification of a subset of variants with a high probability of having a deleterious phenotypic effect, as confirmed by the significant enrichment of the subset in variants annotated in the COSMIC database as putative cancer-driving variants. Our analysis suggests that the integration of experimental and computational approaches may contribute to evaluate the risk for complex disorders and develop more effective treatment strategies.
Affiliation(s)
- Maria Petrosino
  - Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Leonore Novak
  - Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Alessandra Pasquo
  - ENEA CR Frascati, Diagnostics and Metrology Laboratory FSN-TECFIS-DIM, 00044 Frascati, Italy
- Roberta Chiaraluce
  - Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
- Paola Turina
  - Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy
- Emidio Capriotti
  - Dipartimento di Farmacia e Biotecnologie (FaBiT), University of Bologna, 40126 Bologna, Italy
  - Corresponding author
- Valerio Consalvi
  - Dipartimento Scienze Biochimiche “A. Rossi Fanelli”, Sapienza University of Rome, 00185 Roma, Italy
  - Corresponding author
10
Strokach A, Lu TY, Kim PM. ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations. J Mol Biol 2021; 433:166810. [PMID: 33450251 DOI: 10.1016/j.jmb.2021.166810] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 10/26/2020] [Revised: 12/19/2020] [Accepted: 01/03/2021] [Indexed: 12/21/2022]
Abstract
The ELASPIC web server allows users to evaluate the effect of mutations on protein folding and protein-protein interaction on a proteome-wide scale. It uses homology models of proteins and protein-protein interactions, which have been precalculated for several proteomes, and machine learning models, which integrate structural information with sequence conservation scores, in order to make its predictions. Since the original publication of the ELASPIC web server, several advances have motivated a revisiting of the problem of mutation effect prediction. First, progress in neural network architectures and self-supervised pre-training has resulted in models which provide more informative embeddings of protein sequence and structure than those used by the original version of ELASPIC. Second, the amount of training data has increased several-fold, largely driven by advances in deep mutation scanning and other multiplexed assays of variant effect. Here, we describe two machine learning models which leverage the recent advances in order to achieve superior accuracy in predicting the effect of mutation on protein folding and protein-protein interaction. The models incorporate features generated using pre-trained transformer- and graph convolution-based neural networks, and are trained to optimize a ranking objective function, which permits the use of heterogeneous training data. The outputs from the new models have been incorporated into the ELASPIC web server, available at http://elaspic.kimlab.org.
Affiliation(s)
- Alexey Strokach
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
- Tian Yu Lu
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
- Philip M Kim
  - Department of Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 3E1, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 3E1, Canada
11
SAAFEC-SEQ: A Sequence-Based Method for Predicting the Effect of Single Point Mutations on Protein Thermodynamic Stability. Int J Mol Sci 2021; 22:ijms22020606. [PMID: 33435356 PMCID: PMC7827184 DOI: 10.3390/ijms22020606] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/05/2020] [Revised: 12/23/2020] [Accepted: 01/06/2021] [Indexed: 01/04/2023]
Abstract
Modeling the effect of mutations on protein thermodynamic stability is useful for protein engineering and understanding molecular mechanisms of disease-causing variants. Here, we report a new development of the SAAFEC method, the SAAFEC-SEQ, which is a gradient boosting decision tree machine learning method to predict the change of the folding free energy caused by amino acid substitutions. The method does not require the 3D structure of the corresponding protein, but only its sequence and, thus, can be applied on genome-scale investigations where structural information is very sparse. SAAFEC-SEQ uses physicochemical properties, sequence features, and evolutionary information features to make the predictions. It is shown to consistently outperform all existing state-of-the-art sequence-based methods in both the Pearson correlation coefficient and root-mean-squared-error parameters as benchmarked on several independent datasets. The SAAFEC-SEQ has been implemented into a web server and is available as stand-alone code that can be downloaded and embedded into other researchers’ code.
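The two benchmark metrics named in this abstract, the Pearson correlation coefficient and the root-mean-squared error, can be computed as in the following sketch. It is a plain-Python illustration of the standard definitions, not code from SAAFEC-SEQ; the function names and the example values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and experimental values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rmse(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-squared error between predicted and experimental values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Illustrative numbers only, in kcal/mol; not taken from any dataset.
predicted = [-1.0, -0.5, 0.2, -2.1]
measured = [-1.3, -0.2, 0.0, -1.8]
print(pearson(predicted, measured), rmse(predicted, measured))
```

The two metrics answer different questions: correlation measures whether the ranking and trend are right, while RMSE penalizes errors in absolute scale, so a method can do well on one and poorly on the other.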
12
Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J 2020; 18:1968-1979. [PMID: 32774791 PMCID: PMC7397395 DOI: 10.1016/j.csbj.2020.07.011] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 03/15/2020] [Revised: 07/10/2020] [Accepted: 07/14/2020] [Indexed: 12/13/2022]
Abstract
Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.
Affiliation(s)
- Tiziana Sanavia
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Giovanni Birolo
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Ludovica Montanucci
  - Department of Comparative Biomedicine and Food Science (BCA), University of Padova, Viale dell'Università 16, 35020 Legnaro, Italy
- Paola Turina
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
13
Marabotti A, Scafuri B, Facchiano A. Predicting the stability of mutant proteins by computational approaches: an overview. Brief Bioinform 2020; 22:5850907. [PMID: 32496523 DOI: 10.1093/bib/bbaa074] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 02/12/2020] [Revised: 04/07/2020] [Accepted: 04/10/2020] [Indexed: 01/06/2023]
Abstract
A very large number of computational methods to predict the change in thermodynamic stability of proteins due to mutations have been developed during the last 30 years, and many different web servers are currently available. Nevertheless, most of them suffer from severe drawbacks that decrease their general reliability and, consequently, their applicability to different goals such as protein engineering or the predictions of the effects of mutations in genetic diseases. In this review, we have summarized all the main approaches used to develop these tools, with a survey of the web servers currently available. Moreover, we have also reviewed the different assessments made during the years, in order to allow the reader to check directly the different performances of these tools, to select the one that best fits his/her needs, and to help naïve users in finding the best option for their needs.
14
Savojardo C, Martelli PL, Casadio R, Fariselli P. On the critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation. Brief Bioinform 2019; 22:601-603. [PMID: 31885042 DOI: 10.1093/bib/bbz168] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/25/2019] [Revised: 11/26/2019] [Accepted: 12/05/2019] [Indexed: 01/17/2023]
Abstract
A review, recently published in this journal by Fang (2019), showed that methods trained for the prediction of protein stability changes upon mutation have a very critical bias: they neglect that a protein variation (A → B) and its reverse (B → A) must have the opposite value of the free energy difference (ΔΔG_AB = −ΔΔG_BA). In this letter, we complement Fang's paper by presenting a more general view of the problem. In particular, a machine learning-based method, published in 2015 (INPS), addressed the bias issue directly. We include the analysis of the missing method, showing that INPS is nearly insensitive to the addressed problem.
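One way to quantify the bias discussed in this letter is to average the sum of each direct prediction and its reverse; for a perfectly antisymmetric predictor every such sum is zero. The sketch below uses that summary, which is an assumption about how the bias might be measured and is not taken from the letter; the function name and the example values are hypothetical.

```python
from typing import Sequence


def antisymmetry_bias(direct: Sequence[float], reverse: Sequence[float]) -> float:
    """Mean of (direct + reverse) / 2 over paired predictions.

    direct[i] is the predicted ddG for A -> B and reverse[i] the predicted
    ddG for the matching B -> A. A value of 0.0 means no systematic bias.
    """
    if len(direct) != len(reverse) or len(direct) == 0:
        raise ValueError("direct and reverse must be non-empty and equally long")
    return sum(d + r for d, r in zip(direct, reverse)) / (2 * len(direct))


# Illustrative numbers only: a predictor that leans toward destabilizing calls.
direct_pred = [-1.5, -0.8, -2.0]
reverse_pred = [0.5, 0.2, 1.0]
print(antisymmetry_bias(direct_pred, reverse_pred))
```

A negative result, as in the toy numbers above, would indicate predictions skewed toward destabilization in both directions.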
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Torino, Italy
15
Andreoletti G, Pal LR, Moult J, Brenner SE. Reports from the fifth edition of CAGI: The Critical Assessment of Genome Interpretation. Hum Mutat 2019; 40:1197-1201. [PMID: 31334884 PMCID: PMC7329230 DOI: 10.1002/humu.23876] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 07/16/2019] [Accepted: 07/19/2019] [Indexed: 12/20/2022]
Abstract
Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'kā-jē/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of art in genome interpretation and of encouraging new methodological developments. This special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 was comprised of 14 challenges and engaged hundreds of participants from a dozen countries. This edition had a notable increase in splicing and expression regulatory variant challenges, while also continuing challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org.
Affiliation(s)
- Gaia Andreoletti
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Lipika R. Pal
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
- John Moult
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742
- Steven E. Brenner
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA 94720, USA
16
Katsonis P, Lichtarge O. CAGI5: Objective performance assessments of predictions based on the Evolutionary Action equation. Hum Mutat 2019; 40:1436-1454. [PMID: 31317604 PMCID: PMC6900054 DOI: 10.1002/humu.23873] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 02/21/2019] [Revised: 07/02/2019] [Accepted: 07/11/2019] [Indexed: 12/14/2022]
Abstract
Many computational approaches estimate the effect of coding variants, but their predictions often disagree with each other. These contradictions confound users and raise questions regarding reliability. Performance assessments can indicate the expected accuracy for each method and highlight advantages and limitations. The Critical Assessment of Genome Interpretation (CAGI) community aims to organize objective and systematic assessments: They challenge predictors on unpublished experimental and clinical data and assign independent assessors to evaluate the submissions. We participated in CAGI experiments as predictors, using the Evolutionary Action (EA) method to estimate the fitness effect of coding mutations. EA is untrained, uses homology information, and relies on a formal equation: The fitness effect equals the functional sensitivity to residue changes multiplied by the magnitude of the substitution. In previous CAGI experiments (between 2011 and 2016), our submissions aimed to predict the protein activity of single mutants. In 2018 (CAGI5), we also submitted predictions regarding clinical associations, folding stability, and matching genomic data with phenotype. For all these diverse challenges, we used EA to predict the fitness effect of variants, adjusted to specifically address each question. Our submissions had consistently good performance, suggesting that EA predicts reliably the effects of genetic variants.
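The verbal statement of the EA equation in this abstract (fitness effect equals functional sensitivity multiplied by substitution magnitude) can be written compactly as below. The symbols are assumed notation chosen for illustration and do not appear in the abstract: $\Delta\varphi$ stands for the fitness effect, $\partial f / \partial r_i$ for the functional sensitivity at residue position $i$, and $\Delta r_{i,\,X \to Y}$ for the magnitude of the substitution of amino acid $X$ by $Y$ at that position.

```latex
\Delta\varphi \;\approx\; \frac{\partial f}{\partial r_i}\,\Delta r_{i,\,X \to Y}
```

Read this way, a large effect requires both a sensitive position and a substantial change of residue; either factor being small keeps the predicted effect small.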
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas
  - Department of Pharmacology, Baylor College of Medicine, Houston, Texas
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
17
Savojardo C, Babbi G, Bovo S, Capriotti E, Martelli PL, Casadio R. Are machine learning based methods suited to address complex biological problems? Lessons from CAGI-5 challenges. Hum Mutat 2019; 40:1455-1462. [PMID: 31066146 DOI: 10.1002/humu.23784] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/18/2019] [Revised: 04/02/2019] [Accepted: 05/04/2019] [Indexed: 11/06/2022]
Abstract
In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The critical assessment of genome interpretation (CAGI) has established a common framework for the assessment of available predictors of variant effects on specific problems and our group has been an active participant of CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze prediction performances of our tools on five CAGI-5 selected challenges grouped into three different categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their potentialities and drawbacks. The aim is to better define the application boundaries of each tool.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Samuele Bovo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Emidio Capriotti
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy