1
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024] Open
Abstract
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state of the art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
2
Manfredi M, Savojardo C, Martelli PL, Casadio R. CoCoNat: A Deep Learning-Based Tool for the Prediction of Coiled-coil Domains in Protein Sequences. Bio Protoc 2024; 14:e4935. [PMID: 38405078 PMCID: PMC10883893 DOI: 10.21769/bioprotoc.4935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 12/28/2023] [Accepted: 01/05/2024] [Indexed: 02/27/2024] Open
Abstract
Coiled-coil domains (CCDs) are structural motifs observed in proteins in all organisms that perform several crucial functions. The computational identification of CCD segments over a protein sequence is of great importance for its functional characterization. This task can essentially be divided into three separate steps: the detection of segment boundaries, the annotation of the heptad repeat pattern along the segment, and the classification of its oligomerization state. Several methods have been proposed over the years addressing one or more of these predictive steps. In this protocol, we illustrate how to make use of CoCoNat, a novel approach based on protein language models, to characterize CCDs. CoCoNat is, at its release (August 2023), the state of the art for CCD detection. The web server allows users to submit input protein sequences and visualize the predicted domains after a few minutes. Optionally, precomputed segments can be provided to the model, which will predict the oligomerization state for each of them. CoCoNat can be easily integrated into biological pipelines by downloading the standalone version, which provides a single executable script to produce the output.
Key features
- Web server for the prediction of coiled-coil segments from a protein sequence.
- Three different predictions from a single tool (segment position, heptad repeat annotation, oligomerization state).
- Possibility to visualize the results online or to download the predictions in different formats for further processing.
- Easy integration in automated pipelines with the local version of the tool.
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
3
Bertolini E, Babbi G, Savojardo C, Martelli PL, Casadio R. MultifacetedProtDB: a database of human proteins with multiple functions. Nucleic Acids Res 2024; 52:D494-D501. [PMID: 37791887 PMCID: PMC10767882 DOI: 10.1093/nar/gkad783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 10/05/2023] Open
Abstract
MultifacetedProtDB is a database of multifunctional human proteins that derives its information from other databases, including UniProt, GeneCards, Human Protein Atlas (HPA), Human Phenotype Ontology (HPO) and MONDO. Under the label 'multifaceted' it collects multitasking proteins described in the literature as pleiotropic, multidomain, promiscuous (in relation to enzymes catalysing multiple substrates) or moonlighting (with two or more molecular functions), which are difficult to retrieve with a direct search in existing non-specific databases. The study of multifunctional proteins is an expanding research area aiming to elucidate the complexities of biological processes, particularly in humans, where multifunctional proteins play roles in various processes, including signal transduction, metabolism, gene regulation and cellular communication, and are often involved in disease onset and progression. The web server allows searching by gene, protein and any associated structural and functional information, such as available structures from PDB, structural models and interactors, using multiple filters. Protein entries are supplemented with comprehensive annotations including EC number, GO terms (biological processes, molecular functions, and cellular components), pathways from Reactome, subcellular localization from UniProt, tissue and cell type expression from HPA, and associated diseases following MONDO, Orphanet and OMIM classification. MultifacetedProtDB is freely available as a web server at: https://multifacetedprotdb.biocomp.unibo.it/.
Affiliation(s)
- Elisa Bertolini
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Giulia Babbi
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Castrense Savojardo
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Pier Luigi Martelli
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
- Rita Casadio
- Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Italy
4
Casadio R, Babu MM. Editorial overview: Sequences and topology. Curr Opin Struct Biol 2023; 82:102677. [PMID: 37595511 DOI: 10.1016/j.sbi.2023.102677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/20/2023]
Affiliation(s)
- Rita Casadio
- Biocomputing Group, University of Bologna, Italy.
- M Madan Babu
- Center of Excellence for Data-Driven Discovery, Department of Structural Biology, St Jude Children's Research Hospital, Memphis, TN, USA.
5
Schweke H, Xu Q, Tauriello G, Pantolini L, Schwede T, Cazals F, Lhéritier A, Fernandez-Recio J, Rodríguez-Lumbreras LÁ, Schueler-Furman O, Varga JK, Jiménez-García B, Réau MF, Bonvin A, Savojardo C, Martelli PL, Casadio R, Tubiana J, Wolfson H, Oliva R, Barradas-Bautista D, Ricciardelli T, Cavallo L, Venclovas Č, Olechnovič K, Guerois R, Andreani J, Martin J, Wang X, Kihara D, Marchand A, Correia B, Zou X, Dey S, Dunbrack R, Levy E, Wodak S. Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study. Proteomics 2023; 23:e2200323. [PMID: 37365936 PMCID: PMC10937251 DOI: 10.1002/pmic.202200323] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/04/2023] [Revised: 05/11/2023] [Accepted: 05/11/2023] [Indexed: 06/28/2023]
Abstract
Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces previously developed by 13 groups were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. A simple consensus score generated using the best performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier were created. Both approaches showed excellent performance, with an area under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming individual scores developed by different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
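As a rough illustration of the consensus idea described in this abstract, the sketch below combines several interface scoring functions into one consensus score and evaluates each with the area under the ROC curve. The abstract does not say how the individual scores were combined, so averaging per-function ranks is an assumption made here for illustration only; the function names (`average_ranks`, `consensus_score`, `roc_auc`) and the convention that higher scores mean "more likely physiological" are likewise hypothetical.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks (ascending); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def consensus_score(score_columns: Sequence[Sequence[float]]) -> list[float]:
    """Assumed combination rule: mean of per-function ranks for each complex."""
    ranked = [average_ranks(col) for col in score_columns]
    n_items = len(score_columns[0])
    return [sum(col[i] for col in ranked) / len(ranked) for i in range(n_items)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Ranking each function before averaging puts scores with different scales on a common footing; other combination rules (for example z-score averaging or voting) would fit the same interface.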
Affiliation(s)
- Julia K. Varga
- Hebrew University of Jerusalem Institute for Medical Research Israel-Canada
- Jérôme Tubiana
- Tel Aviv University Blavatnik School of Computer Science
- Haim Wolfson
- Tel Aviv University Blavatnik School of Computer Science
- Xiaoqin Zou
- Dalton Cardiovascular Research Center, Institute for Data Science and Informatics, University of Missouri
6
Stenton SL, O’Leary M, Lemire G, VanNoy GE, DiTroia S, Ganesh VS, Groopman E, O’Heir E, Mangilog B, Osei-Owusu I, Pais LS, Serrano J, Singer-Berk M, Weisburd B, Wilson M, Austin-Tse C, Abdelhakim M, Althagafi A, Babbi G, Bellazzi R, Bovo S, Carta MG, Casadio R, Coenen PJ, De Paoli F, Floris M, Gajapathy M, Hoehndorf R, Jacobsen JO, Joseph T, Kamandula A, Katsonis P, Kint C, Lichtarge O, Limongelli I, Lu Y, Magni P, Mamidi TKK, Martelli PL, Mulargia M, Nicora G, Nykamp K, Pejaver V, Peng Y, Pham THC, Podda MS, Rao A, Rizzo E, Saipradeep VG, Savojardo C, Schols P, Shen Y, Sivadasan N, Smedley D, Soru D, Srinivasan R, Sun Y, Sunderam U, Tan W, Tiwari N, Wang X, Wang Y, Williams A, Worthey EA, Yin R, You Y, Zeiberg D, Zucca S, Bakolitsa C, Brenner SE, Fullerton SM, Radivojac P, Rehm HL, O’Donnell-Luria A. Critical assessment of variant prioritization methods for rare disease diagnosis within the Rare Genomes Project. medRxiv 2023:2023.08.02.23293212. [PMID: 37577678 PMCID: PMC10418577 DOI: 10.1101/2023.08.02.23293212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Background: A major obstacle faced by rare disease families is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years, and causal variants are identified in under 50%. The Rare Genomes Project (RGP) is a direct-to-participant research study on the utility of genome sequencing (GS) for diagnosis and gene discovery. Families are consented for sharing of sequence and phenotype data with researchers, allowing development of a Critical Assessment of Genome Interpretation (CAGI) community challenge, placing variant prioritization models head-to-head in a real-life clinical diagnostic setting. Methods: Predictors were provided a dataset of phenotype terms and variant calls from GS of 175 RGP individuals (65 families), including 35 solved training set families, with causal variants specified, and 30 test set families (14 solved, 16 unsolved). The challenge tasked teams with identifying the causal variants in as many test set families as possible. Ranked variant predictions were submitted with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics, a weighted score based on rank position of true positive causal variants and maximum F-measure, based on precision and recall of causal variants across EPCR thresholds. Results: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performing teams recalled the causal variants in up to 13 of 14 solved families by prioritizing high quality variant calls that were rare, predicted deleterious, segregating correctly, and consistent with reported phenotype. In unsolved families, newly discovered diagnostic variants were returned to two families following confirmatory RNA sequencing, and two prioritized novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant, in an unsolved proband with phenotype overlap with asparagine synthetase deficiency. Conclusions: By objective assessment of variant predictions, we provide insights into current state-of-the-art algorithms and platforms for genome sequencing analysis for rare disease diagnosis and explore areas for future optimization. Identification of diagnostic variants in unsolved families promotes synergy between researchers with clinical and computational expertise as a means of advancing the field of clinical genome interpretation.
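The maximum F-measure mentioned in the Methods can be sketched as follows. This is a minimal illustration, not the challenge's scoring code: it assumes the balanced F-measure (F1), treats every distinct submitted EPCR value as a candidate threshold, and counts a variant as predicted when its EPCR is at or above the threshold. The function name `max_f_measure` and the optional `n_causal` argument (for causal variants that a model never submitted) are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def max_f_measure(
    epcr: Sequence[float],
    is_causal: Sequence[int],
    n_causal: Optional[int] = None,
) -> Tuple[float, Optional[float]]:
    """Return (best F1, threshold achieving it) over all distinct EPCR cutoffs.

    epcr      -- estimated probability of causal relationship per submitted variant
    is_causal -- 1 if the variant is a true causal variant, else 0
    n_causal  -- total number of true causal variants; defaults to those submitted
    """
    if n_causal is None:
        n_causal = sum(is_causal)
    best_f, best_t = 0.0, None
    for t in sorted(set(epcr), reverse=True):
        # Variants at or above the cutoff are the model's positive calls.
        called = [c for p, c in zip(epcr, is_causal) if p >= t]
        tp = sum(called)
        if tp == 0:
            continue  # precision and recall are both zero, so F1 is zero
        precision = tp / len(called)
        recall = tp / n_causal
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f:
            best_f, best_t = f1, t
    return best_f, best_t
```

Passing `n_causal` explicitly lowers recall when a model omits a causal variant entirely, which a threshold sweep over submitted variants alone would not penalize.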
Affiliation(s)
- Sarah L. Stenton
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Melanie O’Leary
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Gabrielle Lemire
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Grace E. VanNoy
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stephanie DiTroia
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Vijay S. Ganesh
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Neurology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Emily Groopman
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Emily O’Heir
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Brian Mangilog
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ikeoluwa Osei-Owusu
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Lynn S. Pais
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jillian Serrano
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Moriel Singer-Berk
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ben Weisburd
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael Wilson
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Christina Austin-Tse
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Marwa Abdelhakim
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Azza Althagafi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Computer Science Department, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Riccardo Bellazzi
- enGenome Srl, Pavia, Italy
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Samuele Bovo
- Department of Agricultural and Food Sciences, University of Bologna, Bologna, Italy
- Maria Giulia Carta
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Matteo Floris
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Manavalan Gajapathy
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Julius O.B. Jacobsen
- William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Thomas Joseph
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Akash Kamandula
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Panagiotis Katsonis
- Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Olivier Lichtarge
- Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Structural and Computational Biology & Molecular Biophysics Program, Baylor College of Medicine, Houston, TX, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
- Yulan Lu
- Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, Shanghai, China
- Paolo Magni
- enGenome Srl, Pavia, Italy
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy
- Tarun Karthik Kumar Mamidi
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Marta Mulargia
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yisu Peng
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Thi Hong Cam Pham
- Anatomy and Surgical Training Department, University of Medicine and Pharmacy, Hue University, Vietnam
- Maurizio S. Podda
- Department of Biomedical Sciences, University of Sassari, Sassari, Italy
- Aditya Rao
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Vangala G Saipradeep
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
- Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, Texas, USA
- Naveen Sivadasan
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Damian Smedley
- William Harvey Research Institute, Barts & The London School of Medicine and Dentistry, Queen Mary University of London, Charterhouse Square, London, UK
- Rajgopal Srinivasan
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Uma Sunderam
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Wuwei Tan
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Naina Tiwari
- TCS Research, Tata Consultancy Services (TCS) Ltd, Deccan Park, Madhapur, Hyderabad, India
- Xiao Wang
- Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, Shanghai, China
- Yaqiong Wang
- Center for Molecular Medicine, Pediatric Research Institute, Children’s Hospital of Fudan University, Shanghai, China
- Amanda Williams
- Department of Molecular & Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Elizabeth A. Worthey
- Center for Computational Genomics and Data Science, The University of Alabama at Birmingham, Birmingham, AL, USA
- Department of Genetics, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL, USA
- Hugh Kaul Precision Medicine Institute, The University of Alabama at Birmingham, Birmingham, AL, USA
- Rujie Yin
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Yuning You
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
- Daniel Zeiberg
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Constantina Bakolitsa
- Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology and Center for Computational Biology, University of California, Berkeley, CA, USA
- Stephanie M Fullerton
- Department of Bioethics & Humanities, University of Washington School of Medicine, Seattle, WA, USA
- Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Heidi L. Rehm
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Anne O’Donnell-Luria
- Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
7
Aspromonte MC, Conte AD, Zhu S, Tan W, Shen Y, Zhang Y, Li Q, Wang MH, Babbi G, Bovo S, Martelli PL, Casadio R, Althagafi A, Toonsi S, Kulmanov M, Hoehndorf R, Katsonis P, Williams A, Lichtarge O, Xian S, Surento W, Pejaver V, Mooney SD, Sunderam U, Srinivasan R, Murgia A, Piovesan D, Tosatto SCE, Leonardi E. CAGI6 ID-Challenge: Assessment of phenotype and variant predictions in 415 children with Neurodevelopmental Disorders (NDDs). Res Sq 2023:rs.3.rs-3209168. [PMID: 37577579 PMCID: PMC10418555 DOI: 10.21203/rs.3.rs-3209168/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
As part of the 6th edition of the Critical Assessment of Genome Interpretation (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-Challenge to encourage the development of computational methods for predicting patients' phenotypes and causal variants. Eight research teams, with 30 models, had access to phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions with onset in infancy. In this study we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes, based on the clinical features described for each patient, and causal variants. Finally, we asked participants to develop a method to find new possible genetic causes for patients without a genetic diagnosis. As in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (causative, putative pathogenic and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we provided the variant data and phenotypic traits of the 150 patients from the CAGI5 ID-Challenge for training and validation during development of the prediction methods.
Affiliation(s)
- Shaowen Zhu
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Wuwei Tan
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Yang Shen
  - Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843
- Qi Li
  - CUHK Shenzhen Research Institute, Shenzhen
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Samuele Bovo
  - Department of Agricultural and Food Sciences, University of Bologna
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna
- Azza Althagafi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Sumyyah Toonsi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Maxat Kulmanov
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal 23
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030
- Su Xian
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Wesley Surento
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Vikas Pejaver
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA 98195
- Uma Sunderam
  - Innovation Labs, Tata Consultancy Services, Hyderabad
|
8
|
Savojardo C, Martelli PL, Casadio R. Finding functional motifs in protein sequences with deep learning and natural language models. Curr Opin Struct Biol 2023; 81:102641. [PMID: 37385080 DOI: 10.1016/j.sbi.2023.102641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Revised: 04/17/2023] [Accepted: 05/24/2023] [Indexed: 07/01/2023]
Abstract
Recently, the prediction of structural/functional motifs in protein sequences has taken advantage of powerful machine learning-based approaches. Protein encoding now adopts protein language models, going beyond standard procedures. Different combinations of machine learning and encoding schemes are available for predicting different structural/functional motifs. Particularly interesting is the adoption of protein language models to encode proteins in addition to evolutionary information and physicochemical parameters. A thorough analysis of recent predictors developed for annotating transmembrane regions, sorting signals, and lipidation and phosphorylation sites allows us to investigate the state of the art, focusing on the relevance of protein language models for the different tasks. This analysis highlights that more experimental data are necessary to exploit the powerful machine learning methods available.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Dept. of Pharmacy and Biotechnology, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
|
9
|
Madeo G, Savojardo C, Manfredi M, Martelli PL, Casadio R. CoCoNat: a novel method based on deep learning for coiled-coil prediction. Bioinformatics 2023; 39:btad495. [PMID: 37540220 PMCID: PMC10425188 DOI: 10.1093/bioinformatics/btad495] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 07/31/2023] [Accepted: 08/03/2023] [Indexed: 08/05/2023] Open
Abstract
MOTIVATION: Coiled-coil domains (CCD) are widespread in all organisms and perform several crucial functions. Given their relevance, the computational detection of CCD is very important for protein functional annotation. State-of-the-art prediction methods include the precise identification of CCD boundaries, the annotation of the typical heptad repeat pattern along the coiled-coil helices as well as the prediction of the oligomerization state.
RESULTS: In this article, we describe CoCoNat, a novel method for predicting coiled-coil helix boundaries, residue-level register annotation, and oligomerization state. Our method encodes sequences with the combination of two state-of-the-art protein language models and implements a three-step deep learning procedure concatenated with a Grammatical-Restrained Hidden Conditional Random Field for CCD identification and refinement. A final neural network predicts the oligomerization state. When tested on a blind test set routinely adopted, CoCoNat obtains a performance superior to the current state-of-the-art both for residue-level and segment-level CCD. CoCoNat significantly outperforms the most recent state-of-the-art methods on register annotation and prediction of oligomerization states.
AVAILABILITY AND IMPLEMENTATION: CoCoNat web server is available at https://coconat.biocomp.unibo.it. Standalone version is available on GitHub at https://github.com/BolognaBiocomp/coconat.
Affiliation(s)
- Giovanni Madeo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Matteo Manfredi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
|
10
|
Mathews DH, Casadio R, Sternberg MJE. Computational Resources for Molecular Biology 2023. J Mol Biol 2023:168160. [PMID: 37244569 DOI: 10.1016/j.jmb.2023.168160] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Affiliation(s)
- David H Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester, Rochester, NY 14642, USA
- Rita Casadio
  - Biocomputing Group, FABIT-University of Bologna, Bologna I-40126, Italy
- Michael J E Sternberg
  - Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
|
11
|
Licata L, Via A, Turina P, Babbi G, Benevenuta S, Carta C, Casadio R, Cicconardi A, Facchiano A, Fariselli P, Giordano D, Isidori F, Marabotti A, Martelli PL, Pascarella S, Pinelli M, Pippucci T, Russo R, Savojardo C, Scafuri B, Valeriani L, Capriotti E. Resources and tools for rare disease variant interpretation. Front Mol Biosci 2023; 10:1169109. [PMID: 37234922 PMCID: PMC10206239 DOI: 10.3389/fmolb.2023.1169109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Accepted: 04/25/2023] [Indexed: 05/28/2023] Open
Abstract
Collectively, rare genetic disorders affect a substantial portion of the world's population. In most cases, those affected face difficulties in receiving a clinical diagnosis and genetic characterization. The understanding of the molecular mechanisms of these diseases and the development of therapeutic treatments for patients are also challenging. However, the application of recent advancements in genome sequencing/analysis technologies and computer-aided tools for predicting phenotype-genotype associations can bring significant benefits to this field. In this review, we highlight the most relevant online resources and computational tools for genome interpretation that can enhance the diagnosis, clinical management, and development of treatments for rare disorders. Our focus is on resources for interpreting single nucleotide variants. Additionally, we present use cases for interpreting genetic variants in clinical settings and review the limitations of these results and prediction tools. Finally, we have compiled a curated set of core resources and tools for analyzing rare disease genomes. Such resources and tools can be utilized to develop standardized protocols that will enhance the accuracy and effectiveness of rare disease diagnosis.
Affiliation(s)
- Luana Licata
  - Department of Biology, University of Rome Tor Vergata, Roma, Italy
- Allegra Via
  - Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Paola Turina
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Claudio Carta
  - National Centre for Rare Diseases, Istituto Superiore di Sanità, Roma, Italy
- Rita Casadio
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Andrea Cicconardi
  - Department of Physics, University of Genova, Genova, Italy
  - Italiano di Tecnologia (IIT), Genova, Italy
- Angelo Facchiano
  - National Research Council, Institute of Food Science, Avellino, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Torino, Italy
- Deborah Giordano
  - National Research Council, Institute of Food Science, Avellino, Italy
- Federica Isidori
  - Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Anna Marabotti
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Pier Luigi Martelli
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Stefano Pascarella
  - Department of Biochemical Sciences “A. Rossi Fanelli”, University of Rome “La Sapienza”, Roma, Italy
- Michele Pinelli
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
- Tommaso Pippucci
  - Medical Genetics Unit, IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Roberta Russo
  - Department of Molecular Medicine and Medical Biotechnology, University of Naples Federico II, Napoli, Italy
  - CEINGE Biotecnologie Avanzate Franco Salvatore, Napoli, Italy
- Castrense Savojardo
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Bernardina Scafuri
  - Department of Chemistry and Biology “A. Zambelli”, University of Salerno, Fisciano, SA, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
|
12
|
Martelossi J, Forni G, Iannello M, Savojardo C, Martelli PL, Casadio R, Mantovani B, Luchetti A, Rota-Stabelli O. Wood feeding and social living: Draft genome of the subterranean termite Reticulitermes lucifugus (Blattodea; Termitoidae). Insect Mol Biol 2023; 32:118-131. [PMID: 36366787 DOI: 10.1111/imb.12818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Accepted: 11/08/2022] [Indexed: 06/16/2023]
Abstract
Termites (Insecta, Blattodea, Termitoidae) are a widespread and diverse group of eusocial insects known for their ability to digest wood matter. Herein, we report the draft genome of the subterranean termite Reticulitermes lucifugus, an economically important species and among the most studied taxa with respect to eusocial organization and mating system. The final assembly (~813 Mb) covered up to 88% of the estimated genome size and, in agreement with the Asexual Queen Succession Mating System, was found to be completely homozygous. We predicted 16,349 highly supported gene models and a repetitive DNA content of 42%. Transposable elements of R. lucifugus show evolutionary dynamics similar to those of other termites, with two main peaks of activity localized at 25% and 8% of Kimura divergence driven by DNA, LINE and SINE elements. Gene family turnover analyses identified multiple instances of gene duplication associated with R. lucifugus diversification, with significant lineage-specific gene family expansions related to development, perception and nutrient metabolism pathways. Finally, we analysed P450 and odourant receptor gene repertoires in detail, highlighting the large diversity and dynamic evolutionary history of these proteins in the R. lucifugus genome. This newly assembled genome will provide a valuable resource for further understanding the molecular basis of termite biology as well as for pest control.
Affiliation(s)
- Jacopo Martelossi
  - Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Giobbe Forni
  - Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
  - Dipartimento di Scienze Agrarie e Ambientali, Università degli Studi di Milano, Milano, Italy
- Mariangela Iannello
  - Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Barbara Mantovani
  - Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Andrea Luchetti
  - Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Omar Rota-Stabelli
  - Center Agriculture Food Environment C3A, University of Trento/Fondazione Edmund Mach, Trento, Italy
|
13
|
Savojardo C, Baldazzi D, Babbi G, Martelli PL, Casadio R. Mapping human disease-associated enzymes into Reactome allows characterization of disease groups and their interactions. Sci Rep 2022; 12:17963. [PMID: 36289281 PMCID: PMC9605996 DOI: 10.1038/s41598-022-22818-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 10/19/2022] [Indexed: 01/24/2023] Open
Abstract
According to databases such as OMIM, Humsavar, ClinVar and Monarch, 1494 human enzymes are presently associated with 2539 genetic diseases, 75% of which are rare (with an Orphanet code). The Mondo ontology initiative allows disease names to be standardized into specific codes, making possible a computational association between genes, variants, diseases, and their effects on biological processes. Here, we tackle the problem of which biological processes enzymes can affect when the protein variant is disease-associated. We adopt Reactome to describe human biological processes, and by mapping disease-associated enzymes into the Reactome pathways, we establish a Reactome-disease association. This allows a novel categorization of human monogenic and polygenic diseases based on Reactome pathways and reactions. Our analysis aims at dissecting the complexity of the human genetic disease universe, highlighting all the possible links between diseases and Reactome pathways. The novel mapping helps in understanding the biochemistry and molecular biology of the disease and gives a direct glimpse of the present knowledge of other molecules involved. This is useful for a complete overview of the molecular mechanism(s) of the disease and for planning future investigations. Data are collected in DAR, a database that is free to search and available at https://dar.biocomp.unibo.it.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy (GRID: grid.6292.f; ISNI: 0000 0004 1757 1758)
- Davide Baldazzi
  - CRO, Centro di Riferimento Oncologico, Aviano, Italy (GRID: grid.418321.d; ISNI: 0000 0004 1757 9741)
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy (GRID: grid.6292.f; ISNI: 0000 0004 1757 1758)
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy (GRID: grid.6292.f; ISNI: 0000 0004 1757 1758)
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy (GRID: grid.6292.f; ISNI: 0000 0004 1757 1758)
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy (GRID: grid.5326.2; ISNI: 0000 0001 1940 4177)
|
14
|
Manfredi M, Savojardo C, Martelli PL, Casadio R. E-SNPs&GO: embedding of protein sequence and function improves the annotation of human pathogenic variants. Bioinformatics 2022; 38:5168-5174. [PMID: 36227117 PMCID: PMC9710551 DOI: 10.1093/bioinformatics/btac678] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/16/2022] [Revised: 09/14/2022] [Accepted: 10/10/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: The advent of massive DNA sequencing technologies is producing a huge number of human single-nucleotide polymorphisms occurring in protein-coding regions and possibly changing their sequences. Discriminating harmful protein variations from neutral ones is one of the crucial challenges in precision medicine. Computational tools based on artificial intelligence provide models for protein sequence encoding, bypassing database searches for evolutionary information. We leverage the new encoding schemes for an efficient annotation of protein variants.
RESULTS: E-SNPs&GO is a novel method that, given an input protein sequence and a single amino acid variation, can predict whether the variation is related to diseases or not. The proposed method adopts an input encoding completely based on protein language models and embedding techniques, specifically devised to encode protein sequences and GO functional annotations. We trained our model on a newly generated dataset of 101 146 human protein single amino acid variants in 13 661 proteins, derived from public resources. When tested on a blind set comprising 10 266 variants, our method compares well with recent approaches reported in the literature for the same task, reaching a Matthews Correlation Coefficient score of 0.72. We propose E-SNPs&GO as a suitable, efficient and accurate large-scale annotator of protein variant datasets.
AVAILABILITY AND IMPLEMENTATION: The method is available as a webserver at https://esnpsandgo.biocomp.unibo.it. Datasets and predictions are available at https://esnpsandgo.biocomp.unibo.it/datasets.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
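As a rough illustration of the Matthews Correlation Coefficient (MCC) reported in the abstract above, the sketch below computes MCC from the four cells of a binary confusion matrix. The function name `mcc_from_counts` and the example counts are hypothetical; they are not taken from the E-SNPs&GO paper or its datasets.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    if denominator == 0:
        return 0.0
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts for a disease-related vs. neutral variant classifier.
    print(round(mcc_from_counts(tp=90, tn=85, fp=15, fn=10), 3))
```

MCC ranges from -1 (total disagreement) through 0 (chance-level prediction) to +1 (perfect prediction), and it uses all four cells of the confusion matrix, which makes it a common choice when the two classes are imbalanced.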
Affiliation(s)
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna 40126, Italy
|
15
|
Babbi G, Savojardo C, Baldazzi D, Martelli PL, Casadio R. Pathogenic variation types in human genes relate to diseases through Pfam and InterPro mapping. Front Mol Biosci 2022; 9:966927. [PMID: 36188216 PMCID: PMC9523224 DOI: 10.3389/fmolb.2022.966927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Accepted: 08/31/2022] [Indexed: 11/13/2022] Open
Abstract
Grouping residue variations in a protein according to their physicochemical properties allows a dimensionality reduction of all the possible substitutions in a variant with respect to the wild type. Here, by using a large dataset of proteins with disease-related and benign variations, derived by merging Humsavar and ClinVar data, we investigate to what extent our physicochemical grouping procedure can help in determining whether patterns of variation types are related to specific groups of diseases and whether they occur in Pfam and/or InterPro gene domains. We download 75,145 germline disease-related and benign variations of 3,605 genes, group them according to physicochemical categories and map them into Pfam and InterPro gene domains. Statistically validated analysis indicates that each cluster of genes associated with Mondo anatomical system categorizations is characterized by a specific variation pattern. Patterns identify specific Pfam and InterPro domain–Mondo category associations. Our data suggest that the association of variation patterns with Mondo categories is unique and may help in associating gene variants with genetic diseases. This work corroborates, in a much larger data set, previous observations from our group.
Affiliation(s)
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
  - *Correspondence: Pier Luigi Martelli; Rita Casadio
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
  - *Correspondence: Pier Luigi Martelli; Rita Casadio
|
16
|
Casadio R, Savojardo C, Fariselli P, Capriotti E, Martelli PL. Turning Failures into Applications: The Problem of Protein ΔΔG Prediction. Methods Mol Biol 2022; 2449:169-185. [PMID: 35507262 DOI: 10.1007/978-1-0716-2095-3_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/17/2023]
Abstract
After nearly two decades of research in the field of computational methods based on machine learning and knowledge-based potentials for ΔG and ΔΔG prediction upon variations, we now realize that all the approaches perform poorly when tested on specific cases and that there is ample room for improvement. Why is this so? Is the underlying assumption wrong, namely that experimental protein thermodynamics in solution reflects the thermodynamics of a single protein? Both machine learning and knowledge-based computational methods are rigorous, and the theory behind them is solid. We are now in a critical situation, which suggests that predictions of protein instability upon variation should be considered with care. In the following, we will show how to cope with the problem of understanding which protein positions may be of interest for biotechnological and biomedical purposes. By applying a consensus procedure, we indicate possible strategies for interpreting the results.
Affiliation(s)
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Turin, Italy
- Emidio Capriotti
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
|
17
|
Madeo G, Savojardo C, Martelli PL, Casadio R. SVMyr: a web server detecting co- and post-translational myristoylation in proteins. J Mol Biol 2022; 434:167605. [DOI: 10.1016/j.jmb.2022.167605] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/29/2021] [Revised: 03/31/2022] [Accepted: 04/19/2022] [Indexed: 12/31/2022]
|
18
|
Casadio R, Martelli PL, Savojardo C. Machine learning solutions for predicting protein–protein interactions. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Rita Casadio
  - Biocomputing Group, University of Bologna, Bologna, Italy
|
19
|
Savojardo C, Babbi G, Baldazzi D, Martelli PL, Casadio R. A Glance into MTHFR Deficiency at a Molecular Level. Int J Mol Sci 2021; 23:167. [PMID: 35008593 PMCID: PMC8745156 DOI: 10.3390/ijms23010167] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2021] [Revised: 12/03/2021] [Accepted: 12/21/2021] [Indexed: 12/16/2022] Open
Abstract
MTHFR deficiency still deserves an investigation to associate the phenotype to protein structure variations. To this aim, considering the MTHFR wild type protein structure, with a catalytic and a regulatory domain and taking advantage of state-of-the-art computational tools, we explore the properties of 72 missense variations known to be disease associated. By computing the thermodynamic ΔΔG change according to a consensus method that we recently introduced, we find that 61% of the disease-related variations destabilize the protein, are present both in the catalytic and regulatory domain and correspond to known biochemical deficiencies. The propensity of solvent accessible residues to be involved in protein-protein interaction sites indicates that most of the interacting residues are located in the regulatory domain, and that only three of them, located at the interface of the functional protein homodimer, are both disease-related and destabilizing. Finally, we compute the protein architecture with Hidden Markov Models, one from Pfam for the catalytic domain and the second computed in house for the regulatory domain. We show that patterns of disease-associated, physicochemical variation types, both in the catalytic and regulatory domains, are unique for the MTHFR deficiency when mapped into the protein architecture.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Davide Baldazzi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), 70126 Bari, Italy
|
20
|
Manfredi M, Savojardo C, Martelli PL, Casadio R. DeepREx-WS: A web server for characterising protein-solvent interaction starting from sequence. Comput Struct Biotechnol J 2021; 19:5791-5799. [PMID: 34765094 PMCID: PMC8566768 DOI: 10.1016/j.csbj.2021.10.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/05/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 11/23/2022] Open
Abstract
Protein–solvent interaction provides important features for protein surface engineering when the structure is absent or partially solved. Presently, we can integrate the notion of solvent exposed/buried residues with that of their flexibility and intrinsic disorder to highlight regions where mutations may increase or decrease protein stability, in order to modify proteins for biotechnological purposes while preserving their functional integrity. Here we describe a web server that provides the unique possibility of integrating knowledge of solvent and non-solvent exposure with that of residue conservation, flexibility and disorder of a protein sequence, for a better understanding of which regions are relevant for protein integrity. The core of the web server is DeepREx, a novel deep learning-based tool that classifies each residue in the sequence as buried or exposed. DeepREx is trained on a high-quality, non-redundant dataset derived from the Protein Data Bank comprising 2332 monomeric protein chains and benchmarked on a blind test set including 200 protein sequences unrelated to the training set. Results show that DeepREx performs at the state of the art in the field. In turn, the web server, DeepREx-WS, supplements the predictions of DeepREx with features that allow a better characterisation of exposed and buried regions: i) residue conservation derived from multiple sequence alignment; ii) local sequence hydrophobicity; iii) residue flexibility computed with MEDUSA; iv) a predictor of secondary structure; v) the presence of disordered regions as derived from MobiDB-Lite3.0. The web server allows browsing, selecting and intersecting the different features. We demonstrate a possible application of DeepREx-WS for assisting the identification of residues to be varied in protein surface engineering processes.
Affiliation(s)
- Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Corresponding author.
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
|
21
|
Luchetti A, Forni G, Martelossi J, Savojardo C, Martelli PL, Casadio R, Skaist AM, Wheelan SJ, Mantovani B. Comparative genomics of tadpole shrimps (Crustacea, Branchiopoda, Notostraca): Dynamic genome evolution against the backdrop of morphological stasis. Genomics 2021; 113:4163-4172. [PMID: 34748900 DOI: 10.1016/j.ygeno.2021.11.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/14/2021] [Revised: 10/26/2021] [Accepted: 11/02/2021] [Indexed: 12/21/2022]
Abstract
This analysis presents five genome assemblies of four Notostraca taxa. The origin of Notostraca dates to the Permian/Upper Devonian and the extant forms show a striking morphological similarity to fossil taxa. The comparison of sequenced genomes with other Branchiopoda genomes shows that, despite the morphological stasis, Notostraca share a dynamic genome evolution with high turnover in gene family expansion/contraction and a transposable element content comparable to other branchiopods. While the Notostraca substitution rate appears similar or lower in comparison to other branchiopods, a subset of genes shows a faster evolutionary pace, highlighting the difficulty of generalizing about genomic stasis versus dynamism. Moreover, we found that the variation in Triops cancriformis transposable element content appeared linked to reproductive strategies, in line with theoretical expectations. Overall, besides providing new genomic resources for the study of these organisms, which appear relevant for their ecology and evolution, we also confirmed the decoupling of morphological and molecular evolution.
Affiliation(s)
- Andrea Luchetti
- Department of Biological, Geological and Environmental Sciences, University of Bologna, via Selmi 3, 40126 Bologna, Italy.
| | - Giobbe Forni
- Department of Biological, Geological and Environmental Sciences, University of Bologna, via Selmi 3, 40126 Bologna, Italy
| | - Jacopo Martelossi
- Department of Biological, Geological and Environmental Sciences, University of Bologna, via Selmi 3, 40126 Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Alyza M Skaist
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Sarah J Wheelan
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
| | - Barbara Mantovani
- Department of Biological, Geological and Environmental Sciences, University of Bologna, via Selmi 3, 40126 Bologna, Italy
|
22
|
Andrews AJ, Puncher GN, Bernal-Casasola D, Di Natale A, Massari F, Onar V, Toker NY, Hanke A, Pavey SA, Savojardo C, Martelli PL, Casadio R, Cilli E, Morales-Muñiz A, Mantovani B, Tinti F, Cariani A. Ancient DNA SNP-panel data suggests stability in bluefin tuna genetic diversity despite centuries of fluctuating catches in the eastern Atlantic and Mediterranean. Sci Rep 2021; 11:20744. [PMID: 34671077 PMCID: PMC8528830 DOI: 10.1038/s41598-021-99708-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/07/2021] [Accepted: 09/25/2021] [Indexed: 11/10/2022] Open
Abstract
Atlantic bluefin tuna (Thunnus thynnus; BFT) abundance was depleted in the late 20th and early 21st century due to overfishing. Historical catch records further indicate that the abundance of BFT in the Mediterranean has been fluctuating since at least the 16th century. Here we build upon previous work on ancient DNA of BFT in the Mediterranean by comparing contemporary (2009–2012) specimens with archival (1911–1926) and archaeological (2nd century BCE–15th century CE) specimens that represent population states prior to these two major periods of exploitation, respectively. We successfully genotyped and analysed 259 contemporary and 123 historical (91 archival and 32 archaeological) specimens at 92 SNP loci that were selected for their ability to differentiate contemporary populations or their association with core biological functions. We found no evidence of genetic bottlenecks, inbreeding or population restructuring between temporal sample groups that might explain what has driven catch fluctuations since the 16th century. We also detected a putative adaptive response, involving the cytoskeletal protein synemin which may be related to muscle stress. However, these results require further investigation with more extensive genome-wide data to rule out demographic changes due to overfishing, and other natural and anthropogenic factors, in addition to elucidating the adaptive drivers related to these.
Affiliation(s)
- Adam J Andrews
- Department of Biological, Geological and Environmental Sciences, University of Bologna, Ravenna, Italy; Department of Cultural Heritage, University of Bologna, Ravenna, Italy
| | - Gregory N Puncher
- Department of Biological, Geological and Environmental Sciences, University of Bologna, Ravenna, Italy; Department of Biological Sciences, Canadian Rivers Institute, University of New Brunswick, Saint John, NB, Canada
| | - Darío Bernal-Casasola
- Department of History, Geography and Philosophy, Faculty of Philosophy and Letters, University of Cádiz, Cádiz, Spain
| | | | - Francesco Massari
- Department of Biological, Geological and Environmental Sciences, University of Bologna, Ravenna, Italy
| | - Vedat Onar
- Osteoarcheology Practice and Research Centre and Faculty of Veterinary Medicine, Istanbul University-Cerrahpaşa, Avcılar, Istanbul, Turkey
| | - Nezir Yaşar Toker
- Osteoarcheology Practice and Research Centre and Faculty of Veterinary Medicine, Istanbul University-Cerrahpaşa, Avcılar, Istanbul, Turkey
| | - Alex Hanke
- St. Andrews Biological Station, Fisheries and Oceans Canada, St. Andrews, NB, Canada
| | - Scott A Pavey
- Department of Biological Sciences, Canadian Rivers Institute, University of New Brunswick, Saint John, NB, Canada
| | | | | | - Rita Casadio
- Biocomputing Group, University of Bologna, Bologna, Italy
| | - Elisabetta Cilli
- Department of Cultural Heritage, University of Bologna, Ravenna, Italy
| | | | - Barbara Mantovani
- Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
| | - Fausto Tinti
- Department of Biological, Geological and Environmental Sciences, University of Bologna, Ravenna, Italy
| | - Alessia Cariani
- Department of Biological, Geological and Environmental Sciences, University of Bologna, Ravenna, Italy
|
23
|
Baldazzi D, Savojardo C, Martelli PL, Casadio R. BENZ WS: the Bologna ENZyme Web Server for four-level EC number annotation. Nucleic Acids Res 2021; 49:W60-W66. [PMID: 33963861 PMCID: PMC8262719 DOI: 10.1093/nar/gkab328] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/03/2021] [Revised: 04/01/2021] [Accepted: 04/20/2021] [Indexed: 11/12/2022] Open
Abstract
The Bologna ENZyme Web Server (BENZ WS) annotates four-level Enzyme Commission numbers (EC numbers) as defined by the International Union of Biochemistry and Molecular Biology (IUBMB). BENZ WS filters a target sequence with a combined system of Hidden Markov Models, modelling protein sequences annotated with the same molecular function, and Pfams, carrying along conserved protein domains. BENZ returns, when successful, for any enzyme target sequence an associated four-level EC number. Our system can annotate both monofunctional and polyfunctional enzymes, and it can be a valuable resource for sequence functional annotation.
Affiliation(s)
- Davide Baldazzi
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
|
24
|
Savojardo C, Babbi G, Martelli PL, Casadio R. Mapping OMIM Disease-Related Variations on Protein Domains Reveals an Association Among Variation Type, Pfam Models, and Disease Classes. Front Mol Biosci 2021; 8:617016. [PMID: 34026820 PMCID: PMC8138129 DOI: 10.3389/fmolb.2021.617016] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/13/2020] [Accepted: 04/09/2021] [Indexed: 12/23/2022] Open
Abstract
Human genome resequencing projects provide an unprecedented amount of data about single-nucleotide variations occurring in protein-coding regions and often leading to observable changes in the covalent structure of gene products. For many of these variations, links to Online Mendelian Inheritance in Man (OMIM) genetic diseases are available and are reported in many databases that collect human variation data, such as Humsavar. However, the current knowledge of the molecular mechanisms that lead to diseases is, in many cases, still limited. For understanding the complex mechanisms behind disease onset, the identification of putative models, when considering the protein structure and chemico-physical features of the variations, can be useful in many contexts, including early diagnosis and prognosis. In this study, we investigate the occurrence and distribution of human disease-related variations in the context of Pfam domains. The aim of this study is the identification and characterization of Pfam domains that are statistically more likely to be associated with disease-related variations. The study takes into consideration 2,513 human protein sequences with 22,763 disease-related variations. We describe patterns of disease-related variation types in biunivocal relation with Pfam domains, which are likely to be possible markers for linking Pfam domains to OMIM diseases. Furthermore, we take advantage of the specific association between disease-related variation types and Pfam domains for clustering diseases according to the Human Disease Ontology, and we establish a relation among variation types, Pfam domains, and disease classes. We find that Pfam models are specific markers of patterns of variation types and that they can serve to bridge genes, diseases, and disease classes. Data are available as Supplementary Material for 1,670 Pfam models, including 22,763 disease-related variations associated with 3,257 OMIM diseases.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Bari, Italy
|
25
|
Bonora E, Chakrabarty S, Kellaris G, Tsutsumi M, Bianco F, Bergamini C, Ullah F, Isidori F, Liparulo I, Diquigiovanni C, Masin L, Rizzardi N, Cratere MG, Boschetti E, Papa V, Maresca A, Cenacchi G, Casadio R, Martelli P, Matera I, Ceccherini I, Fato R, Raiola G, Arrigo S, Signa S, Sementa AR, Severino M, Striano P, Fiorillo C, Goto T, Uchino S, Oyazato Y, Nakamura H, Mishra SK, Yeh YS, Kato T, Nozu K, Tanboon J, Morioka I, Nishino I, Toda T, Goto YI, Ohtake A, Kosaki K, Yamaguchi Y, Nonaka I, Iijima K, Mimaki M, Kurahashi H, Raams A, MacInnes A, Alders M, Engelen M, Linthorst G, de Koning T, den Dunnen W, Dijkstra G, van Spaendonck K, van Gent DC, Aronica EM, Picco P, Carelli V, Seri M, Katsanis N, Duijkers FAM, Taniguchi-Ikeda M, De Giorgio R. Biallelic variants in LIG3 cause a novel mitochondrial neurogastrointestinal encephalomyopathy. Brain 2021; 144:1451-1466. [PMID: 33855352 DOI: 10.1093/brain/awab056] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 04/25/2020] [Revised: 11/13/2020] [Accepted: 12/09/2020] [Indexed: 12/11/2022] Open
Abstract
Abnormal gut motility is a feature of several mitochondrial encephalomyopathies, and mutations in genes such as TYMP and POLG have been linked to these rare diseases. The human genome encodes three DNA ligases, of which only one, ligase III (LIG3), has a mitochondrial splice variant and is crucial for mitochondrial health. We investigated the effect of reduced LIG3 activity and resulting mitochondrial dysfunction in seven patients from three independent families, who showed the common occurrence of gut dysmotility and neurological manifestations reminiscent of mitochondrial neurogastrointestinal encephalomyopathy. DNA from these patients was subjected to whole exome sequencing. In all patients, compound heterozygous variants in a new disease gene, LIG3, were identified. All variants were predicted to have a damaging effect on the protein. The LIG3 gene encodes the only mitochondrial DNA (mtDNA) ligase and therefore plays a pivotal role in mtDNA repair and replication. In vitro assays in patient-derived cells showed a decrease in LIG3 protein levels and ligase activity. We demonstrated that the LIG3 gene defects affect mtDNA maintenance, leading to mtDNA depletion without the accumulation of multiple deletions as observed in other mitochondrial disorders. This mitochondrial dysfunction is likely to cause the phenotypes observed in these patients. The most prominent and consistent clinical signs were severe gut dysmotility and neurological abnormalities, including leukoencephalopathy, epilepsy, migraine, stroke-like episodes, and neurogenic bladder. A decrease in the number of myenteric neurons, and increased fibrosis and elastin levels were the most prominent changes in the gut. Cytochrome c oxidase (COX) deficient fibres in skeletal muscle were also observed. Disruption of lig3 in zebrafish reproduced the brain alterations and impaired gut transit in vivo.
In conclusion, we identified variants in the LIG3 gene that result in a mitochondrial disease characterized by predominant gut dysmotility, encephalopathy, and neuromuscular abnormalities.
Affiliation(s)
- Elena Bonora
- Department of Medical and Surgical Sciences, St. Orsola-Malpighi Hospital, University of Bologna, Bologna, 40138, Italy
| | - Sanjiban Chakrabarty
- Department of Molecular Genetics, Erasmus MC, Rotterdam, 3000 CA, The Netherlands
| | - Georgios Kellaris
- Center for Human Disease Modeling, Duke University, Durham, NC 27710, USA
| | - Makiko Tsutsumi
- Division of Molecular Genetics, Institute for Comprehensive Medical Science, Fujita Health University, Aichi, 470-1192, Japan
| | - Francesca Bianco
- Department of Medical and Surgical Sciences, St. Orsola-Malpighi Hospital, University of Bologna, Bologna, 40138, Italy
| | - Christian Bergamini
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
| | - Farid Ullah
- Center for Human Disease Modeling, Duke University, Durham, NC 27710, USA
| | - Federica Isidori
- Department of Medical and Surgical Sciences, St. Orsola-Malpighi Hospital, University of Bologna, Bologna, 40138, Italy
| | - Irene Liparulo
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
| | - Chiara Diquigiovanni
- Department of Medical and Surgical Sciences, St. Orsola-Malpighi Hospital, University of Bologna, Bologna, 40138, Italy
| | - Luca Masin
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
| | - Nicola Rizzardi
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
| | - Mariapia Giuditta Cratere
- Department of Medical and Surgical Sciences, St. Orsola-Malpighi Hospital, University of Bologna, Bologna, 40138, Italy; Division of Genetics and Cell Biology, San Raffaele Scientific Institute, Milan, 20132, Italy
| | - Elisa Boschetti
- Department of Medical and Surgical Sciences, St. Orsola-Malpighi Hospital, University of Bologna, Bologna, 40138, Italy
| | - Valentina Papa
- Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, 40123, Italy
| | - Alessandra Maresca
- IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, 40139, Italy
| | - Giovanna Cenacchi
- Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, 40123, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Biological, Geological, Environmental Sciences, University of Bologna, Bologna, 40126, Italy
| | - Pierluigi Martelli
- Biocomputing Group, Department of Biological, Geological, Environmental Sciences, University of Bologna, Bologna, 40126, Italy
| | - Ivana Matera
- IRCCS Istituto Giannina Gaslini, Genova, 16128, Italy
| | | | - Romana Fato
- Department of Pharmacy and Biotechnology, University of Bologna, Bologna, 40126, Italy
| | - Giuseppe Raiola
- Department of Paediatrics, Pugliese-Ciaccio Hospital, Catanzaro, 88100, Italy
| | - Serena Arrigo
- IRCCS Istituto Giannina Gaslini, Genova, 16128, Italy
| | - Sara Signa
- IRCCS Istituto Giannina Gaslini, Genova, 16128, Italy
| | | | | | | | | | - Tsuyoshi Goto
- Laboratory of Molecular Function of Food, Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Uji, 611-0011, Japan
| | - Shumpei Uchino
- Department of Pediatrics, Teikyo University School of Medicine, Tokyo, 173-8605, Japan; Department of Pediatrics, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
| | - Yoshinobu Oyazato
- Department of Pediatrics, Kakogawa Central City Hospital, Kakogawa, Hyogo, 675-8611, Japan
| | - Hisayoshi Nakamura
- Department of Neuromuscular Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, 187-8502, Japan
| | - Sushil K Mishra
- Glycoscience Group, National University of Ireland, Galway, H91 CF50, Ireland
| | - Yu-Sheng Yeh
- Laboratory of Molecular Function of Food, Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Uji, 611-0011, Japan
| | - Takema Kato
- Division of Molecular Genetics, Institute for Comprehensive Medical Science, Fujita Health University, Aichi, 470-1192, Japan
| | - Kandai Nozu
- Department of Pediatrics, Kobe University Graduate School of Medicine, Hyogo, 650-0017, Japan
| | - Jantima Tanboon
- Department of Neuromuscular Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, 187-8502, Japan
| | - Ichiro Morioka
- Department of Pediatrics and Child Health, Nihon University School of Medicine, Tokyo, 173-8610, Japan
| | - Ichizo Nishino
- Department of Neuromuscular Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, 187-8502, Japan
| | - Tatsushi Toda
- Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, 113-0033, Japan
| | - Yu-Ichi Goto
- Department of Mental Retardation and Birth Defect Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, 187-8502, Japan
| | - Akira Ohtake
- Department of Pediatrics & Clinical Genomics, Faculty of Medicine, Saitama Medical University, Saitama, 350-0495, Japan
| | - Kenjiro Kosaki
- Center for Medical Genetics, Keio University School of Medicine, Tokyo, 160-8582, Japan
| | - Yoshiki Yamaguchi
- Laboratory of Pharmaceutical Physical Chemistry, Tohoku Medical and Pharmaceutical University, Miyagi, 981-8558, Japan
| | - Ikuya Nonaka
- Department of Neuromuscular Research, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, 187-8502, Japan
| | - Kazumoto Iijima
- Department of Pediatrics, Kobe University Graduate School of Medicine, Hyogo, 650-0017, Japan
| | - Masakazu Mimaki
- Department of Pediatrics, Teikyo University School of Medicine, Tokyo, 173-8605, Japan
| | - Hiroki Kurahashi
- Division of Molecular Genetics, Institute for Comprehensive Medical Science, Fujita Health University, Aichi, 470-1192, Japan
| | - Anja Raams
- Department of Molecular Genetics, Erasmus MC, Rotterdam, 3000 CA, The Netherlands
| | - Alyson MacInnes
- Department of Metabolic Diseases, Amsterdam UMC, University of Amsterdam, Amsterdam, 1100 DD, The Netherlands
| | - Mariel Alders
- Department of Clinical Genetics, Amsterdam UMC, University of Amsterdam, Amsterdam, 1100 DD, The Netherlands
| | - Marc Engelen
- Department of Neurology, Amsterdam UMC, University of Amsterdam, Amsterdam, 1100 DD, The Netherlands
| | - Gabor Linthorst
- Department of Metabolic Diseases, Amsterdam UMC, University of Amsterdam, Amsterdam, 1100 DD, The Netherlands
| | - Tom de Koning
- Department of Metabolic Diseases, UMCG, Groningen, 9700 RB, The Netherlands
| | | | - Gerard Dijkstra
- Department of Gastroenterology, UMCG, Groningen, 9700 RB, The Netherlands
| | - Karin van Spaendonck
- Department of Clinical Genetics, Amsterdam UMC, University of Amsterdam, Amsterdam, 1100 DD, The Netherlands
| | - Dik C van Gent
- Department of Molecular Genetics, Erasmus MC, Rotterdam, 3000 CA, The Netherlands
| | - Eleonora M Aronica
- Department of Pathology, Amsterdam UMC, University of Amsterdam, Amsterdam, 1100 DD, The Netherlands
| | - Paolo Picco
- IRCCS Istituto Giannina Gaslini, Genova, 16128, Italy
| | - Valerio Carelli
- Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna, 40123, Italy; IRCCS Istituto delle Scienze Neurologiche di Bologna, Programma di Neurogenetica, Bologna, 40139, Italy
| | - Marco Seri
- Department of Medical and Surgical Sciences, St. Orsola-Malpighi Hospital, University of Bologna, Bologna, 40138, Italy
| | - Nicholas Katsanis
- Center for Human Disease Modeling, Duke University, Durham, NC 27710, USA
| | - Floor A M Duijkers
- Department of Clinical Genetics, Amsterdam UMC, University of Amsterdam, Amsterdam, 1100 DD, The Netherlands
| | - Mariko Taniguchi-Ikeda
- Division of Molecular Genetics, Institute for Comprehensive Medical Science, Fujita Health University, Aichi, 470-1192, Japan; Department of Pediatrics, Kobe University Graduate School of Medicine, Hyogo, 650-0017, Japan; Department of Clinical Genetics, Fujita Health University Hospital, Aichi, 470-1192, Japan
| | - Roberto De Giorgio
- Department of Morphology, Surgery and Experimental Medicine, St. Anna Hospital, University of Ferrara, Ferrara, 44124, Italy
|
26
|
Casadio R, Lenhard B, Sternberg MJE. Computational Resources for Molecular Biology 2021. J Mol Biol 2021; 433:166962. [PMID: 33774035 DOI: 10.1016/j.jmb.2021.166962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Rita Casadio
- Biocomputing Group, FABIT-University of Bologna, Italy
| | - Boris Lenhard
- Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, Hammersmith Campus, Du Cane Road, London W12 0NN, UK; Computational Regulatory Genomics, MRC London Institute of Medical Sciences, Du Cane Road, London W12 0NN, UK
| | - Michael J E Sternberg
- Structural Bioinformatics Group, Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London SW7 2AZ, UK.
|
27
|
Babbi G, Savojardo C, Martelli PL, Casadio R. Huntingtin: A Protein with a Peculiar Solvent Accessible Surface. Int J Mol Sci 2021; 22:2878. [PMID: 33809039 PMCID: PMC8001614 DOI: 10.3390/ijms22062878] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/29/2021] [Revised: 03/04/2021] [Accepted: 03/04/2021] [Indexed: 11/30/2022] Open
Abstract
Taking advantage of the latest cryogenic electron microscopy structure of human huntingtin, we explored with computational methods its physicochemical properties, focusing on the solvent accessible surface of the protein and highlighting an interesting mix of hydrophobic and hydrophilic patterns, with a prevalence of the latter. We then evaluated the probability that exposed residues are in contact with other proteins, discovering that they tend to cluster in specific regions of the protein. We then found that the remaining portions of the protein surface can contain calcium-binding sites that we propose here as putative mediators for the protein to interact with membranes. Our findings are justified in relation to the present knowledge of huntingtin functional annotation.
Affiliation(s)
- Giulia Babbi
- Biocomputing Group, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
- Correspondence: Tel.: +39-051-2094005
| | - Rita Casadio
- Biocomputing Group, University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council, Via Giovanni Amendola 122/O, 70126 Bari, Italy
|
28
|
Savojardo C, Manfredi M, Martelli PL, Casadio R. Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences. Front Mol Biosci 2021; 7:626363. [PMID: 33490109 PMCID: PMC7817970 DOI: 10.3389/fmolb.2020.626363] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 11/05/2020] [Accepted: 12/07/2020] [Indexed: 01/08/2023] Open
Abstract
Solvent accessibility (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and from protein sequences with machine-learning based approaches trained on solved structures. Here we ask to what extent the solvent exposure of residues can be associated with the pathogenicity of the variation. In this way, SASA of the wild-type residue acquires a role in the context of functional annotation of protein single-residue variations (SRVs). By mapping variations on a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. The disease association is not evenly distributed among the different residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease associated than others. For all residues, the proportion of disease-related SRVs largely increases when the wild-type residue is buried and decreases when it is exposed. The extent of the increase depends on the residue type. With the aid of a predictor developed in-house, based on a deep learning procedure and performing at the state-of-the-art, we are able to confirm the above tendency by analyzing a large data set of residues subjected to variations and occurring in some 12,494 human protein sequences still lacking three-dimensional structure (derived from HUMSAVAR). Our data support the notion that solvent accessible surface area is a distinguishing property of residues that undergo variation and that pathogenicity is more frequently associated with the buried property than with the exposed one.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy
| | - Matteo Manfredi
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council, Bari, Italy
|
29
|
Madeo G, Savojardo C, Martelli PL, Casadio R. BetAware-Deep: An Accurate Web Server for Discrimination and Topology Prediction of Prokaryotic Transmembrane β-barrel Proteins. J Mol Biol 2020; 433:166729. [PMID: 33972021 DOI: 10.1016/j.jmb.2020.166729] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/30/2020] [Revised: 11/27/2020] [Accepted: 11/30/2020] [Indexed: 11/25/2022]
Abstract
TransMembrane β-Barrel (TMBB) proteins located in the outer membranes of Gram-negative bacteria are crucial for many important biological processes and primary candidates as drug targets. Structure determination of TMBB proteins is challenging and hence computational methods devised for the analysis of TMBB proteins are important for complementing experimental approaches. Here, we present a novel web server called BetAware-Deep that is able to accurately identify the topology of TMBB proteins (i.e. the number and orientation of membrane-spanning segments along the protein sequence) and to discriminate them from other protein types. The method in BetAware-Deep defines new features by exploiting a non-canonical computation of the hydrophobic moment and by adopting sequence-profile weighting of the White&Wimley hydrophobicity scale. These features are processed using a two-step approach based on deep learning and probabilistic graphical models. BetAware-Deep has been trained on a dataset comprising 58 TMBBs and benchmarked on a novel set of 15 TMBB proteins. Results showed that BetAware-Deep outperforms two recently released state-of-the-art methods for topology prediction, predicting correct topologies of 10 out of 15 proteins. TMBB detection was also assessed on a larger dataset comprising 1009 TMBB proteins and 7571 non-TMBB proteins. Even in this benchmark, BetAware-Deep scored at the level of top-performing methods. A web server has been developed allowing users to analyze input protein sequences and providing topology prediction together with a rich set of information including a graphical representation of the residue-level annotations and prediction probabilities. BetAware-Deep is available at https://busca.biocomp.unibo.it/betaware2.
Affiliation(s)
- Giovanni Madeo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy.
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy; Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR), Bari, Italy
| |
|
30
|
Savojardo C, Bruciaferri N, Tartari G, Martelli PL, Casadio R. DeepMito: accurate prediction of protein sub-mitochondrial localization using convolutional neural networks. Bioinformatics 2020; 36:56-64. [PMID: 31218353 PMCID: PMC6956790 DOI: 10.1093/bioinformatics/btz512] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 03/29/2019] [Revised: 05/31/2019] [Accepted: 06/17/2019] [Indexed: 11/18/2022] Open
Abstract
Motivation: The correct localization of proteins in cell compartments is a key issue for their function. In particular, mitochondrial proteins are physiologically active in different compartments, and their aberrant localization contributes to the pathogenesis of human mitochondrial pathologies. Many computational methods exist to assign protein sequences to subcellular compartments such as the nucleus, cytoplasm and organelles. However, a substantial lack of experimental evidence in public sequence databases has so far hampered finer-grained discrimination that includes intra-organelle compartments. Results: We describe DeepMito, a novel method for predicting protein sub-mitochondrial localization. Taking advantage of deep-learning approaches such as convolutional neural networks, our method achieves very high prediction performance when discriminating among four mitochondrial compartments (matrix, outer, inner and intermembrane regions). The method is trained and tested in cross-validation on a newly generated, high-quality dataset comprising 424 mitochondrial proteins with experimental evidence for sub-organelle localizations. We benchmark DeepMito against the only other recent approach developed for the same task. Results indicate that DeepMito's performance is superior. Finally, genome-scale prediction on a highly curated dataset of human mitochondrial proteins further confirms the effectiveness of our approach and suggests that DeepMito is a good candidate for genome-scale annotation of mitochondrial protein subcellular localization. Availability and implementation: The DeepMito web server as well as all datasets used in this study are available at http://busca.biocomp.unibo.it/deepmito. A standalone version of DeepMito is available on DockerHub at https://hub.docker.com/r/bolognabiocomp/deepmito. DeepMito source code is available on GitHub at https://github.com/BolognaBiocomp/deepmito. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
| | - Niccolò Bruciaferri
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
| | - Giacomo Tartari
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy.,Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy.,Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
| |
|
31
|
Babbi G, Baldazzi D, Savojardo C, Martelli PL, Casadio R. Highlighting Human Enzymes Active in Different Metabolic Pathways and Diseases: The Case Study of EC 1.2.3.1 and EC 2.3.1.9. Biomedicines 2020; 8:biomedicines8080250. [PMID: 32751059 PMCID: PMC7459455 DOI: 10.3390/biomedicines8080250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/05/2020] [Revised: 07/22/2020] [Accepted: 07/24/2020] [Indexed: 11/22/2022] Open
Abstract
Enzymes are key proteins performing the basic functional activities in cells. In humans, enzymes can also be responsible for diseases, and the molecular mechanisms underlying the genotype-to-phenotype relationship are under investigation for diagnosis and medical care. Here, we focus on highlighting enzymes that are active in different metabolic pathways and become relevant hubs in protein interaction networks. We performed a statistical analysis of our present knowledge of human metabolic pathways (from the Kyoto Encyclopedia of Genes and Genomes (KEGG)), and we found that the activities aldehyde dehydrogenase (NAD(+)), described by Enzyme Commission number EC 1.2.1.3, and acetyl-CoA C-acetyltransferase (EC 2.3.1.9) are the ones most frequently involved. By associating functional activities (EC numbers) with enzyme proteins, we found the proteins most frequently involved in metabolic pathways. Our analysis shows that these proteins have the highest numbers of interaction partners when compared to all the enzymes in the pathways, and the highest numbers of predicted interaction sites. As specific enzyme protein test cases, we focus on Alpha-Aminoadipic Semialdehyde Dehydrogenase (ALDH7A1, EC 1.2.1.3) and Acetyl-CoA acetyltransferase, cytosolic and mitochondrial (gene products of ACAT2 and ACAT1, respectively; EC 2.3.1.9). With computational approaches we show that it is possible, starting from the enzyme structure, to highlight clues to their multiple roles in different pathways and to putative mechanisms promoting the association of genes with disease.
|
32
|
Abstract
In the last decade, newly developed experimental methods have made it possible to show that macromolecules in the cell milieu physically interact to support physiology. This has shifted the problem of protein–protein interaction from a microscopic, electron-density scale to a mesoscopic one. Furthermore, there is increasing evidence that proteins in the nucleus and in the cytoplasm can aggregate in membraneless organelles for different physiological reasons. In this scenario, it is urgent to address the problem of biomolecule functional annotation with efficient computational methods, suited to extracting knowledge from reliable data and transferring information across different domains of investigation. Here, we review the present state of the art of our knowledge of protein–protein interaction and the computational methods that implement it in different ways. Furthermore, we explore experimental and computational features of a set of proteins involved in phase separation.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology and Interdepartmental Center “Luigi Galvani” for Integrated Studies of Bioinformatics, Biophysics, and Biocomplexity, University of Bologna, 40126 Bologna, Italy
- Institute of Biomembranes, Bioenergetics, and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), 70126 Bari, Italy
| |
|
33
|
Abstract
Omics techniques provide a spectrum of information at the genomic level, whose analysis can characterize complex traits at a molecular level. The relationship between genotype and phenotype implies that the molecular pathways and biological processes underlying a given phenotype can be discovered from genomic information. In dealing with this problem, gene enrichment analysis has become the most widely adopted strategy. Here we present NETGE-PLUS, a Web server for standard and network-based functional interpretation of gene sets of human and of model organisms, including Sus scrofa, Saccharomyces cerevisiae, Escherichia coli, and Arabidopsis thaliana. NETGE-PLUS enables the functional enrichment of both simple and ranked lists of genes, and also makes it possible to explore relationships among KEGG pathways. A Web interface makes data retrieval complete and user-friendly. NETGE-PLUS is publicly available at http://net-ge2.biocomp.unibo.it.
Affiliation(s)
- Samuele Bovo
- Biocomputing Group, Department of Pharmacy and Biotechnology (FABIT), University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy.,Department of Agricultural and Food Sciences (DISTAL), Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127 Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology (FABIT), University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy
| | - Pietro Di Lena
- Department of Computer Science and Engineering (DISI), University of Bologna, Mura Anteo Zamboni 7, 40126 Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology (FABIT), University of Bologna, Via San Giacomo 9/2, 40126 Bologna, Italy.,Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), 70126 Bari, Italy
| |
|
34
|
Savojardo C, Martelli PL, Casadio R, Fariselli P. On the critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation. Brief Bioinform 2019; 22:601-603. [PMID: 31885042 DOI: 10.1093/bib/bbz168] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/25/2019] [Revised: 11/26/2019] [Accepted: 12/05/2019] [Indexed: 01/17/2023] Open
Abstract
A review recently published in this journal by Fang (2019) showed that methods trained for the prediction of protein stability changes upon mutation have a critical bias: they neglect that a protein variation (A → B) and its reverse (B → A) must have opposite values of the free energy difference (ΔΔG_AB = −ΔΔG_BA). In this letter, we complement Fang's paper by presenting a more general view of the problem. In particular, a machine learning-based method published in 2015 (INPS) addressed the bias issue directly. We include the analysis of this missing method, showing that INPS is nearly insensitive to the problem in question.
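The antisymmetry condition stated in this abstract, that the predicted ΔΔG for A → B should be the negative of the predicted ΔΔG for B → A, can be checked numerically on paired predictions. The following is a minimal sketch, not the evaluation procedure of Fang or of the letter's authors: the function name `antisymmetry_report`, the two summary statistics it returns, and the toy values are all assumptions chosen for illustration.

```python
from math import sqrt


def antisymmetry_report(forward, reverse):
    """Summarize how well paired predictions respect ddG(A->B) = -ddG(B->A).

    forward[i] is the predicted ddG for a variation A->B and reverse[i] is
    the predicted ddG for the matching reverse variation B->A.
    Returns (bias, r):
      bias: mean of (forward + reverse) / 2, which is 0 for a perfectly
            antisymmetric predictor;
      r:    Pearson correlation between forward and reverse predictions,
            which is -1 for a perfectly antisymmetric predictor.
    """
    if len(forward) != len(reverse) or len(forward) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")

    n = len(forward)
    bias = sum((f + r) / 2 for f, r in zip(forward, reverse)) / n

    mean_f = sum(forward) / n
    mean_r = sum(reverse) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(forward, reverse))
    var_f = sum((f - mean_f) ** 2 for f in forward)
    var_r = sum((r - mean_r) ** 2 for r in reverse)
    if var_f == 0 or var_r == 0:
        raise ValueError("correlation is undefined for constant predictions")

    return bias, cov / sqrt(var_f * var_r)


# Toy values (hypothetical, in kcal/mol), not taken from any published dataset.
forward_ddg = [1.0, -0.5, 2.0]
consistent_reverse = [-1.0, 0.5, -2.0]   # sign flips with direction
biased_reverse = [1.0, -0.5, 2.0]        # ignores direction entirely

print(antisymmetry_report(forward_ddg, consistent_reverse))
print(antisymmetry_report(forward_ddg, biased_reverse))
```

In this sketch, a predictor that ignores the direction of the variation shows a non-zero bias and a positive correlation, whereas one that respects the condition shows a bias near zero and a correlation near −1.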
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, Torino, Italy
| |
|
35
|
Zhou N, Jiang Y, Bergquist TR, Lee AJ, Kacsoh BZ, Crocker AW, Lewis KA, Georghiou G, Nguyen HN, Hamid MN, Davis L, Dogan T, Atalay V, Rifaioglu AS, Dalkıran A, Cetin Atalay R, Zhang C, Hurto RL, Freddolino PL, Zhang Y, Bhat P, Supek F, Fernández JM, Gemovic B, Perovic VR, Davidović RS, Sumonja N, Veljkovic N, Asgari E, Mofrad MRK, Profiti G, Savojardo C, Martelli PL, Casadio R, Boecker F, Schoof H, Kahanda I, Thurlby N, McHardy AC, Renaux A, Saidi R, Gough J, Freitas AA, Antczak M, Fabris F, Wass MN, Hou J, Cheng J, Wang Z, Romero AE, Paccanaro A, Yang H, Goldberg T, Zhao C, Holm L, Törönen P, Medlar AJ, Zosa E, Borukhov I, Novikov I, Wilkins A, Lichtarge O, Chi PH, Tseng WC, Linial M, Rose PW, Dessimoz C, Vidulin V, Dzeroski S, Sillitoe I, Das S, Lees JG, Jones DT, Wan C, Cozzetto D, Fa R, Torres M, Warwick Vesztrocy A, Rodriguez JM, Tress ML, Frasca M, Notaro M, Grossi G, Petrini A, Re M, Valentini G, Mesiti M, Roche DB, Reeb J, Ritchie DW, Aridhi S, Alborzi SZ, Devignes MD, Koo DCE, Bonneau R, Gligorijević V, Barot M, Fang H, Toppo S, Lavezzo E, Falda M, Berselli M, Tosatto SCE, Carraro M, Piovesan D, Ur Rehman H, Mao Q, Zhang S, Vucetic S, Black GS, Jo D, Suh E, Dayton JB, Larsen DJ, Omdahl AR, McGuffin LJ, Brackenridge DA, Babbitt PC, Yunes JM, Fontana P, Zhang F, Zhu S, You R, Zhang Z, Dai S, Yao S, Tian W, Cao R, Chandler C, Amezola M, Johnson D, Chang JM, Liao WH, Liu YW, Pascarelli S, Frank Y, Hoehndorf R, Kulmanov M, Boudellioua I, Politano G, Di Carlo S, Benso A, Hakala K, Ginter F, Mehryary F, Kaewphan S, Björne J, Moen H, Tolvanen MEE, Salakoski T, Kihara D, Jain A, Šmuc T, Altenhoff A, Ben-Hur A, Rost B, Brenner SE, Orengo CA, Jeffery CJ, Bosco G, Hogan DA, Martin MJ, O'Donovan C, Mooney SD, Greene CS, Radivojac P, Friedberg I. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biol 2019; 20:244. 
[PMID: 31744546 PMCID: PMC6864930 DOI: 10.1186/s13059-019-1835-8] [Citation(s) in RCA: 166] [Impact Index Per Article: 33.2] [Received: 05/16/2019] [Accepted: 09/24/2019] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND: The Critical Assessment of Functional Annotation (CAFA) is an ongoing, global, community-driven effort to evaluate and improve the computational annotation of protein function. RESULTS: Here, we report on the results of the third CAFA challenge, CAFA3, which featured an expanded analysis over the previous CAFA rounds, both in terms of the volume of data analyzed and the types of analysis performed. In a major new development, computational predictions and assessment goals drove some of the experimental assays, resulting in new functional annotations for more than 1000 genes. Specifically, we performed experimental whole-genome mutation screening in Candida albicans and Pseudomonas aeruginosa genomes, which provided us with genome-wide experimental data for genes associated with biofilm formation and motility. We further performed targeted assays on selected genes in Drosophila melanogaster, which we suspected of being involved in long-term memory. CONCLUSION: We conclude that while predictions of the molecular function and biological process annotations have slightly improved over time, those of the cellular component have not. Term-centric prediction of experimental annotations remains equally challenging; although the performance of the top methods is significantly better than the expectations set by baseline methods in C. albicans and D. melanogaster, it leaves considerable room and need for improvement. Finally, we report that the CAFA community now involves a broad range of participants with expertise in bioinformatics, biological experimentation, biocuration, and bio-ontologies, working together to improve functional annotation, computational function prediction, and our ability to manage big data in the era of large experimental screens.
Affiliation(s)
- Naihui Zhou
- Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.,Program in Bioinformatics and Computational Biology, Ames, IA, USA
| | - Yuxiang Jiang
- Indiana University Bloomington, Bloomington, Indiana, USA
| | - Timothy R Bergquist
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Alexandra J Lee
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
| | - Balint Z Kacsoh
- Geisel School of Medicine at Dartmouth, Hanover, NH, USA.,Department of Molecular and Systems Biology, Hanover, NH, USA
| | - Alex W Crocker
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
| | - Kimberley A Lewis
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
| | - George Georghiou
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
| | - Huy N Nguyen
- Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.,Program in Computer Science, Ames, IA, USA
| | - Md Nafiz Hamid
- Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.,Program in Bioinformatics and Computational Biology, Ames, IA, USA
| | - Larry Davis
- Program in Bioinformatics and Computational Biology, Ames, IA, USA
| | - Tunca Dogan
- Department of Computer Engineering, Hacettepe University, Ankara, Turkey.,European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Volkan Atalay
- Department of Computer Engineering, Middle East Technical University (METU), Ankara, Turkey
| | - Ahmet S Rifaioglu
- Department of Computer Engineering, Middle East Technical University (METU), Ankara, Turkey.,Department of Computer Engineering, Iskenderun Technical University, Hatay, Turkey
| | - Alperen Dalkıran
- Department of Computer Engineering, Middle East Technical University (METU), Ankara, Turkey
| | - Rengul Cetin Atalay
- CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Rebecca L Hurto
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Peter L Freddolino
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| | | | - Fran Supek
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona, Spain.,Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - José M Fernández
- INB Coordination Unit, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Catalonia, Spain.,(former) INB GN2, Structural and Computational Biology Programme, Spanish National Cancer Research Centre, Barcelona, Catalonia, Spain
| | - Branislava Gemovic
- Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
| | - Vladimir R Perovic
- Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
| | - Radoslav S Davidović
- Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
| | - Neven Sumonja
- Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
| | - Nevena Veljkovic
- Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
| | - Ehsaneddin Asgari
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering, University of California Berkeley, Berkeley, CA, USA.,Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Berkeley, CA, USA
| | | | - Giuseppe Profiti
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.,National Research Council, IBIOM, Bologna, Italy
| | - Castrense Savojardo
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Florian Boecker
- University of Bonn: INRES Crop Bioinformatics, Bonn, North Rhine-Westphalia, Germany
| | - Heiko Schoof
- INRES Crop Bioinformatics, University of Bonn, Bonn, Germany
| | - Indika Kahanda
- Gianforte School of Computing, Montana State University, Bozeman, Montana, USA
| | - Natalie Thurlby
- University of Bristol, Computer Science, Bristol, Bristol, United Kingdom
| | - Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Brunswick, Germany.,RESIST, DFG Cluster of Excellence 2155, Brunswick, Germany
| | - Alexandre Renaux
- Interuniversity Institute of Bioinformatics in Brussels, Université libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium.,Machine Learning Group, Université libre de Bruxelles, Brussels, Belgium.,Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
| | - Rabie Saidi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Julian Gough
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | - Alex A Freitas
- University of Kent, School of Computing, Canterbury, United Kingdom
| | - Magdalena Antczak
- School of Biosciences, University of Kent, Canterbury, Kent, United Kingdom
| | - Fabio Fabris
- University of Kent, School of Computing, Canterbury, United Kingdom
| | - Mark N Wass
- School of Biosciences, University of Kent, Canterbury, Kent, United Kingdom
| | - Jie Hou
- University of Missouri, Computer Science, Columbia, Missouri, USA.,Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
| | - Zheng Wang
- University of Miami, Coral Gables, Florida, USA
| | - Alfonso E Romero
- Centre for Systems and Synthetic Biology, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
| | - Alberto Paccanaro
- Centre for Systems and Synthetic Biology, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
| | - Haixuan Yang
- School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Galway, Ireland.,Technical University of Munich, Garching, Germany
| | - Tatyana Goldberg
- Department of Informatics, Bioinformatics & Computational Biology-i12, Technische Universitat Munchen, Munich, Germany
| | - Chenguang Zhao
- Faculty for Informatics, Garching, Germany.,Department for Bioinformatics and Computational Biology, Garching, Germany.,School of Computing Sciences and Computer Engineering, Hattiesburg, Mississippi, USA
| | - Liisa Holm
- Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Finland, Helsinki, Finland
| | - Petri Törönen
- Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Finland, Helsinki, Finland
| | - Alan J Medlar
- Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Finland, Helsinki, Finland
| | - Elaine Zosa
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
| | | | - Ilya Novikov
- Baylor College of Medicine, Department of Biochemistry and Molecular Biology, Houston, TX, USA
| | - Angela Wilkins
- Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, TX, USA
| | - Olivier Lichtarge
- Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, TX, USA
| | - Po-Han Chi
- National TsingHua University, Hsinchu, Taiwan
| | - Wei-Cheng Tseng
- Department of Electrical Engineering in National Tsing Hua University, Hsinchu City, Taiwan
| | - Michal Linial
- The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Peter W Rose
- University of California San Diego, San Diego Supercomputer Center, La Jolla, California, USA
| | - Christophe Dessimoz
- Department of Computational Biology and Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution & Environment, and Department of Computer Science, University College London, London, UK.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Vedrana Vidulin
- Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
| | - Saso Dzeroski
- Jozef Stefan Institute, Ljubljana, Slovenia.,Jozef Stefan International Postgraduate School, Ljubljana, Slovenia
| | - Ian Sillitoe
- Research Department of Structural and Molecular Biology, University College London, London, England
| | - Sayoni Das
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Jonathan Gill Lees
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom.,Department of Health and Life Sciences, Oxford Brookes University, London, UK
| | - David T Jones
- The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom.,Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, United Kingdom
| | - Cen Wan
- Department of Computer Science, University College London, London, United Kingdom.,The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom
| | - Domenico Cozzetto
- Department of Computer Science, University College London, London, United Kingdom.,The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom
| | - Rui Fa
- Department of Computer Science, University College London, London, United Kingdom.,The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom
| | - Mateo Torres
- Centre for Systems and Synthetic Biology, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
| | - Alex Warwick Vesztrocy
- Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, United Kingdom.,SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
| | - Jose Manuel Rodriguez
- Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
| | - Michael L Tress
- Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Marco Frasca
- Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
| | - Marco Notaro
- Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
| | - Giuliano Grossi
- Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
| | - Alessandro Petrini
- Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
| | - Matteo Re
- Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
| | - Giorgio Valentini
- Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
| | - Marco Mesiti
- Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy.,Institut de Biologie Computationnelle, LIRMM, CNRS-UMR 5506, Universite de Montpellier, Montpellier, France
| | - Daniel B Roche
- Department of Informatics, Bioinformatics and Computational Biology-i12, Technische Universitat Munchen, Munich, Germany
| | - Jonas Reeb
- Department of Informatics, Bioinformatics and Computational Biology-i12, Technische Universitat Munchen, Munich, Germany
| | - David W Ritchie
- University of Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
| | - Sabeur Aridhi
- University of Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
| | | | - Marie-Dominique Devignes
- University of Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France.,University of Lorraine, Nancy, Lorraine, France.,Inria, Nancy, France
| | | | - Richard Bonneau
- NYU Center for Data Science, New York, 10010, NY, USA.,Flatiron Institute, CCB, New York, 10010, NY, USA
| | - Vladimir Gligorijević
- Center for Computational Biology (CCB), Flatiron Institute, Simons Foundation, New York, New York, USA
| | - Meet Barot
- Center for Data Science, New York University, New York, 10011, NY, USA
| | - Hai Fang
- Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Stefano Toppo
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Enrico Lavezzo
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Marco Falda
- Department of Biology, University of Padova, Padova, Italy
| | - Michele Berselli
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Silvio C E Tosatto
- CNR Institute of Neuroscience, Padova, Italy.,Department of Biomedical Sciences, University of Padua, Padova, Italy
| | - Marco Carraro
- Department of Biomedical Sciences, University of Padua, Padova, Italy
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padua, Padova, Italy
| | - Hafeez Ur Rehman
- Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar, Khyber Pakhtoonkhwa, Pakistan
| | - Qizhong Mao
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA.,University of California, Riverside, Philadelphia, PA, USA
| | - Shanshan Zhang
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Slobodan Vucetic
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
| | - Gage S Black
- Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
| | - Dane Jo
- Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
| | - Erica Suh
- Department of Biology, Brigham Young University, Provo, UT, USA
| | - Jonathan B Dayton
- Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
| | - Dallas J Larsen
- Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
| | - Ashton R Omdahl
- Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
| | - Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, England, United Kingdom
| | | | - Patricia C Babbitt
- Department of Pharmaceutical Chemistry, San Francisco, CA, USA.,Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, 94158, CA, USA
| | - Jeffrey M Yunes
- UC Berkeley - UCSF Graduate Program in Bioengineering, University of California, San Francisco, 94158, CA, USA.,Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, 94158, CA, USA
| | - Paolo Fontana
- Research and Innovation Center, Edmund Mach Foundation, San Michele all'Adige, Italy
| | - Feng Zhang
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, Shanghai, China.,Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, Shanghai, China
| | - Shanfeng Zhu
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence and Shanghai Institute of Artificial Intelligence Algorithms, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| | - Ronghui You
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence and Shanghai Institute of Artificial Intelligence Algorithms, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| | - Zihan Zhang
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| | - Suyang Dai
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| | - Shuwei Yao
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence and Shanghai Institute of Artificial Intelligence Algorithms, Fudan University, Shanghai, China
| | - Weidong Tian
- State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, Shanghai, China.,Department of Pediatrics, Brain Tumor Center, Division of Experimental Hematology and Cancer Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
| | - Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
| | - Caleb Chandler
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
| | - Miguel Amezola
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
| | - Devon Johnson
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
| | - Jia-Ming Chang
- Department of Computer Science, National Chengchi University, Taipei, Taiwan
| | - Wen-Hung Liao
- Department of Computer Science, National Chengchi University, Taipei, Taiwan
| | - Yi-Wei Liu
- Department of Computer Science, National Chengchi University, Taipei, Taiwan
| | - Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Jeddah, Saudi Arabia
| | - Maxat Kulmanov
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Jeddah, Saudi Arabia
| | - Imane Boudellioua
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.,Computer, Electrical and Mathematical Sciences Engineering Division (CEMSE), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Gianfranco Politano
- Control and Computer Engineering Department, Politecnico di Torino, Torino, TO, Italy
| | - Stefano Di Carlo
- Control and Computer Engineering Department, Politecnico di Torino, Torino, TO, Italy
| | - Alfredo Benso
- Control and Computer Engineering Department, Politecnico di Torino, Torino, TO, Italy
| | - Kai Hakala
- Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland.,University of Turku Graduate School (UTUGS), Turku, Finland
| | - Filip Ginter
- Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland.,University of Turku, Turku, Finland
| | - Farrokh Mehryary
- Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland.,University of Turku Graduate School (UTUGS), Turku, Finland
| | - Suwisa Kaewphan
- Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland.,University of Turku Graduate School (UTUGS), Turku, Finland.,Turku Centre for Computer Science (TUCS), Turku, Finland
| | - Jari Björne
- Department of Future Technologies, Faculty of Science and Engineering, University of Turku, Turku, FI-20014, Finland.,Turku Centre for Computer Science (TUCS), Agora, Vesilinnantie 3, Turku, FI-20500, Finland
| | - Tapio Salakoski
- Department of Future Technologies, Faculty of Science and Engineering, University of Turku, Turku, FI-20014, Finland.,Turku Centre for Computer Science (TUCS), Agora, Vesilinnantie 3, Turku, FI-20500, Finland
| | - Daisuke Kihara
- Department of Biological Sciences, Department of Computer Science, Purdue University, 47907, IN, USA.,Department of Pediatrics, University of Cincinnati, Cincinnati, 45229, OH, USA
| | - Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Tomislav Šmuc
- Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
| | - Adrian Altenhoff
- Department of Computer Science, ETH Zurich, Zurich, Switzerland.,SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
| | - Asa Ben-Hur
- Department of Computer Science, Colorado State University, Fort Collins, CO, USA
| | - Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology-i12, Technische Universität München, Munich, Germany.,Institute for Food and Plant Sciences WZW, Technische Universität München, Freising, Germany
| | - Christine A Orengo
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
| | - Constance J Jeffery
- Biological Sciences, University of Illinois at Chicago, Chicago, Illinois, USA
| | - Giovanni Bosco
- Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
| | - Deborah A Hogan
- Geisel School of Medicine at Dartmouth, Hanover, NH, USA.,Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
| | - Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
| | - Claire O'Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
| | - Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
| | - Casey S Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.,Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA.
| | - Iddo Friedberg
- Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.
|
36
|
Zhang J, Kinch LN, Cong Q, Katsonis P, Lichtarge O, Savojardo C, Babbi G, Martelli PL, Capriotti E, Casadio R, Garg A, Pal D, Weile J, Sun S, Verby M, Roth FP, Grishin NV. Assessing predictions on fitness effects of missense variants in calmodulin. Hum Mutat 2019; 40:1463-1473. [PMID: 31283071 DOI: 10.1002/humu.23857] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/18/2019] [Revised: 06/26/2019] [Accepted: 06/27/2019] [Indexed: 12/19/2022]
Abstract
This paper reports the evaluation of predictions for the "CALM1" challenge in the fifth round of the Critical Assessment of Genome Interpretation held in 2018. In the challenge, the participants were asked to predict effects on yeast growth caused by missense variants of human calmodulin, a highly conserved protein in eukaryotic cells sensing calcium concentration. The performance of predictors implementing different algorithms and methods is similar. Most predictors are able to identify the deleterious or tolerated variants with modest accuracy, with a baseline predictor based purely on sequence conservation slightly outperforming the submitted predictions. Nevertheless, we think that the accuracy of predictions remains far from satisfactory, and the field awaits substantial improvements. The most poorly predicted variants in this round surround functional CALM1 sites that bind calcium or peptide, which suggests that better incorporation of structural analysis may help improve predictions.
Affiliation(s)
- Jing Zhang
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas
| | - Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas
| | - Qian Cong
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas.,Department of Biochemistry & Molecular Biology, Department of Pharmacology, Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
| | - Castrense Savojardo
- Biocomputing Group, FABIT/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Bologna, Italy
| | - Giulia Babbi
- Biocomputing Group, FABIT/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Bologna, Italy
| | - Pier Luigi Martelli
- Biocomputing Group, FABIT/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Bologna, Italy
| | - Emidio Capriotti
- Biocomputing Group, FABIT/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, FABIT/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Bologna, Italy
| | - Aditi Garg
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
| | - Debnath Pal
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
| | - Jochen Weile
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada.,The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Song Sun
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada.,The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Marta Verby
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada.,The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Frederick P Roth
- Lunenfeld-Tanenbaum Research Institute, Toronto, Ontario, Canada.,The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada.,Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.,Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
| | - Nick V Grishin
- Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas.,Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas
|
37
|
Pejaver V, Babbi G, Casadio R, Folkman L, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Miller M, Moult J, Pal LR, Savojardo C, Yin Y, Zhou Y, Radivojac P, Bromberg Y. Assessment of methods for predicting the effects of PTEN and TPMT protein variants. Hum Mutat 2019; 40:1495-1506. [PMID: 31184403 PMCID: PMC6744362 DOI: 10.1002/humu.23838] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/20/2019] [Revised: 05/27/2019] [Accepted: 06/06/2019] [Indexed: 01/16/2023]
Abstract
Thermodynamic stability is a fundamental property shared by all proteins. Changes in stability due to mutation are a widespread molecular mechanism in genetic diseases. Methods for the prediction of mutation-induced stability change have typically been developed and evaluated on incomplete and/or biased data sets. As part of the Critical Assessment of Genome Interpretation, we explored the utility of high-throughput variant stability profiling (VSP) assay data as an alternative for the assessment of computational methods and evaluated state-of-the-art predictors against over 7,000 nonsynonymous variants from two proteins. We found that predictions were modestly correlated with actual experimental values. Predictors fared better when evaluated as classifiers of extreme stability effects. While different methods emerged as top performers depending on the metric, it is nontrivial to draw conclusions on their adoption or improvement. Our analyses revealed that only 16% of all variants in VSP assays could be confidently defined as stability-affecting. Furthermore, it is unclear as to what extent VSP abundance scores were reasonable proxies for the stability-related quantities that participating methods were designed to predict. Overall, our observations underscore the need for clearly defined objectives when developing and using both computational and experimental methods in the context of measuring variant impact.
Affiliation(s)
- Vikas Pejaver
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington
- The eScience Institute, University of Washington, Seattle, Washington
| | - Giulia Babbi
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Lukas Folkman
- School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
| | - Kunal Kundu
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, Maryland
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas
- Department of Pharmacology, Baylor College of Medicine, Houston, Texas
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
| | - Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
| | - John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
| | - Lipika R Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
| | - Castrense Savojardo
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Yizhou Yin
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
- Department of Genetics, Human Genetics Institute, Rutgers University, Piscataway, New Jersey
- Institute for Advanced Study at Technische Universität München (TUM-IAS), Garching/Munich, Germany
|
38
|
Kasak L, Bakolitsa C, Hu Z, Yu C, Rine J, Dimster-Denk DF, Pandey G, Baets GD, Bromberg Y, Cao C, Capriotti E, Casadio R, Durme JV, Giollo M, Karchin R, Katsonis P, Leonardi E, Lichtarge O, Martelli PL, Masica D, Mooney SD, Olatubosun A, Pal LR, Radivojac P, Rousseau F, Savojardo C, Schymkowitz J, Thusberg J, Tosatto SC, Vihinen M, Väliaho J, Repo S, Moult J, Brenner SE, Friedberg I. Assessing computational predictions of the phenotypic effect of cystathionine-beta-synthase variants. Hum Mutat 2019; 40:1530-1545. [PMID: 31301157 PMCID: PMC7325732 DOI: 10.1002/humu.23868] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/11/2019] [Revised: 06/22/2019] [Accepted: 07/09/2019] [Indexed: 12/28/2022]
Abstract
Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.
Affiliation(s)
- Laura Kasak
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
| | - Constantina Bakolitsa
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Changhua Yu
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Jasper Rine
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Dago F. Dimster-Denk
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Gaurav Pandey
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Greet De Baets
- Switch Laboratory, VIB Center for Brain and Disease Research, Leuven, Belgium
- Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
| | - Chen Cao
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD, USA
| | - Emidio Capriotti
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Joost Van Durme
- Switch Laboratory, VIB Center for Brain and Disease Research, Leuven, Belgium
- Vrije Universiteit Brussel, Brussels, Belgium
| | - Manuel Giollo
- Department of Biomedical Sciences, University of Padua, Padua, Italy
| | - Rachel Karchin
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - David Masica
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Ayodeji Olatubosun
- Institute of Medical Technology, University of Tampere, Tampere, Finland
| | - Lipika R. Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
| | - Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, IN, USA
| | - Frederic Rousseau
- Switch Laboratory, VIB Center for Brain and Disease Research, Leuven, Belgium
- Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Joost Schymkowitz
- Switch Laboratory, VIB Center for Brain and Disease Research, Leuven, Belgium
- Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
| | - Mauno Vihinen
- Institute of Medical Technology, University of Tampere, Tampere, Finland
| | - Jouni Väliaho
- Institute of Medical Technology, University of Tampere, Tampere, Finland
| | - Susanna Repo
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - John Moult
- Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
| | - Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, OH, USA
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA
|
39
|
Clark WT, Kasak L, Bakolitsa C, Hu Z, Andreoletti G, Babbi G, Bromberg Y, Casadio R, Dunbrack R, Folkman L, Ford CT, Jones D, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Mooney SD, Nodzak C, Pal LR, Radivojac P, Savojardo C, Shi X, Zhou Y, Uppal A, Xu Q, Yin Y, Pejaver V, Wang M, Wei L, Moult J, Yu GK, Brenner SE, LeBowitz JH. Assessment of predicted enzymatic activity of α-N-acetylglucosaminidase variants of unknown significance for CAGI 2016. Hum Mutat 2019; 40:1519-1529. [PMID: 31342580 PMCID: PMC7156275 DOI: 10.1002/humu.23875] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/22/2019] [Revised: 06/27/2019] [Accepted: 07/15/2019] [Indexed: 12/25/2022]
Abstract
The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016 invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to 0.61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.
Affiliation(s)
| | - Laura Kasak
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
| | - Constantina Bakolitsa
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Lukas Folkman
- School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Colby T. Ford
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, NC, USA
| | - David Jones
- Bioinformatics Group, Department of Computer Science, University College London, UK
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | - Kunal Kundu
- University of Maryland, College Park, MD, USA
| | - Olivier Lichtarge
- Departments of Molecular and Human Genetics, Biochemistry & Molecular Biology, Pharmacology, and Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Conor Nodzak
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, NC, USA
| | - Predrag Radivojac
- Department of Computer Science, Indiana University, Bloomington, IN, USA
| | - Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Xinghua Shi
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, NC, USA
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Aneeta Uppal
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, NC, USA
| | - Qifang Xu
- Fox Chase Cancer Center, Philadelphia, PA, USA
| | - Yizhou Yin
- University of Maryland, College Park, MD, USA
| | - Vikas Pejaver
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA
| | - Meng Wang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, P.R. China
| | - Liping Wei
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, P.R. China
| | - John Moult
- University of Maryland, College Park, MD, USA
| | - G. Karen Yu
- BioMarin Pharmaceutical, San Rafael, California, USA
| | - Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
|
40
|
Voskanian A, Katsonis P, Lichtarge O, Pejaver V, Radivojac P, Mooney SD, Capriotti E, Bromberg Y, Wang Y, Miller M, Martelli PL, Savojardo C, Babbi G, Casadio R, Cao Y, Sun Y, Shen Y, Garg A, Pal D, Yu Y, Huff CD, Tavtigian SV, Young E, Neuhausen SL, Ziv E, Pal LR, Andreoletti G, Brenner S, Kann MG. Assessing the performance of in silico methods for predicting the pathogenicity of variants in the gene CHEK2, among Hispanic females with breast cancer. Hum Mutat 2019; 40:1612-1622. [PMID: 31241222 PMCID: PMC6744287 DOI: 10.1002/humu.23849] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/25/2019] [Revised: 05/23/2019] [Accepted: 06/21/2019] [Indexed: 01/22/2023]
Abstract
The availability of disease-specific genomic data is critical for developing new computational methods that predict the pathogenicity of human variants and advance the field of precision medicine. However, the lack of gold standards to properly train and benchmark such methods is one of the greatest challenges in the field. In response to this challenge, the scientific community is invited to participate in the Critical Assessment of Genome Interpretation (CAGI), where unpublished disease variants are available for classification by in silico methods. As part of the CAGI-5 challenge, we evaluated the performance of 18 submissions and three additional methods in predicting the pathogenicity of single nucleotide variants (SNVs) in checkpoint kinase 2 (CHEK2) for cases of breast cancer in Hispanic females. As part of the assessment, the efficacy of the analysis method and the setup of the challenge were also considered. The results indicated that though the challenge could benefit from additional participant data, the combined generalized linear model analysis and odds of pathogenicity analysis provided a framework to evaluate the methods submitted for SNV pathogenicity identification and for comparison to other available methods. The outcome of this challenge and the approaches used can help guide further advancements in identifying SNV-disease relationships.
Affiliation(s)
- Alin Voskanian
- Department of Biological Sciences, University of Maryland, Baltimore County, MD, U.S.A
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, U.S.A
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, U.S.A
- Department of Pharmacology, Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Vikas Pejaver
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, U.S.A
- The eScience Institute, University of Washington, Seattle, Washington, U.S.A
| | - Predrag Radivojac
- Khoury College of Computer and Information Sciences, Northeastern University, Boston, Massachusetts, U.S.A
| | - Sean D. Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, U.S.A
| | - Emidio Capriotti
- BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via Selmi 3, 40126 Bologna, Italy
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, U.S.A
- Department of Genetics, Rutgers University, New Brunswick, New Jersey, U.S.A
- Technical University of Munich Institute for Advanced Study, (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany
| | - Yanran Wang
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, U.S.A
| | - Max Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey, U.S.A
| | - Pier Luigi Martelli
- Biocomputing Group, BiGeA/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy
| | - Castrense Savojardo
- Biocomputing Group, BiGeA/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy
| | - Giulia Babbi
- Biocomputing Group, BiGeA/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy
| | - Rita Casadio
- Biocomputing Group, BiGeA/Giorgio Prodi Interdepartmental Center for Cancer Research, University of Bologna, Via F. Selmi 3, Bologna, 40126, Italy
| | - Yue Cao
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, U.S.A
| | - Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, U.S.A
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, U.S.A
| | - Aditi Garg
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru 560 012, India
| | - Debnath Pal
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru 560 012, India
| | - Yao Yu
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, U.S.A
| | - Chad D. Huff
- Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, U.S.A
| | - Sean V. Tavtigian
- Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84132, U.S.A
| | - Erin Young
- Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, UT 84132, U.S.A
| | - Susan L. Neuhausen
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, CA, 91010 U.S.A
| | - Elad Ziv
- Division of General Internal Medicine, Department of Medicine, Institute of Human Genetics, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, U.S.A
| | - Lipika R. Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
| | - Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Steven Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Maricel G. Kann
- Department of Biological Sciences, University of Maryland, Baltimore County, MD, U.S.A
| |
|
41
|
Savojardo C, Petrosino M, Babbi G, Bovo S, Corbi-Verge C, Casadio R, Fariselli P, Folkman L, Garg A, Karimi M, Katsonis P, Kim PM, Lichtarge O, Martelli PL, Pasquo A, Pal D, Shen Y, Strokach AV, Turina P, Zhou Y, Andreoletti G, Brenner S, Chiaraluce R, Consalvi V, Capriotti E. Evaluating the predictions of the protein stability change upon single amino acid substitutions for the FXN CAGI5 challenge. Hum Mutat 2019; 40:1392-1399. [PMID: 31209948 PMCID: PMC6744327 DOI: 10.1002/humu.23843] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/16/2019] [Revised: 06/02/2019] [Accepted: 06/09/2019] [Indexed: 12/31/2022]
Abstract
Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions of the FXN to Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant ($\Delta\Delta G^{\mathrm{H_2O}}$). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of the current computational methods for predicting the $\Delta\Delta G^{\mathrm{H_2O}}$ value associated with the variants and to classify them as destabilizing and not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.
Affiliation(s)
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Maria Petrosino
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
| | - Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Samuele Bovo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Carles Corbi-Verge
- Donnelly Center for Cellular and Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
| | - Piero Fariselli
- Department of Medical Sciences, University of Torino, 10126 Torino, Italy
| | - Lukas Folkman
- School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222, Australia
| | - Aditi Garg
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru 560 012, India
| | - Mostafa Karimi
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Philip M. Kim
- Donnelly Center for Cellular and Biomolecular Research, University of Toronto, 160 College St, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 1 King’s College Cir, Toronto, ON M5S 1A8, Canada
- Department of Computer Science, University of Toronto, 214 College St, Toronto, ON M5T 3A1, Canada
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
- Department of Pharmacology, Baylor College of Medicine, Houston, Texas 77030, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas 77030, USA
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Italy
| | - Alessandra Pasquo
- ENEA CR Frascati, Diagnostics and Metrology Laboratory, FSN-TECFIS-DIM, Frascati, Italy
| | - Debnath Pal
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru 560 012, India
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840, USA
| | - Alexey V. Strokach
- Department of Computer Science, University of Toronto, 214 College St, Toronto, ON M5T 3A1, Canada
| | - Paola Turina
- Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Parklands Dr, Southport, QLD 4222, Australia
- Institute for Glycomics, Griffith University, Parklands Dr, Southport QLD 4222, Australia
| | - Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Steven Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
| | - Roberta Chiaraluce
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
| | - Valerio Consalvi
- Department of Biochemical Sciences “A. Rossi Fanelli”, Sapienza University of Roma, Roma, Italy
| | - Emidio Capriotti
- Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
| |
|
42
|
Kasak L, Hunter JM, Udani R, Bakolitsa C, Hu Z, Adhikari AN, Babbi G, Casadio R, Gough J, Guerrero RF, Jiang Y, Joseph T, Katsonis P, Kotte S, Kundu K, Lichtarge O, Martelli PL, Mooney SD, Moult J, Pal LR, Poitras J, Radivojac P, Rao A, Sivadasan N, Sunderam U, VG S, Yin Y, Zaucha J, Brenner SE, Meyn MS. CAGI SickKids challenges: Assessment of phenotype and variant predictions derived from clinical and genomic data of children with undiagnosed diseases. Hum Mutat 2019; 40:1373-1391. [PMID: 31322791 PMCID: PMC7318886 DOI: 10.1002/humu.23874] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/29/2019] [Revised: 07/15/2019] [Accepted: 07/15/2019] [Indexed: 01/02/2023]
Abstract
Whole-genome sequencing (WGS) holds great potential as a diagnostic test. However, the majority of patients currently undergoing WGS lack a molecular diagnosis, largely due to the vast number of undiscovered disease genes and our inability to assess the pathogenicity of most genomic variants. The CAGI SickKids challenges attempted to address this knowledge gap by assessing state-of-the-art methods for clinical phenotype prediction from genomes. CAGI4 and CAGI5 participants were provided with WGS data and clinical descriptions of 25 and 24 undiagnosed patients from the SickKids Genome Clinic Project, respectively. Predictors were asked to identify primary and secondary causal variants. In addition, for CAGI5, groups had to match each genome to one of three disorder categories (neurologic, ophthalmologic, and connective), and separately to each patient. The performance of matching genomes to categories was no better than random but two groups performed significantly better than chance in matching genomes to patients. Two of the ten variants proposed by two groups in CAGI4 were deemed to be diagnostic, and several proposed pathogenic variants in CAGI5 are good candidates for phenotype expansion. We discuss implications for improving in silico assessment of genomic variants and identifying new disease genes.
Affiliation(s)
- Laura Kasak
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
| | - Jesse M. Hunter
- Department of Pediatrics and Wisconsin State Lab of Hygiene, University of Wisconsin Madison, WI, USA
| | - Rupa Udani
- Department of Pediatrics and Wisconsin State Lab of Hygiene, University of Wisconsin Madison, WI, USA
| | - Constantina Bakolitsa
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Aashish N. Adhikari
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Julian Gough
- Department of Computer Science, University of Bristol, Bristol, UK
| | | | - Yuxiang Jiang
- Department of Computer Science, Indiana University, IN, USA
| | | | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
| | | | - Kunal Kundu
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD, USA
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Biochemistry & Molecular Biology, Department of Pharmacology, Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
| | - Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
| | - Sean D. Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, WA, USA
| | - John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, MD, USA
| | - Lipika R. Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
| | | | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, MA, USA
| | | | | | | | | | - Yizhou Yin
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD, USA
| | - Jan Zaucha
- Department of Computer Science, University of Bristol, Bristol, UK
| | - Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - M. Stephen Meyn
- Center for Human Genomics and Precision Medicine, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
- Department of Paediatrics, The Hospital for Sick Children, Toronto, Canada
| |
|
43
|
Monzon AM, Carraro M, Chiricosta L, Reggiani F, Han J, Ozturk K, Wang Y, Miller M, Bromberg Y, Capriotti E, Savojardo C, Babbi G, Martelli PL, Casadio R, Katsonis P, Lichtarge O, Carter H, Kousi M, Katsanis N, Andreoletti G, Moult J, Brenner SE, Ferrari C, Leonardi E, Tosatto SCE. Performance of computational methods for the evaluation of pericentriolar material 1 missense variants in CAGI-5. Hum Mutat 2019; 40:1474-1485. [PMID: 31260570 PMCID: PMC7354699 DOI: 10.1002/humu.23856] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/22/2019] [Revised: 05/30/2019] [Accepted: 06/23/2019] [Indexed: 12/11/2022]
Abstract
The CAGI-5 pericentriolar material 1 (PCM1) challenge aimed to predict the effect of 38 transgenic human missense mutations in the PCM1 protein implicated in schizophrenia. Participants were provided with 16 benign variants (negative controls), 10 hypomorphic, and 12 loss of function variants. Six groups participated and were asked to predict the probability of effect and standard deviation associated to each mutation. Here, we present the challenge assessment. Prediction performance was evaluated using different measures to conclude in a final ranking which highlights the strengths and weaknesses of each group. The results show a great variety of predictions where some methods performed significantly better than others. Benign variants played an important role as negative controls, highlighting predictors biased to identify disease phenotypes. The best predictor, Bromberg lab, used a neural-network-based method able to discriminate between neutral and non-neutral single nucleotide polymorphisms. The CAGI-5 PCM1 challenge allowed us to evaluate the state of the art techniques for interpreting the effect of novel variants for a difficult target protein.
Affiliation(s)
| | - Marco Carraro
- Department of Biomedical Sciences, University of Padua, Padua, Italy
| | - Luigi Chiricosta
- Department of Biomedical Sciences, University of Padua, Padua, Italy
| | - Francesco Reggiani
- Department of Biomedical Sciences, University of Padua, Padua, Italy
- Department of Information Engineering, University of Padua, Padua, Italy
| | - James Han
- Department of Medicine, University of California San Diego, La Jolla, California
| | - Kivilcim Ozturk
- Department of Medicine, University of California San Diego, La Jolla, California
| | - Yanran Wang
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
| | - Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
- Institute for Advanced Study, Technical University of Munich (TUM), Munich, Germany
| | - Emidio Capriotti
- Department of Pharmacy and Biotechnology, BioFolD Unit, University of Bologna, Bologna, Italy
| | - Castrense Savojardo
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Giulia Babbi
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Pier L Martelli
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Rita Casadio
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
| | - Hannah Carter
- Department of Medicine, University of California San Diego, La Jolla, California
| | - Maria Kousi
- MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), Broad Institute of MIT and Harvard, Cambridge, Massachusetts
| | - Nicholas Katsanis
- Center for Human Disease Modeling, Duke University Medical Center, Durham, North Carolina
| | - Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
| | - Steven E Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
| | - Carlo Ferrari
- Department of Information Engineering, University of Padua, Padua, Italy
| | | | - Silvio C E Tosatto
- Department of Biomedical Sciences, University of Padua, Padua, Italy
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California
| |
|
44
|
Cline MS, Babbi G, Bonache S, Cao Y, Casadio R, de la Cruz X, Díez O, Gutiérrez-Enríquez S, Katsonis P, Lai C, Lichtarge O, Martelli PL, Mishne G, Moles-Fernández A, Montalban G, Mooney SD, O’Conner R, Ootes L, Özkan S, Padilla N, Pagel KA, Pejaver V, Radivojac P, Riera C, Savojardo C, Shen Y, Sun Y, Topper S, Parsons MT, Spurdle AB, Goldgar DE. Assessment of blind predictions of the clinical significance of BRCA1 and BRCA2 variants. Hum Mutat 2019; 40:1546-1556. [PMID: 31294896 PMCID: PMC6744348 DOI: 10.1002/humu.23861] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/01/2019] [Revised: 07/01/2019] [Accepted: 07/02/2019] [Indexed: 12/31/2022]
Abstract
Testing for variation in BRCA1 and BRCA2 (commonly referred to as BRCA1/2) has emerged as a standard clinical practice and is helping countless women better understand and manage their heritable risk of breast and ovarian cancer. Yet the increased rate of BRCA1/2 testing has led to an increasing number of Variants of Uncertain Significance (VUS), and the rate of VUS discovery currently outpaces the rate of clinical variant interpretation. Computational prediction is a key component of the variant interpretation pipeline. In the CAGI5 ENIGMA Challenge, six prediction teams submitted predictions on 326 newly-interpreted variants from the ENIGMA Consortium. By evaluating these predictions against the new interpretations, we have gained a number of insights on the state of the art of variant prediction and specific steps to further advance this state of the art.
Affiliation(s)
| | - Giulia Babbi
- Biocomputing Group, FaBiT Department, University of Bologna, Bologna, Italy
| | - Sandra Bonache
- Oncogenetics Group, Vall d’Hebron Institute of Oncology (VHIO), Barcelona, Spain
| | - Yue Cao
- Texas A&M University, College Station, TX, USA
| | - Rita Casadio
- Biocomputing Group, FaBiT Department, University of Bologna, Bologna, Italy
| | - Xavier de la Cruz
- Clinical and Translational Bioinformatics Research Unit, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Orland Díez
- Oncogenetics Group, Vall d’Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Clinical and Translational Bioinformatics Research Unit, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | | | | | - Carmen Lai
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, TX, USA
| | - Olivier Lichtarge
- Department of Medical and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, TX, USA
- Department of Pharmacology, Baylor College of Medicine, Houston, TX, USA
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
| | | | | | - Alejandro Moles-Fernández
- Clinical and Translational Bioinformatics Research Unit, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Gemma Montalban
- Clinical and Translational Bioinformatics Research Unit, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | | | | | - Lars Ootes
- Clinical and Translational Bioinformatics Research Unit, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Selen Özkan
- Clinical and Translational Bioinformatics Research Unit, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Natalia Padilla
- Clinical and Translational Bioinformatics Research Unit, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | | | | | - Predrag Radivojac
- Indiana University, Bloomington, IN, USA
- Northeastern University, Boston, MA, USA
| | - Casandra Riera
- Clinical and Translational Bioinformatics Research Unit, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | | | - Yang Shen
- Texas A&M University, College Station, TX, USA
| | - Yuanfei Sun
- Texas A&M University, College Station, TX, USA
| | | | | | | | - David E. Goldgar
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
| | | |
|
45
|
Mount SM, Avsec Ž, Carmel L, Casadio R, Çelik MH, Chen K, Cheng J, Cohen NE, Fairbrother WG, Fenesh T, Gagneur J, Gotea V, Holzer T, Lin CF, Martelli PL, Naito T, Nguyen TYD, Savojardo C, Unger R, Wang R, Yang Y, Zhao H. Assessing predictions of the impact of variants on splicing in CAGI5. Hum Mutat 2019; 40:1215-1224. [PMID: 31301154 DOI: 10.1002/humu.23869] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 01/28/2019] [Revised: 06/20/2019] [Accepted: 07/10/2019] [Indexed: 12/28/2022]
Abstract
Precision medicine and sequence-based clinical diagnostics seek to predict disease risk or to identify causative variants from sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype-phenotype prediction challenges; participants build models, undergo assessment, and share key findings. In the past, few CAGI challenges have addressed the impact of sequence variants on splicing. In CAGI5, two challenges (Vex-seq and MaPSY) involved prediction of the effect of variants, primarily single-nucleotide changes, on splicing. Although there are significant differences between these two challenges, both involved prediction of results from high-throughput exon inclusion assays. Here, we discuss the methods used to predict the impact of these variants on splicing, their performance, strengths, and weaknesses, and prospects for predicting the impact of sequence variation on splicing and disease phenotypes.
Affiliation(s)
- Stephen M Mount
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
| | - Žiga Avsec
- Department of Informatics, Technical University of Munich, Garching, Germany
| | - Liran Carmel
- Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - Rita Casadio
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | | | - Ken Chen
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
| | - Jun Cheng
- Department of Informatics, Technical University of Munich, Garching, Germany
| | - Noa E Cohen
- Department of Genetics, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
- The integrated program for Computer Science and Computational Biology, School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
| | - William G Fairbrother
- Department of Molecular Biology, Cell Biology, and Biochemistry, Center For Computational Biology, Brown University, Providence, Rhode Island
| | - Tzila Fenesh
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
| | - Julien Gagneur
- Department of Informatics, Technical University of Munich, Garching, Germany
| | - Valer Gotea
- National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), Bethesda, Maryland
| | - Tamar Holzer
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
| | - Chiao-Feng Lin
- Translational Informatics, DNAnexus, Mountain View, California
| | - Pier Luigi Martelli
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Tatsuhiko Naito
- Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | | | - Castrense Savojardo
- Department of Pharmacy and Biotechnology, Biocomputing Group, University of Bologna, Bologna, Italy
| | - Ron Unger
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
| | - Robert Wang
- Department of Bioengineering, University of California, Berkeley, California
- Department of Plant and Molecular Biology, University of California, Berkeley, California
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
| | - Huiying Zhao
- Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
| |
|
46
|
Abstract
Background: Many diseases are associated with complex patterns of symptoms and phenotypic manifestations. Parsimonious explanations aim at reconciling the multiplicity of phenotypic traits with the perturbation of one or few biological functions. For this, it is necessary to characterize human phenotypes at the molecular and functional levels, by exploiting gene annotations and known relations among genes, diseases and phenotypes. This characterization makes it possible to implement tools for retrieving functions shared among phenotypes, co-occurring in the same patient and facilitating the formulation of hypotheses about the molecular causes of the disease.
Results: We introduce PhenPath, a new resource consisting of two parts: PhenPathDB and PhenPathTOOL. The former is a database collecting the human genes associated with the phenotypes described in Human Phenotype Ontology (HPO) and OMIM Clinical Synopses. Phenotypes are then associated with biological functions and pathways by means of NET-GE, a network-based method for functional enrichment of sets of genes. The present version considers only phenotypes related to diseases. PhenPathDB collects information for 18 OMIM Clinical synopses and 7137 HPO phenotypes, related to 4292 diseases and 3446 genes. Enrichment of Gene Ontology annotations endows some 87.7, 86.9 and 73.6% of HPO phenotypes with Biological Process, Molecular Function and Cellular Component terms, respectively. Furthermore, 58.8 and 77.8% of HPO phenotypes are also enriched for KEGG and Reactome pathways, respectively. Based on PhenPathDB, PhenPathTOOL analyzes user-defined sets of phenotypes retrieving diseases, genes and functional terms which they share. This information can provide clues for interpreting the co-occurrence of phenotypes in a patient.
Conclusions: The resource allows finding molecular features useful to investigate diseases characterized by multiple phenotypes, and by this, it can help researchers and physicians in identifying molecular mechanisms and biological functions underlying the concomitant manifestation of phenotypes. The resource is freely available at http://phenpath.biocomp.unibo.it.
Electronic supplementary material: The online version of this article (10.1186/s12864-019-5868-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Giulia Babbi
- University of Bologna, FABIT, Via San Donato 15, 40126, Bologna, Italy
- Department of BIGEA, University of Bologna, Piazza di Porta S. Donato, 1, 40126, Bologna, Italy
| | - Pier Luigi Martelli
- University of Bologna, FABIT, Via San Donato 15, 40126, Bologna, Italy. .,Interdepartmental Center "Luigi Galvani" for integrated studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, CIG, Via G. Petroni 26, 40126, Bologna, Italy.
| | - Rita Casadio
- University of Bologna, FABIT, Via San Donato 15, 40126, Bologna, Italy.,Interdepartmental Center "Luigi Galvani" for integrated studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, CIG, Via G. Petroni 26, 40126, Bologna, Italy.,CNR, Institute of Biomembrane and Bioenergetics (IBIOM), Via Giovanni Amendola 165/A, 70126, Bari, Italy
|
47
|
Profiti G, Martelli PL, Casadio R. The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation. Nucleic Acids Res 2017; 45:W285-W290. [PMID: 28453653 PMCID: PMC5570247 DOI: 10.1093/nar/gkx330] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/31/2017] [Accepted: 04/18/2017] [Indexed: 01/03/2023] Open
Abstract
BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures, and each of these clusters is associated with a hidden Markov model that allows building template-target alignments suitable for structural modeling. A further 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB accession, FASTA sequence, GO term, PFAM domain, organism, PDB and ligand(s). When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3.
Affiliation(s)
- Giuseppe Profiti
  - Biocomputing Group, BiGeA/CIG, 'Luigi Galvani' Interdepartmental Center for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, Bologna 40126, Italy
- Pier Luigi Martelli
  - Biocomputing Group, BiGeA/CIG, 'Luigi Galvani' Interdepartmental Center for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, Bologna 40126, Italy
- Rita Casadio
  - Biocomputing Group, BiGeA/CIG, 'Luigi Galvani' Interdepartmental Center for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, Bologna 40126, Italy
|
48
|
Savojardo C, Martelli PL, Fariselli P, Casadio R. DeepSig: deep learning improves signal peptide detection in proteins. Bioinformatics 2018; 34:1690-1696. [PMID: 29280997 PMCID: PMC5946842 DOI: 10.1093/bioinformatics/btx818] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 06/22/2017] [Accepted: 12/20/2017] [Indexed: 11/15/2022] Open
Abstract
Motivation: The identification of signal peptides in protein sequences is an important step toward protein localization and function characterization.
Results: Here, we present DeepSig, an improved approach for signal peptide detection and cleavage-site prediction based on deep learning methods. Comparative benchmarks performed on an updated independent dataset of proteins show that DeepSig is the current best performing method, scoring better than other available state-of-the-art approaches on both signal peptide detection and precise cleavage-site identification.
Availability and implementation: DeepSig is available as both a standalone program and a web server at https://deepsig.biocomp.unibo.it. All datasets used in this study can be obtained from the same website.
Contact: pierluigi.martelli@unibo.it.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology - Interdepartmental Centre 'L. Galvani' for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, 40126 Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology - Interdepartmental Centre 'L. Galvani' for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, 40126 Bologna, Italy
- Piero Fariselli
  - Department of Comparative Biomedicine and Food Science (BCA), University of Padova, Padova, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology - Interdepartmental Centre 'L. Galvani' for Integrated Studies of Bioinformatics, Biophysics and Biocomplexity, University of Bologna, 40126 Bologna, Italy
|
49
|
McInnes G, Daneshjou R, Katsonis P, Lichtarge O, Srinivasan R, Rana S, Radivojac P, Mooney SD, Pagel KA, Stamboulian M, Jiang Y, Capriotti E, Wang Y, Bromberg Y, Bovo S, Savojardo C, Martelli PL, Casadio R, Pal LR, Moult J, Brenner SE, Altman R. Predicting venous thromboembolism risk from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges. Hum Mutat 2019; 40:1314-1320. [PMID: 31140652 DOI: 10.1002/humu.23825] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 01/31/2019] [Revised: 05/07/2019] [Accepted: 05/27/2019] [Indexed: 01/14/2023]
Abstract
Genetics play a key role in venous thromboembolism (VTE) risk; however, established risk factors in European populations do not translate to individuals of African descent because of the differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status in exome data from African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for non-VTE causes or VTE and asked to predict which disease each subject had been treated for. Given the lack of training data, many participants opted to use unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best performing method using only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used in the prediction of VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.
Affiliation(s)
- Gregory McInnes
  - Biomedical Informatics Training Program, Stanford University, Stanford, California
- Roxana Daneshjou
  - Department of Dermatology, Stanford School of Medicine, Stanford, California
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
  - Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas
  - Department of Pharmacology, Baylor College of Medicine, Houston, Texas
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
- Sadhna Rana
  - Innovations Labs, Tata Consultancy Services, Hyderabad, India
- Predrag Radivojac
  - Khoury College of Computer and Information Sciences, Northeastern University, Boston, Massachusetts
- Sean D Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington
- Kymberleigh A Pagel
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana
- Moses Stamboulian
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana
- Yuxiang Jiang
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana
- Emidio Capriotti
  - BioFolD Unit, Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Yanran Wang
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
- Yana Bromberg
  - Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, New Jersey
- Samuele Bovo
  - Department of Pharmacy and Biotechnology, Bologna Biocomputing Group, University of Bologna, Italy
- Castrense Savojardo
  - Department of Pharmacy and Biotechnology, Bologna Biocomputing Group, University of Bologna, Italy
- Pier Luigi Martelli
  - Department of Pharmacy and Biotechnology, Bologna Biocomputing Group, University of Bologna, Italy
- Rita Casadio
  - Department of Pharmacy and Biotechnology, Bologna Biocomputing Group, University of Bologna, Italy
  - Institute of Biomembrane and Bioenergetics, Consiglio Nazionale delle Ricerche, Bari, Italy
- Lipika R Pal
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
- John Moult
  - Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland
- Steven E Brenner
  - Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, California
- Russ Altman
  - Departments of Bioengineering, Biomedical Data Science, Genetics, and Medicine, Stanford University, Stanford, California
|
50
|
Savojardo C, Babbi G, Bovo S, Capriotti E, Martelli PL, Casadio R. Are machine learning based methods suited to address complex biological problems? Lessons from CAGI-5 challenges. Hum Mutat 2019; 40:1455-1462. [PMID: 31066146 DOI: 10.1002/humu.23784] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/18/2019] [Revised: 04/02/2019] [Accepted: 05/04/2019] [Indexed: 11/06/2022]
Abstract
In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The Critical Assessment of Genome Interpretation (CAGI) has established a common framework for assessing available predictors of variant effects on specific problems, and our group has been an active participant in CAGI since its first edition. In this paper, we summarize our experience and the lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze the prediction performance of our tools on five selected CAGI-5 challenges grouped into three categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their strengths and drawbacks. The aim is to better define the application boundaries of each tool.
Affiliation(s)
- Castrense Savojardo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Giulia Babbi
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Samuele Bovo
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Emidio Capriotti
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Pier Luigi Martelli
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Rita Casadio
  - Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies (IBIOM), Italian National Research Council (CNR), Bari, Italy
|