1
|
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors. Hum Genomics 2024; 18:90. [PMID: 39198917 PMCID: PMC11360829 DOI: 10.1186/s40246-024-00663-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2024] [Accepted: 08/19/2024] [Indexed: 09/01/2024]
Abstract
BACKGROUND Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). RESULTS The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past three decades, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 190 VIPs, resulting in a total of 407 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. CONCLUSIONS VIPdb version 2 summarizes 407 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. VIPdb is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
- Arul S Menon
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
  - Illumina, Foster City, CA, 94404, USA
- Steven E Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, CA, 94720, USA
  - Center for Computational Biology, University of California, Berkeley, CA, 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, CA, 94720, USA
  - Department of Plant and Microbial Biology, University of California, 111 Koshland Hall #3102, Berkeley, CA, 94720-3102, USA
|
2
|
Lin YJ, Menon AS, Hu Z, Brenner SE. Variant Impact Predictor database (VIPdb), version 2: Trends from 25 years of genetic variant impact predictors. bioRxiv 2024:2024.06.25.600283. [PMID: 38979289 PMCID: PMC11230257 DOI: 10.1101/2024.06.25.600283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
Background: Variant interpretation is essential for identifying patients' disease-causing genetic variants amongst the millions detected in their genomes. Hundreds of Variant Impact Predictors (VIPs), also known as Variant Effect Predictors (VEPs), have been developed for this purpose, with a variety of methodologies and goals. To facilitate the exploration of available VIP options, we have created the Variant Impact Predictor database (VIPdb). Results: The Variant Impact Predictor database (VIPdb) version 2 presents a collection of VIPs developed over the past 25 years, summarizing their characteristics, ClinGen calibrated scores, CAGI assessment results, publication details, access information, and citation patterns. We previously summarized 217 VIPs and their features in VIPdb in 2019. Building upon this foundation, we identified and categorized an additional 186 VIPs, resulting in a total of 403 VIPs in VIPdb version 2. The majority of the VIPs have the capacity to predict the impacts of single nucleotide variants and nonsynonymous variants. More VIPs tailored to predict the impacts of insertions and deletions have been developed since the 2010s. In contrast, relatively few VIPs are dedicated to the prediction of splicing, structural, synonymous, and regulatory variants. The increasing rate of citations to VIPs reflects the ongoing growth in their use, and the evolving trends in citations reveal development in the field and individual methods. Conclusions: VIPdb version 2 summarizes 403 VIPs and their features, potentially facilitating VIP exploration for various variant interpretation applications. Availability: VIPdb version 2 is available at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Yu-Jen Lin
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
- Arul S. Menon
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
- Zhiqiang Hu
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
  - Currently at: Illumina, Foster City, California 94404, USA
- Steven E. Brenner
  - Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA
  - Center for Computational Biology, University of California, Berkeley, California 94720, USA
  - College of Computing, Data Science, and Society, University of California, Berkeley, California 94720, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
|
3
|
Robson ES, Ioannidis NM. GUANinE v1.0: Benchmark Datasets for Genomic AI Sequence-to-Function Models. bioRxiv 2024:2023.10.12.562113. [PMID: 37904945 PMCID: PMC10614795 DOI: 10.1101/2023.10.12.562113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields - including benchmarking, auditing, and algorithmic fairness - are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a genomic AI benchmark, GUANinE, for evaluating model generalization across a number of distinct genomic tasks. Compared to existing task formulations in computational genomics, GUANinE is large-scale, de-noised, and suitable for evaluating pretrained models. GUANinE v1.0 primarily focuses on functional genomics tasks such as functional element annotation and gene expression prediction, and it also draws upon connections to evolutionary biology through sequence conservation tasks. The current GUANinE tasks provide insight into the performance of existing genomic AI models and non-neural baselines, with opportunities to be refined, revisited, and broadened as the field matures. Finally, the GUANinE benchmark allows us to evaluate new self-supervised T5 models and explore the tradeoffs between tokenization and model performance, while showcasing the potential for self-supervision to complement existing pretraining procedures.
Affiliation(s)
- Eyes S Robson
  - Center for Computational Biology, UC Berkeley, Berkeley, CA 94720
- Nilah M Ioannidis
  - Department of Electrical Engineering and Computer Sciences, UC Berkeley, Berkeley, CA 94720
|
4
|
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024]
Abstract
BACKGROUND The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
|
5
|
Katsonis P, Wilhelm K, Williams A, Lichtarge O. Genome interpretation using in silico predictors of variant impact. Hum Genet 2022; 141:1549-1577. [PMID: 35488922 PMCID: PMC9055222 DOI: 10.1007/s00439-022-02457-6] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 08/02/2021] [Accepted: 04/17/2022] [Indexed: 02/06/2023]
Abstract
Estimating the effects of variants found in disease driver genes opens the door to personalized therapeutic opportunities. Clinical associations and laboratory experiments can only characterize a tiny fraction of all the available variants, leaving the majority as variants of unknown significance (VUS). In silico methods bridge this gap by providing instant estimates on a large scale, most often based on the numerous genetic differences between species. Despite concerns that these methods may lack reliability in individual subjects, their numerous practical applications over cohorts suggest they are already helpful and have a role to play in genome interpretation when used at the proper scale and context. In this review, we aim to gain insights into the training and validation of these variant effect predicting methods and illustrate representative types of experimental and clinical applications. Objective performance assessments using various datasets that are not yet published indicate the strengths and limitations of each method. These show that cautious use of in silico variant impact predictors is essential for addressing genome interpretation challenges.
Affiliation(s)
- Panagiotis Katsonis
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Kevin Wilhelm
  - Graduate School of Biomedical Sciences, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Amanda Williams
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
- Olivier Lichtarge
  - Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Biochemistry, Human Genetics and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Department of Pharmacology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
  - Computational and Integrative Biomedical Research Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
|
6
|
Horne J, Shukla D. Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering. Ind Eng Chem Res 2022; 61:6235-6245. [PMID: 36051311 PMCID: PMC9432854 DOI: 10.1021/acs.iecr.1c04943] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/28/2023]
Abstract
Proteins are Nature's molecular machinery and comprise diverse roles while consisting of chemically similar building blocks. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others, where the aim is to ultimately create a protein given desired structural and functional properties. It is often critical to model the relationship between a protein's sequence, folded structure, and biological function to assist in such protein engineering pursuits. However, significant challenges remain in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and the epistatic interactions between mutations result in an inherently complex mapping between genetic modifications and protein function. Therefore, estimating the quantitative effects of mutations on protein function(s) remains a grand challenge of biology, bioinformatics, and many related fields and would rapidly accelerate protein engineering tasks when successful. Such estimation is often known as variant effect prediction (VEP). However, progress has been demonstrated in recent years with the development of machine learning (ML) methods in modeling the relationship between mutations and protein function. In this Review, recent advances in variant effect prediction (VEP) are discussed as tools for protein engineering, focusing on techniques incorporating gains from the broader ML community and challenges in estimating biomolecular functional differences. Primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
Affiliation(s)
- Jesse Horne
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
- Diwakar Shukla
  - Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States; Department of Plant Biology, Cancer Center at Illinois, and Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Champaign, Illinois 61801, United States
|
7
|
Olson ND, Wagner J, McDaniel J, Stephens SH, Westreich ST, Prasanna AG, Johanson E, Boja E, Maier EJ, Serang O, Jáspez D, Lorenzo-Salazar JM, Muñoz-Barrera A, Rubio-Rodríguez LA, Flores C, Kyriakidis K, Malousi A, Shafin K, Pesout T, Jain M, Paten B, Chang PC, Kolesnikov A, Nattestad M, Baid G, Goel S, Yang H, Carroll A, Eveleigh R, Bourgey M, Bourque G, Li G, Ma C, Tang L, Du Y, Zhang S, Morata J, Tonda R, Parra G, Trotta JR, Brueffer C, Demirkaya-Budak S, Kabakci-Zorlu D, Turgut D, Kalay Ö, Budak G, Narcı K, Arslan E, Brown R, Johnson IJ, Dolgoborodov A, Semenyuk V, Jain A, Tetikol HS, Jain V, Ruehle M, Lajoie B, Roddey C, Catreux S, Mehio R, Ahsan MU, Liu Q, Wang K, Ebrahim Sahraeian SM, Fang LT, Mohiyuddin M, Hung C, Jain C, Feng H, Li Z, Chen L, Sedlazeck FJ, Zook JM. PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. Cell Genomics 2022; 2:S2666-979X(22)00058-1. [PMID: 35720974 PMCID: PMC9205427 DOI: 10.1016/j.xgen.2022.100129] [Citation(s) in RCA: 73] [Impact Index Per Article: 24.3] [Received: 12/07/2020] [Revised: 11/01/2021] [Accepted: 04/08/2022] [Indexed: 11/19/2022]
Abstract
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
Affiliation(s)
- Nathan D. Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Jennifer McDaniel
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Elaine Johanson
  - Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
- Emily Boja
  - Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
- Ezekiel J. Maier
  - Booz Allen Hamilton, 8283 Greensboro Drive, Mclean, VA 22102, USA
- Omar Serang
  - DNAnexus, Inc., 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- David Jáspez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- José M. Lorenzo-Salazar
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Adrián Muñoz-Barrera
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Luis A. Rubio-Rodríguez
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Carlos Flores
  - Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
  - CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
  - Research Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
  - Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain
- Konstantinos Kyriakidis
  - School of Pharmacy, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
  - Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
- Andigoni Malousi
  - Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
  - Laboratory of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
- Kishwar Shafin
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Trevor Pesout
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Miten Jain
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Pi-Chuan Chang
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Maria Nattestad
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Gunjan Baid
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Sidharth Goel
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Howard Yang
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Andrew Carroll
  - Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Robert Eveleigh
  - The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Mathieu Bourgey
  - The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Guillaume Bourque
  - The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Gen Li
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- ChouXian Ma
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- LinQi Tang
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- YuanPing Du
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- ShaoWei Zhang
  - HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- Jordi Morata
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Raúl Tonda
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Genís Parra
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jean-Rémi Trotta
  - CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Christian Brueffer
  - Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Deniz Turgut
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Özem Kalay
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Gungor Budak
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Kübra Narcı
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Elif Arslan
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Amit Jain
  - Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Mian Umair Ahsan
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Qian Liu
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Li Tai Fang
  - Roche Sequencing Solutions, Santa Clara, CA 95050, USA
- Chirag Jain
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Fritz J. Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Justin M. Zook
  - Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
|
8
|
Kim HY, Jeon W, Kim D. An enhanced variant effect predictor based on a deep generative model and the Born-Again Networks. Sci Rep 2021; 11:19127. [PMID: 34580383 PMCID: PMC8476491 DOI: 10.1038/s41598-021-98693-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/23/2021] [Accepted: 09/07/2021] [Indexed: 11/09/2022]
Abstract
The development of an accurate and reliable variant effect prediction tool is important for research in human genetic diseases. A large number of predictors have been developed towards this goal, yet many of these predictors suffer from the problem of data circularity. Here we present MTBAN (Mutation effect predictor using the Temporal convolutional network and the Born-Again Networks), a method for predicting the deleteriousness of variants. We apply a form of knowledge distillation technique known as the Born-Again Networks (BAN) to a previously developed deep autoregressive generative model, mutationTCN, to achieve an improved performance in variant effect prediction. As the model is fully unsupervised and trained only on the evolutionarily related sequences of a protein, it does not suffer from the problem of data circularity which is common across supervised predictors. When evaluated on a test dataset consisting of deleterious and benign human protein variants, MTBAN shows an outstanding predictive ability compared to other well-known variant effect predictors. We also offer a user-friendly web server to predict variant effects using MTBAN, freely accessible at http://mtban.kaist.ac.kr. To our knowledge, MTBAN is the first variant effect prediction tool based on a deep generative model that provides a user-friendly web server for the prediction of deleteriousness of variants.
Collapse
Affiliation(s)
- Ha Young Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea
- Woosung Jeon
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea.
9
Seaby EG, Ennis S. Challenges in the diagnosis and discovery of rare genetic disorders using contemporary sequencing technologies. Brief Funct Genomics 2021; 19:243-258. [PMID: 32393978 DOI: 10.1093/bfgp/elaa009] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/13/2022] Open
Abstract
Next generation sequencing (NGS) has revolutionised rare disease diagnostics. Concomitant with advancing technologies has been a rise in the number of new gene disorders discovered and diagnoses made for patients and their families. However, despite the trend towards whole exome and whole genome sequencing, diagnostic rates remain suboptimal. On average, only ~30% of patients receive a molecular diagnosis. National sequencing projects launched in the last 5 years are integrating clinical diagnostic testing with research avenues to widen the spectrum of known genetic disorders. Consequently, efforts to diagnose genetic disorders in a clinical setting are now often shared with efforts to prioritise candidate variants for the detection of new disease genes. Herein we discuss some of the biggest obstacles precluding molecular diagnosis and discovery of new gene disorders. We consider bioinformatic and analytical challenges faced when interpreting next generation sequencing data and showcase some of the newest tools available to mitigate these issues. We consider how incomplete penetrance, non-coding variation and structural variants are likely to impact diagnostic rates, and we further discuss methods for uplifting novel gene discovery by adopting a gene-to-patient-based approach.
10
Tarca AL, Pataki BÁ, Romero R, Sirota M, Guan Y, Kutum R, Gomez-Lopez N, Done B, Bhatti G, Yu T, Andreoletti G, Chaiworapongsa T, Hassan SS, Hsu CD, Aghaeepour N, Stolovitzky G, Csabai I, Costello JC. Crowdsourcing assessment of maternal blood multi-omics for predicting gestational age and preterm birth. Cell Rep Med 2021; 2:100323. [PMID: 34195686 PMCID: PMC8233692 DOI: 10.1016/j.xcrm.2021.100323] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 10/29/2020] [Revised: 01/18/2021] [Accepted: 05/20/2021] [Indexed: 12/15/2022]
Abstract
Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. The findings indicate that whole-blood gene expression predicts ultrasound-based gestational ages in normal and complicated pregnancies (r = 0.83) and, using data collected before 37 weeks of gestation, also predicts the delivery date in both normal pregnancies (r = 0.86) and those with spontaneous preterm birth (r = 0.75). Based on samples collected before 33 weeks in asymptomatic women, our analysis suggests that expression changes preceding preterm prelabor rupture of the membranes are consistent across time points and cohorts and involve leukocyte-mediated immunity. Models built from plasma proteomic data predict spontaneous preterm delivery with intact membranes with higher accuracy and earlier in pregnancy than transcriptomic models (AUROC = 0.76 versus AUROC = 0.6 at 27-33 weeks of gestation).
Affiliation(s)
- Adi L. Tarca
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Department of Computer Science, Wayne State University College of Engineering, Detroit, MI 48202, USA
- Bálint Ármin Pataki
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
- Roberto Romero
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA
- Detroit Medical Center, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Florida International University, Miami, FL 33199, USA
- Marina Sirota
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Rintu Kutum
- Informatics and Big Data Unit, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Nardhy Gomez-Lopez
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Department of Biochemistry, Microbiology, and Immunology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Bogdan Done
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Gaurav Bhatti
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Gaia Andreoletti
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Tinnakorn Chaiworapongsa
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- The DREAM Preterm Birth Prediction Challenge Consortium
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Department of Computer Science, Wayne State University College of Engineering, Detroit, MI 48202, USA
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
- Department of Obstetrics and Gynecology, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Center for Molecular Medicine and Genetics, Wayne State University, Detroit, MI 48201, USA
- Detroit Medical Center, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Florida International University, Miami, FL 33199, USA
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Informatics and Big Data Unit, CSIR-Institute of Genomics and Integrative Biology, New Delhi, India
- Department of Biochemistry, Microbiology, and Immunology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Sage Bionetworks, Seattle, WA, USA
- Office of Women’s Health, Integrative Biosciences Center, Wayne State University, Detroit, MI 48202, USA
- Department of Physiology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Department of Anesthesiology, Perioperative, and Pain Medicine, Department of Pediatrics, and Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Sonia S. Hassan
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Office of Women’s Health, Integrative Biosciences Center, Wayne State University, Detroit, MI 48202, USA
- Department of Physiology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Chaur-Dong Hsu
- Perinatology Research Branch, Division of Obstetrics and Maternal-Fetal Medicine, Division of Intramural Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, US Department of Health and Human Services, Detroit, MI 48201, USA
- Department of Obstetrics and Gynecology, Wayne State University School of Medicine, Detroit, MI 48201 USA
- Department of Physiology, Wayne State University School of Medicine, Detroit, MI 48201, USA
- Nima Aghaeepour
- Department of Anesthesiology, Perioperative, and Pain Medicine, Department of Pediatrics, and Department of Biomedical Data Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA
- Gustavo Stolovitzky
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
- Istvan Csabai
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary
- James C. Costello
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
11
SAAFEC-SEQ: A Sequence-Based Method for Predicting the Effect of Single Point Mutations on Protein Thermodynamic Stability. Int J Mol Sci 2021; 22:ijms22020606. [PMID: 33435356 PMCID: PMC7827184 DOI: 10.3390/ijms22020606] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 12/05/2020] [Revised: 12/23/2020] [Accepted: 01/06/2021] [Indexed: 01/04/2023] Open
Abstract
Modeling the effect of mutations on protein thermodynamics stability is useful for protein engineering and understanding molecular mechanisms of disease-causing variants. Here, we report a new development of the SAAFEC method, the SAAFEC-SEQ, which is a gradient boosting decision tree machine learning method to predict the change of the folding free energy caused by amino acid substitutions. The method does not require the 3D structure of the corresponding protein, but only its sequence and, thus, can be applied on genome-scale investigations where structural information is very sparse. SAAFEC-SEQ uses physicochemical properties, sequence features, and evolutionary information features to make the predictions. It is shown to consistently outperform all existing state-of-the-art sequence-based methods in both the Pearson correlation coefficient and root-mean-squared-error parameters as benchmarked on several independent datasets. The SAAFEC-SEQ has been implemented into a web server and is available as stand-alone code that can be downloaded and embedded into other researchers’ code.
12
Wells DK, van Buuren MM, Dang KK, Hubbard-Lucey VM, Sheehan KCF, Campbell KM, Lamb A, Ward JP, Sidney J, Blazquez AB, Rech AJ, Zaretsky JM, Comin-Anduix B, Ng AHC, Chour W, Yu TV, Rizvi H, Chen JM, Manning P, Steiner GM, Doan XC, Merghoub T, Guinney J, Kolom A, Selinsky C, Ribas A, Hellmann MD, Hacohen N, Sette A, Heath JR, Bhardwaj N, Ramsdell F, Schreiber RD, Schumacher TN, Kvistborg P, Defranoux NA. Key Parameters of Tumor Epitope Immunogenicity Revealed Through a Consortium Approach Improve Neoantigen Prediction. Cell 2020; 183:818-834.e13. [PMID: 33038342 DOI: 10.1016/j.cell.2020.09.015] [Citation(s) in RCA: 312] [Impact Index Per Article: 62.4] [Received: 03/30/2020] [Revised: 07/08/2020] [Accepted: 09/03/2020] [Indexed: 12/15/2022]
Abstract
Many approaches to identify therapeutically relevant neoantigens couple tumor sequencing with bioinformatic algorithms and inferred rules of tumor epitope immunogenicity. However, there are no reference data to compare these approaches, and the parameters governing tumor epitope immunogenicity remain unclear. Here, we assembled a global consortium wherein each participant predicted immunogenic epitopes from shared tumor sequencing data. 608 epitopes were subsequently assessed for T cell binding in patient-matched samples. By integrating peptide features associated with presentation and recognition, we developed a model of tumor epitope immunogenicity that filtered out 98% of non-immunogenic peptides with a precision above 0.70. Pipelines prioritizing model features had superior performance, and pipeline alterations leveraging them improved prediction performance. These findings were validated in an independent cohort of 310 epitopes prioritized from tumor sequencing data and assessed for T cell binding. This data resource enables identification of parameters underlying effective anti-tumor immunity and is available to the research community.
Affiliation(s)
- Daniel K Wells
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA.
- Marit M van Buuren
- Division of Molecular Oncology and Immunology, the Netherlands Cancer Institute, Amsterdam, the Netherlands; T Cell Immunology, Biopharmaceutical New Technologies (BioNTech) Corporation, BioNTech US, Cambridge, MA, USA
- Kristen K Dang
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Kathleen C F Sheehan
- Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, St. Louis, MO, USA; The Andrew M. and Jane M. Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, St. Louis, MO, USA
- Katie M Campbell
- Division of Hematology and Oncology, Department of Medicine, Johnson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Andrew Lamb
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Jeffrey P Ward
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- John Sidney
- Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, La Jolla, CA, USA
- Ana B Blazquez
- Division of Hematology and Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Andrew J Rech
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia, PA, USA
- Jesse M Zaretsky
- Division of Hematology and Oncology, Department of Medicine, Johnson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Begonya Comin-Anduix
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Department of Surgery, David Geffen School of Medicine, Johnson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA
- William Chour
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Thomas V Yu
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Hira Rizvi
- Druckenmiller Center for Lung Cancer Research, MSKCC, New York, NY, USA
- Jia M Chen
- Division of Hematology and Oncology, Department of Medicine, Johnson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Patrice Manning
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Xengie C Doan
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA
- Taha Merghoub
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Department of Medicine, MSKCC, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Justin Guinney
- Computational Oncology, Sage Bionetworks, Seattle, WA, USA; Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
- Adam Kolom
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Anna-Maria Kellen Clinical Accelerator, Cancer Research Institute, New York, NY, USA
- Cheryl Selinsky
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Antoni Ribas
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Division of Hematology and Oncology, Department of Medicine, Johnson Comprehensive Cancer Center, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Matthew D Hellmann
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Druckenmiller Center for Lung Cancer Research, MSKCC, New York, NY, USA; Department of Medicine, MSKCC, New York, NY, USA; Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Nir Hacohen
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Massachusetts General Hospital Cancer Center, Boston, MA, USA
- Alessandro Sette
- Division of Hematology and Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- James R Heath
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Institute for Systems Biology, Seattle, WA, USA
- Nina Bhardwaj
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Division of Hematology and Oncology, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Fred Ramsdell
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Robert D Schreiber
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA; Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, St. Louis, MO, USA; The Andrew M. and Jane M. Bursky Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, St. Louis, MO, USA
- Ton N Schumacher
- Division of Molecular Oncology and Immunology, Oncode Institute, the Netherlands Cancer Institute, Amsterdam, the Netherlands
- Pia Kvistborg
- Division of Molecular Oncology and Immunology, the Netherlands Cancer Institute, Amsterdam, the Netherlands
13
Pey AL. Towards Accurate Genotype-Phenotype Correlations in the CYP2D6 Gene. J Pers Med 2020; 10:jpm10040158. [PMID: 33049937 PMCID: PMC7711719 DOI: 10.3390/jpm10040158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/28/2020] [Accepted: 10/01/2020] [Indexed: 12/17/2022] Open
Abstract
Establishing accurate and large-scale genotype-phenotype correlations and predictions of individual response to pharmacological treatments are two of the holy grails of Personalized Medicine. These tasks are challenging and require an integrated knowledge of the complex processes that regulate gene expression and, ultimately, protein functionality in vivo, the effects of mutations/polymorphisms and the different sources of interindividual phenotypic variability. A remarkable example of our advances in these challenging tasks is the highly polymorphic CYP2D6 gene, which encodes a cytochrome P450 enzyme involved in the metabolization of many of the most marketed drugs (including SARS-Cov-2 therapies such as hydroxychloroquine). Since the introduction of simple activity scores (AS) over 10 years ago, its ability to establish genotype-phenotype correlations on the drug metabolizing capacity of this enzyme in human population has provided lessons that will help to improve this type of score for this, and likely many other human genes and proteins. Multidisciplinary research emerges as the best approach to incorporate additional concepts to refine and improve such functional/activity scores for the CYP2D6 gene, as well as for many other human genes associated with simple and complex genetic diseases.
Affiliation(s)
- Angel L Pey
- Departamento de Química Física, Unidad de Excelencia de Química aplicada a Biomedicina y Medioambiente, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain
14
Livesey BJ, Marsh JA. Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations. Mol Syst Biol 2020; 16:e9380. [PMID: 32627955 PMCID: PMC7336272 DOI: 10.15252/msb.20199380] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Received: 11/27/2019] [Revised: 05/18/2020] [Accepted: 05/26/2020] [Indexed: 12/23/2022] Open
Abstract
To deal with the huge number of novel protein-coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, in order to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top-ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stood out, showing both the strongest correlations with DMS data and having the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based upon their performance in these analyses.
Affiliation(s)
- Benjamin J Livesey
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Joseph A Marsh
- MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK
15
Pal LR, Kundu K, Yin Y, Moult J. Matching whole genomes to rare genetic disorders: Identification of potential causative variants using phenotype-weighted knowledge in the CAGI SickKids5 clinical genomes challenge. Hum Mutat 2020; 41:347-362. [PMID: 31680375 PMCID: PMC7182498 DOI: 10.1002/humu.23933] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/09/2019] [Revised: 09/26/2019] [Accepted: 10/13/2019] [Indexed: 02/06/2023]
Abstract
Precise identification of causative variants from whole-genome sequencing data, including both coding and noncoding variants, is challenging. The Critical Assessment of Genome Interpretation 5 SickKids clinical genome challenge provided an opportunity to assess our ability to extract such information. Participants in the challenge were required to match each of the 24 whole-genome sequences to the correct phenotypic profile and to identify the disease class of each genome. These are all rare disease cases that have resisted genetic diagnosis in a state-of-the-art pipeline. The patients have a range of eye, neurological, and connective-tissue disorders. We used a gene-centric approach to address this problem, assigning each gene a multiphenotype-matching score. Mutations in the top-scoring genes for each phenotype profile were ranked on a 6-point scale of pathogenicity probability, resulting in an approximately equal number of top-ranked coding and noncoding candidate variants overall. We were able to assign the correct disease class for 12 cases and the correct genome to a clinical profile for five cases. The challenge assessor found genes in three of these five cases as likely appropriate. In the postsubmission phase, after careful screening of the genes in the correct genome, we identified additional potential diagnostic variants, a high proportion of which are noncoding.
Affiliation(s)
- Lipika R. Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- Kunal Kundu
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD 20742, USA
- Yizhou Yin
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
16
Cao Y, Sun Y, Karimi M, Chen H, Moronfoye O, Shen Y. Predicting pathogenicity of missense variants with weakly supervised regression. Hum Mutat 2019; 40:1579-1592. [PMID: 31144781 PMCID: PMC6744350 DOI: 10.1002/humu.23826] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/11/2019] [Revised: 05/23/2019] [Accepted: 05/27/2019] [Indexed: 12/27/2022]
Abstract
Quickly growing genetic variation data of unknown clinical significance demand computational methods that can reliably predict clinical phenotypes and deeply unravel molecular mechanisms. On the platform enabled by the Critical Assessment of Genome Interpretation (CAGI), we develop a novel "weakly supervised" regression (WSR) model that not only predicts precise clinical significance (probability of pathogenicity) from inexact training annotations (class of pathogenicity) but also infers underlying molecular mechanisms in a variant-specific manner. Compared to multiclass logistic regression, a representative multiclass classifier, our kernelized WSR improves the performance for the ENIGMA Challenge set from 0.72 to 0.97 in binary area under the receiver operating characteristic curve (AUC) and from 0.64 to 0.80 in ordinal multiclass AUC. WSR model interpretation and protein structural interpretation reach consensus in corroborating the most probable molecular mechanisms by which some pathogenic BRCA1 variants confer clinical significance, namely metal-binding disruption for p.C44F and p.C47Y, protein-binding disruption for p.M18T, and structure destabilization for p.S1715N.
Affiliation(s)
- Yue Cao
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Mostafa Karimi
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Haoran Chen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Oluwaseyi Moronfoye
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, 77843-3128, United States
17
Clark WT, Kasak L, Bakolitsa C, Hu Z, Andreoletti G, Babbi G, Bromberg Y, Casadio R, Dunbrack R, Folkman L, Ford CT, Jones D, Katsonis P, Kundu K, Lichtarge O, Martelli PL, Mooney SD, Nodzak C, Pal LR, Radivojac P, Savojardo C, Shi X, Zhou Y, Uppal A, Xu Q, Yin Y, Pejaver V, Wang M, Wei L, Moult J, Yu GK, Brenner SE, LeBowitz JH. Assessment of predicted enzymatic activity of α-N-acetylglucosaminidase variants of unknown significance for CAGI 2016. Hum Mutat 2019; 40:1519-1529. [PMID: 31342580 PMCID: PMC7156275 DOI: 10.1002/humu.23875] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/22/2019] [Revised: 06/27/2019] [Accepted: 07/15/2019] [Indexed: 12/25/2022]
Abstract
The NAGLU challenge of the fourth edition of the Critical Assessment of Genome Interpretation experiment (CAGI4) in 2016, invited participants to predict the impact of variants of unknown significance (VUS) on the enzymatic activity of the lysosomal hydrolase α-N-acetylglucosaminidase (NAGLU). Deficiencies in NAGLU activity lead to a rare, monogenic, recessive lysosomal storage disorder, Sanfilippo syndrome type B (MPS type IIIB). This challenge attracted 17 submissions from 10 groups. We observed that top models were able to predict the impact of missense mutations on enzymatic activity with Pearson's correlation coefficients of up to .61. We also observed that top methods were significantly more correlated with each other than they were with observed enzymatic activity values, which we believe speaks to the importance of sequence conservation across the different methods. Improved functional predictions on the VUS will help population-scale analysis of disease epidemiology and rare variant association analysis.
Affiliation(s)
- Laura Kasak
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
- Constantina Bakolitsa
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Giulia Babbi
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Lukas Folkman
- School of Information and Communication Technology, Griffith University, Southport, Australia
- Colby T. Ford
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, NC, USA
- David Jones
- Bioinformatics Group, Department of Computer Science, University College London, UK
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Kunal Kundu
- University of Maryland, College Park, MD, USA
- Olivier Lichtarge
- Departments of Molecular and Human Genetics, Biochemistry & Molecular Biology, Pharmacology, and Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX, USA
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Conor Nodzak
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, NC, USA
- Predrag Radivojac
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Xinghua Shi
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, NC, USA
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
- Aneeta Uppal
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, NC, USA
- Qifang Xu
- Fox Chase Cancer Center, Philadelphia, PA, USA
- Yizhou Yin
- University of Maryland, College Park, MD, USA
- Vikas Pejaver
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA
- Meng Wang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, P.R. China
- Liping Wei
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, P.R. China
- John Moult
- University of Maryland, College Park, MD, USA
- G. Karen Yu
- BioMarin Pharmaceutical, San Rafael, California, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
18
Kasak L, Bakolitsa C, Hu Z, Yu C, Rine J, Dimster-Denk DF, Pandey G, Baets GD, Bromberg Y, Cao C, Capriotti E, Casadio R, Durme JV, Giollo M, Karchin R, Katsonis P, Leonardi E, Lichtarge O, Martelli PL, Masica D, Mooney SD, Olatubosun A, Pal LR, Radivojac P, Rousseau F, Savojardo C, Schymkowitz J, Thusberg J, Tosatto SC, Vihinen M, Väliaho J, Repo S, Moult J, Brenner SE, Friedberg I. Assessing computational predictions of the phenotypic effect of cystathionine-beta-synthase variants. Hum Mutat 2019; 40:1530-1545. [PMID: 31301157 PMCID: PMC7325732 DOI: 10.1002/humu.23868] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/11/2019] [Revised: 06/22/2019] [Accepted: 07/09/2019] [Indexed: 12/28/2022]
Abstract
Accurate prediction of the impact of genomic variation on phenotype is a major goal of computational biology and an important contributor to personalized medicine. Computational predictions can lead to a better understanding of the mechanisms underlying genetic diseases, including cancer, but their adoption requires thorough and unbiased assessment. Cystathionine-beta-synthase (CBS) is an enzyme that catalyzes the first step of the transsulfuration pathway, from homocysteine to cystathionine, and in which variations are associated with human hyperhomocysteinemia and homocystinuria. We have created a computational challenge under the CAGI framework to evaluate how well different methods can predict the phenotypic effect(s) of CBS single amino acid substitutions using a blinded experimental data set. CAGI participants were asked to predict yeast growth based on the identity of the mutations. The performance of the methods was evaluated using several metrics. The CBS challenge highlighted the difficulty of predicting the phenotype of an ex vivo system in a model organism when classification models were trained on human disease data. We also discuss the variations in difficulty of prediction for known benign and deleterious variants, as well as identify methodological and experimental constraints with lessons to be learned for future challenges.
Affiliation(s)
- Laura Kasak
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Institute of Biomedicine and Translational Medicine, University of Tartu, Tartu, Estonia
- Constantina Bakolitsa
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Changhua Yu
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Jasper Rine
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
- Dago F. Dimster-Denk
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
- Gaurav Pandey
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Greet De Baets
- Switch Laboratory, VIB Center for Brain and Disease Research, Leuven, Belgium
- Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, USA
- Chen Cao
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
- Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD, USA
- Emidio Capriotti
- Department of Bioengineering, Stanford University, Stanford, CA, USA
- Rita Casadio
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Joost Van Durme
- Switch Laboratory, VIB Center for Brain and Disease Research, Leuven, Belgium
- Vrije Universiteit Brussel, Brussels, Belgium
- Manuel Giollo
- Department of Biomedical Sciences, University of Padua, Padua, Italy
- Rachel Karchin
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Pier Luigi Martelli
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- David Masica
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA
- Ayodeji Olatubosun
- Institute of Medical Technology, University of Tampere, Tampere, Finland
- Lipika R. Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, USA
- Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, IN, USA
- Frederic Rousseau
- Switch Laboratory, VIB Center for Brain and Disease Research, Leuven, Belgium
- Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- Castrense Savojardo
- Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
- Joost Schymkowitz
- Switch Laboratory, VIB Center for Brain and Disease Research, Leuven, Belgium
- Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- Mauno Vihinen
- Institute of Medical Technology, University of Tampere, Tampere, Finland
- Jouni Väliaho
- Institute of Medical Technology, University of Tampere, Tampere, Finland
- Susanna Repo
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- John Moult
- Department of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, OH, USA
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA
19
Andreoletti G, Pal LR, Moult J, Brenner SE. Reports from the fifth edition of CAGI: The Critical Assessment of Genome Interpretation. Hum Mutat 2019; 40:1197-1201. [PMID: 31334884 PMCID: PMC7329230 DOI: 10.1002/humu.23876] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 07/16/2019] [Accepted: 07/19/2019] [Indexed: 12/20/2022]
Abstract
Interpretation of genomic variation plays an essential role in the analysis of cancer and monogenic disease, and increasingly also in complex trait disease, with applications ranging from basic research to clinical decisions. Many computational impact prediction methods have been developed, yet the field lacks a clear consensus on their appropriate use and interpretation. The Critical Assessment of Genome Interpretation (CAGI, /'kā-jē/) is a community experiment to objectively assess computational methods for predicting the phenotypic impacts of genomic variation. CAGI participants are provided genetic variants and make blind predictions of resulting phenotype. Independent assessors evaluate the predictions by comparing with experimental and clinical data. CAGI has completed five editions with the goals of establishing the state of art in genome interpretation and of encouraging new methodological developments. This special issue (https://onlinelibrary.wiley.com/toc/10981004/2019/40/9) comprises reports from CAGI, focusing on the fifth edition that culminated in a conference that took place 5 to 7 July 2018. CAGI5 was comprised of 14 challenges and engaged hundreds of participants from a dozen countries. This edition had a notable increase in splicing and expression regulatory variant challenges, while also continuing challenges on clinical genomics, as well as complex disease datasets and missense variants in diseases ranging from cancer to Pompe disease to schizophrenia. Full information about CAGI is at https://genomeinterpretation.org.
Affiliation(s)
- Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Lipika R. Pal
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
- John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742
- Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA
- Center for Computational Biology, University of California, Berkeley, CA 94720, USA
20
Hu Z, Yu C, Furutsuki M, Andreoletti G, Ly M, Hoskins R, Adhikari AN, Brenner SE. VIPdb, a genetic Variant Impact Predictor Database. Hum Mutat 2019; 40:1202-1214. [PMID: 31283070 PMCID: PMC7288905 DOI: 10.1002/humu.23858] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 06/17/2019] [Accepted: 06/27/2019] [Indexed: 12/30/2022]
Abstract
Genome sequencing identifies a vast number of genetic variants. Predicting these variants' molecular and clinical effects is one of the preeminent challenges in human genetics. Accurate prediction of the impact of genetic variants improves our understanding of how genetic information is conveyed to molecular and cellular functions, and is an essential step towards precision medicine. Over one hundred tools/resources have been developed specifically for this purpose. We summarize these tools, as well as their characteristics, in the genetic Variant Impact Predictor Database (VIPdb). This database will help researchers and clinicians explore appropriate tools, and inform the development of improved methods. VIPdb can be browsed and downloaded at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Changhua Yu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Bioengineering, University of California, Berkeley, California 94720, USA
- Mabel Furutsuki
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
- Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Melissa Ly
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Division of Data Sciences, University of California, Berkeley, California 94720, USA
- Roger Hoskins
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Aashish N. Adhikari
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
21
Adhikari AN. Gene-specific features enhance interpretation of mutational impact on acid α-glucosidase enzyme activity. Hum Mutat 2019; 40:1507-1518. [PMID: 31228295 DOI: 10.1002/humu.23846] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/14/2019] [Revised: 05/21/2019] [Accepted: 06/17/2019] [Indexed: 01/30/2023]
Abstract
We present a computational model for predicting mutational impact on enzymatic activity of human acid α-glucosidase (GAA), an enzyme associated with Pompe disease. Using a model that combines features specific to GAA with other general evolutionary and physiochemical features, we made blind predictions of enzymatic activity relative to wildtype human GAA for >300 GAA mutants, as part of the Critical Assessment of Genome Interpretation 5 GAA challenge. We found that gene-specific features can improve the performance of existing impact prediction tools that mostly rely on general features for pathogenicity prediction. The majority of the poorly predicted mutants that lower wildtype GAA enzyme activity occurred on the surface of the GAA protein. We also found that gene-specific features were uncorrelated with existing methods and provided orthogonal information for interpreting the origin of pathogenicity, particularly in variants that are poorly predicted by existing general methods. Specific variants in GAA, when investigated in the context of its protein structure, suggested gene-specific information, such as the disruption of local backbone torsional geometry and of particular sidechain-sidechain hydrogen bonds, as potential sources of pathogenicity.
Affiliation(s)
- Aashish N Adhikari
- Department of Plant and Microbial Biology, University of California, Berkeley, California
22
Katsonis P, Lichtarge O. CAGI5: Objective performance assessments of predictions based on the Evolutionary Action equation. Hum Mutat 2019; 40:1436-1454. [PMID: 31317604 PMCID: PMC6900054 DOI: 10.1002/humu.23873] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 02/21/2019] [Revised: 07/02/2019] [Accepted: 07/11/2019] [Indexed: 12/14/2022]
Abstract
Many computational approaches estimate the effect of coding variants, but their predictions often disagree with each other. These contradictions confound users and raise questions regarding reliability. Performance assessments can indicate the expected accuracy for each method and highlight advantages and limitations. The Critical Assessment of Genome Interpretation (CAGI) community aims to organize objective and systematic assessments: They challenge predictors on unpublished experimental and clinical data and assign independent assessors to evaluate the submissions. We participated in CAGI experiments as predictors, using the Evolutionary Action (EA) method to estimate the fitness effect of coding mutations. EA is untrained, uses homology information, and relies on a formal equation: The fitness effect equals the functional sensitivity to residue changes multiplied by the magnitude of the substitution. In previous CAGI experiments (between 2011 and 2016), our submissions aimed to predict the protein activity of single mutants. In 2018 (CAGI5), we also submitted predictions regarding clinical associations, folding stability, and matching genomic data with phenotype. For all these diverse challenges, we used EA to predict the fitness effect of variants, adjusted to specifically address each question. Our submissions had consistently good performance, suggesting that EA predicts reliably the effects of genetic variants.
Affiliation(s)
- Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas
- Department of Biochemistry & Molecular Biology, Baylor College of Medicine, Houston, Texas
- Department of Pharmacology, Baylor College of Medicine, Houston, Texas
- Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas
23
Padilla N, Moles-Fernández A, Riera C, Montalban G, Özkan S, Ootes L, Bonache S, Díez O, Gutiérrez-Enríquez S, de la Cruz X. BRCA1- and BRCA2-specific in silico tools for variant interpretation in the CAGI 5 ENIGMA challenge. Hum Mutat 2019; 40:1593-1611. [PMID: 31112341 DOI: 10.1002/humu.23802] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 01/08/2019] [Revised: 05/15/2019] [Accepted: 05/17/2019] [Indexed: 11/09/2022]
Abstract
BRCA1 and BRCA2 (BRCA1/2) germline variants disrupting the DNA protective role of these genes increase the risk of hereditary breast and ovarian cancers. Correct identification of these variants then becomes clinically relevant, because it may increase the survival rates of the carriers. Unfortunately, we are still unable to systematically predict the impact of BRCA1/2 variants. In this article, we present a family of in silico predictors that address this problem, using a gene-specific approach. For each protein, we have developed two tools, aimed at predicting the impact of a variant at two different levels: functional and clinical. Testing their performance in different datasets shows that specific information compensates for the small number of predictive features and the reduced training sets employed to develop our models. When applied to the variants of the BRCA1/2 (ENIGMA) challenge in the fifth Critical Assessment of Genome Interpretation (CAGI 5), we find that these methods, particularly those predicting the functional impact of variants, perform well, identifying the large compositional bias towards neutral variants in the CAGI sample. This performance is further improved when estimates of the target variant's impact on splicing are incorporated into our prediction protocol.
Affiliation(s)
- Natàlia Padilla
- Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR). Universitat Autònoma de Barcelona, Barcelona, Spain
- Casandra Riera
- Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR). Universitat Autònoma de Barcelona, Barcelona, Spain
- Gemma Montalban
- Oncogenetics Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Selen Özkan
- Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR). Universitat Autònoma de Barcelona, Barcelona, Spain
- Lars Ootes
- Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR). Universitat Autònoma de Barcelona, Barcelona, Spain
- Sandra Bonache
- Oncogenetics Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Orland Díez
- Oncogenetics Group, Vall d'Hebron Institute of Oncology (VHIO), Barcelona, Spain
- Area of Clinical and Molecular Genetics, University Hospital of Vall d'Hebron, Barcelona, Spain
- Xavier de la Cruz
- Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR). Universitat Autònoma de Barcelona, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
24
Cheng J, Nguyen TYD, Cygan KJ, Çelik MH, Fairbrother WG, Avsec Ž, Gagneur J. MMSplice: modular modeling improves the predictions of genetic variant effects on splicing. Genome Biol 2019; 20:48. [PMID: 30823901 PMCID: PMC6396468 DOI: 10.1186/s13059-019-1653-z] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Received: 10/02/2018] [Accepted: 02/12/2019] [Indexed: 12/15/2022] Open
Abstract
Predicting the effects of genetic variants on splicing is highly relevant for human genetics. We describe the framework MMSplice (modular modeling of splicing) with which we built the winning model of the CAGI5 exon skipping prediction challenge. The MMSplice modules are neural networks scoring exon, intron, and splice sites, trained on distinct large-scale genomics datasets. These modules are combined to predict effects of variants on exon skipping, splice site choice, splicing efficiency, and pathogenicity, with matched or higher performance than state-of-the-art. Our models, available in the repository Kipoi, apply to variants including indels directly from VCF files.
Affiliation(s)
- Jun Cheng
- Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748 Germany
- Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, München, Germany
- Thi Yen Duong Nguyen
- Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748 Germany
- Kamil J. Cygan
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island USA
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, Rhode Island USA
- Muhammed Hasan Çelik
- Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748 Germany
- William G. Fairbrother
- Center for Computational Molecular Biology, Brown University, Providence, Rhode Island USA
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, Rhode Island USA
- Žiga Avsec
- Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748 Germany
- Graduate School of Quantitative Biosciences (QBM), Ludwig-Maximilians-Universität München, München, Germany
- Julien Gagneur
- Department of Informatics, Technical University of Munich, Boltzmannstraße, Garching, 85748 Germany
25
Niroula A, Vihinen M. How good are pathogenicity predictors in detecting benign variants? PLoS Comput Biol 2019; 15:e1006481. [PMID: 30742610 PMCID: PMC6386394 DOI: 10.1371/journal.pcbi.1006481] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 08/31/2018] [Revised: 02/22/2019] [Accepted: 12/19/2018] [Indexed: 01/07/2023] Open
Abstract
Computational tools are widely used for interpreting variants detected in sequencing projects. The choice of these tools is critical for reliable variant impact interpretation for precision medicine and should be based on systematic performance assessment. The performance of the methods varies widely in different performance assessments, for example due to the contents and sizes of test datasets. To address this issue, we obtained 63,160 common amino acid substitutions (allele frequency ≥1% and <25%) from the Exome Aggregation Consortium (ExAC) database, which contains variants from 60,706 genomes or exomes. We evaluated the specificity, the capability to detect benign variants, for 10 variant interpretation tools. In addition to overall specificity of the tools, we tested their performance for variants in six geographical populations. PON-P2 had the best performance (95.5%) followed by FATHMM (86.4%) and VEST (83.5%). While these tools had excellent performance, the poorest method predicted more than one third of the benign variants to be disease-causing. The results allow choosing reliable methods for benign variant interpretation, for both research and clinical purposes, as well as provide a benchmark for method developers.
In precision/personalized medicine of many conditions it is essential to investigate an individual's genome. Interpretation of the observed variation (mutation) sets is feasible only with computational approaches. We assessed the performance of variant pathogenicity/tolerance prediction programs on benign variants. Variants were obtained from the high-quality ExAC database and selected to have minor allele frequency between 1 and 25%. We obtained 63,160 such cases and investigated 10 widely used predictors. Specificities of the methods showed large differences, from 64 to 96%; thus users of these methods have to be careful when choosing the one(s) they will use. We further investigated the performance on different populations and allele frequencies, separately for males and females, chromosome-wise, and for population-unique and non-unique variants. The ranking of the tools remained the same in all these scenarios, i.e. the best methods were the best irrespective of how the data was filtered and grouped. This is to our knowledge the first large-scale evaluation of method performance on benign variants.
Affiliation(s)
- Abhishek Niroula
- Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, Lund, Sweden
- Mauno Vihinen
- Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, Lund, Sweden
26
Dyke SOM, Linden M, Lappalainen I, De Argila JR, Carey K, Lloyd D, Spalding JD, Cabili MN, Kerry G, Foreman J, Cutts T, Shabani M, Rodriguez LL, Haeussler M, Walsh B, Jiang X, Wang S, Perrett D, Boughtwood T, Matern A, Brookes AJ, Cupak M, Fiume M, Pandya R, Tulchinsky I, Scollen S, Törnroos J, Das S, Evans AC, Malin BA, Beck S, Brenner SE, Nyrönen T, Blomberg N, Firth HV, Hurles M, Philippakis AA, Rätsch G, Brudno M, Boycott KM, Rehm HL, Baudis M, Sherry ST, Kato K, Knoppers BM, Baker D, Flicek P. Registered access: authorizing data access. Eur J Hum Genet 2018; 26:1721-1731. [PMID: 30069064 PMCID: PMC6244209 DOI: 10.1038/s41431-018-0219-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 02/26/2018] [Revised: 05/08/2018] [Accepted: 06/20/2018] [Indexed: 12/14/2022] Open
Abstract
The Global Alliance for Genomics and Health (GA4GH) proposes a data access policy model, "registered access", to increase and improve access to data requiring an agreement to basic terms and conditions, such as the use of DNA sequence and health data in research. A registered access policy would enable a range of categories of users to gain access, starting with researchers and clinical care professionals. It would also facilitate general use and reuse of data, but within the bounds of consent restrictions and other ethical obligations. In piloting registered access with the Scientific Demonstration data sharing projects of GA4GH, we provide additional ethics, policy and technical guidance to facilitate the implementation of this access model in an international setting.
Affiliation(s)
- Stephanie O M Dyke
  - Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, QC, Canada
  - Montreal Neurological Institute, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Mikael Linden
  - CSC - IT Center for Science, Espoo, Finland
  - ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ilkka Lappalainen
  - CSC - IT Center for Science, Espoo, Finland
  - ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Jordi Rambla De Argila
  - Centre for Genomic Regulation, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- David Lloyd
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
  - The Global Alliance for Genomics and Health, MaRS Centre, West Tower, 661 University Avenue, Suite 510, Toronto, M5G 0A3, ON, Canada
- J Dylan Spalding
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Giselle Kerry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Julia Foreman
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Tim Cutts
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Mahsa Shabani
  - Center for Biomedical Ethics and Law, Department of Public Health and Primary Care, University of Leuven, Leuven, Belgium
- Xiaoqian Jiang
  - Department of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
- Shuang Wang
  - Department of Biomedical Informatics, UC San Diego, La Jolla, CA, USA
- Daniel Perrett
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Tiffany Boughtwood
  - Australian Genomics Health Alliance, 50 Flemington Road, Parkville, VIC, 3052, Australia
- Anthony J Brookes
  - Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
- Serena Scollen
  - ELIXIR Hub, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Samir Das
  - McGill Centre for Integrative Neurosciences, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- Alan C Evans
  - McGill Centre for Integrative Neurosciences, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- Stephan Beck
  - UCL Cancer Institute, University College London, London, UK
- Steven E Brenner
  - Department of Plant & Microbial Biology, University of California, Berkeley, CA, USA
- Tommi Nyrönen
  - CSC - IT Center for Science, Espoo, Finland
  - ELIXIR Compute Platform, ELIXIR, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Helen V Firth
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Matthew Hurles
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Gunnar Rätsch
  - Department of Computer Science, Biomedical Informatics, ETH Zurich, Zurich, Switzerland
- Michael Brudno
  - Department of Computer Science, University of Toronto, Toronto, ON, Canada
  - Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, Canada
- Kym M Boycott
  - Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, Canada
- Heidi L Rehm
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Pathology, Brigham & Women's Hospital & Harvard Medical School, Boston, MA, USA
- Michael Baudis
  - University of Zurich & Swiss Institute of Bioinformatics, Zurich, Switzerland
- Stephen T Sherry
  - National Centre for Biotechnology Information, US National Library of Medicine, Bethesda, MD, USA
- Kazuto Kato
  - Department of Biomedical Ethics and Public Policy, Graduate School of Medicine, Osaka University, Osaka, Japan
- Bartha M Knoppers
  - Centre of Genomics and Policy, Faculty of Medicine, McGill University, Montreal, QC, Canada
- Dixie Baker
  - Martin, Blanck & Associates, Alexandria, VA, USA
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
|
27
|
Deep learning in biomedicine. Nat Biotechnol 2018; 36:829-838. [PMID: 30188539 DOI: 10.1038/nbt.4233] [Citation(s) in RCA: 297] [Impact Index Per Article: 42.4] [Received: 10/12/2017] [Accepted: 08/01/2018] [Indexed: 12/12/2022]
|
28
|
Clark WT, Yu GK, Aoyagi-Scharber M, LeBowitz JH. Utilizing ExAC to assess the hidden contribution of variants of unknown significance to Sanfilippo Type B incidence. PLoS One 2018; 13:e0200008. [PMID: 29979746 PMCID: PMC6034809 DOI: 10.1371/journal.pone.0200008] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/26/2018] [Accepted: 06/18/2018] [Indexed: 01/30/2023] Open
Abstract
Given the large and expanding quantity of publicly available sequencing data, it should be possible to extract incidence information for monogenic diseases from allele frequencies, provided one knows which mutations are causal. We tested this idea on a rare, monogenic, lysosomal storage disorder, Sanfilippo Type B (Mucopolysaccharidosis type IIIB). Sanfilippo Type B is caused by mutations in the gene encoding α-N-acetylglucosaminidase (NAGLU). There were 189 NAGLU missense variants found in the ExAC dataset that comprises roughly 60,000 individual exomes. Only 24 of the 189 missense variants were known to be pathogenic; the remaining 165 variants were of unknown significance (VUS), and their potential contribution to disease is unknown. To address this problem, we measured enzymatic activities of 164 NAGLU missense VUS in the ExAC dataset and developed a statistical framework for estimating disease incidence with associated confidence intervals. We found that 25% of VUS decreased the activity of NAGLU to levels consistent with Sanfilippo Type B pathogenic alleles. We found that a substantial fraction of Sanfilippo Type B incidence (67%) could be accounted for by novel mutations not previously identified in patients, illustrating the utility of combining functional activity data for VUS with population-wide allele frequency data in estimating disease incidence.
Affiliation(s)
- Wyatt T. Clark
  - BioMarin Pharmaceutical, San Rafael, CA, United States of America
- G. Karen Yu
  - BioMarin Pharmaceutical, San Rafael, CA, United States of America
|
29
|
Lee PH, Lee C, Li X, Wee B, Dwivedi T, Daly M. Principles and methods of in-silico prioritization of non-coding regulatory variants. Hum Genet 2018; 137:15-30. [PMID: 29288389 PMCID: PMC5892192 DOI: 10.1007/s00439-017-1861-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/24/2017] [Accepted: 12/14/2017] [Indexed: 12/13/2022]
Abstract
Over a decade of genome-wide association studies have made great strides toward the detection of genes and genetic mechanisms underlying complex traits. However, the majority of associated loci reside in non-coding regions that are, in general, functionally uncharacterized. Now, the availability of large-scale tissue- and cell type-specific transcriptome and epigenome data enables us to elucidate how non-coding genetic variants can affect gene expression and are associated with phenotypic changes. Here, we provide an overview of this emerging field in human genomics, summarizing available data resources and state-of-the-art analytic methods to facilitate in-silico prioritization of non-coding regulatory mutations. We also highlight the limitations of current approaches and discuss the direction of much-needed future research.
Affiliation(s)
- Phil H Lee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Christian Lee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Department of Life Sciences, Harvard University, Cambridge, MA, USA
- Xihao Li
  - Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Brian Wee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Tushar Dwivedi
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Mark Daly
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
|