1
|
Jain S, Bakolitsa C, Brenner SE, Radivojac P, Moult J, Repo S, Hoskins RA, Andreoletti G, Barsky D, Chellapan A, Chu H, Dabbiru N, Kollipara NK, Ly M, Neumann AJ, Pal LR, Odell E, Pandey G, Peters-Petrulewicz RC, Srinivasan R, Yee SF, Yeleswarapu SJ, Zuhl M, Adebali O, Patra A, Beer MA, Hosur R, Peng J, Bernard BM, Berry M, Dong S, Boyle AP, Adhikari A, Chen J, Hu Z, Wang R, Wang Y, Miller M, Wang Y, Bromberg Y, Turina P, Capriotti E, Han JJ, Ozturk K, Carter H, Babbi G, Bovo S, Di Lena P, Martelli PL, Savojardo C, Casadio R, Cline MS, De Baets G, Bonache S, Díez O, Gutiérrez-Enríquez S, Fernández A, Montalban G, Ootes L, Özkan S, Padilla N, Riera C, De la Cruz X, Diekhans M, Huwe PJ, Wei Q, Xu Q, Dunbrack RL, Gotea V, Elnitski L, Margolin G, Fariselli P, Kulakovskiy IV, Makeev VJ, Penzar DD, Vorontsov IE, Favorov AV, Forman JR, Hasenahuer M, Fornasari MS, Parisi G, Avsec Z, Çelik MH, Nguyen TYD, Gagneur J, Shi FY, Edwards MD, Guo Y, Tian K, Zeng H, Gifford DK, Göke J, Zaucha J, Gough J, Ritchie GRS, Frankish A, Mudge JM, Harrow J, Young EL, Yu Y, Huff CD, Murakami K, Nagai Y, Imanishi T, Mungall CJ, Jacobsen JOB, Kim D, Jeong CS, Jones DT, Li MJ, Guthrie VB, Bhattacharya R, Chen YC, Douville C, Fan J, Kim D, Masica D, Niknafs N, Sengupta S, Tokheim C, Turner TN, Yeo HTG, Karchin R, Shin S, Welch R, Keles S, Li Y, Kellis M, Corbi-Verge C, Strokach AV, Kim PM, Klein TE, Mohan R, Sinnott-Armstrong NA, Wainberg M, Kundaje A, Gonzaludo N, Mak ACY, Chhibber A, Lam HYK, Dahary D, Fishilevich S, Lancet D, Lee I, Bachman B, Katsonis P, Lua RC, Wilson SJ, Lichtarge O, Bhat RR, Sundaram L, Viswanath V, Bellazzi R, Nicora G, Rizzo E, Limongelli I, Mezlini AM, Chang R, Kim S, Lai C, O’Connor R, Topper S, van den Akker J, Zhou AY, Zimmer AD, Mishne G, Bergquist TR, Breese MR, Guerrero RF, Jiang Y, Kiga N, Li B, Mort M, Pagel KA, Pejaver V, Stamboulian MH, Thusberg J, Mooney SD, Teerakulkittipong N, Cao C, Kundu K, Yin Y, Yu CH, Kleyman M, Lin CF, Stackpole M, Mount SM, Eraslan 
G, Mueller NS, Naito T, Rao AR, Azaria JR, Brodie A, Ofran Y, Garg A, Pal D, Hawkins-Hooker A, Kenlay H, Reid J, Mucaki EJ, Rogan PK, Schwarz JM, Searls DB, Lee GR, Seok C, Krämer A, Shah S, Huang CV, Kirsch JF, Shatsky M, Cao Y, Chen H, Karimi M, Moronfoye O, Sun Y, Shen Y, Shigeta R, Ford CT, Nodzak C, Uppal A, Shi X, Joseph T, Kotte S, Rana S, Rao A, Saipradeep VG, Sivadasan N, Sunderam U, Stanke M, Su A, Adzhubey I, Jordan DM, Sunyaev S, Rousseau F, Schymkowitz J, Van Durme J, Tavtigian SV, Carraro M, Giollo M, Tosatto SCE, Adato O, Carmel L, Cohen NE, Fenesh T, Holtzer T, Juven-Gershon T, Unger R, Niroula A, Olatubosun A, Väliaho J, Yang Y, Vihinen M, Wahl ME, Chang B, Chong KC, Hu I, Sun R, Wu WKK, Xia X, Zee BC, Wang MH, Wang M, Wu C, Lu Y, Chen K, Yang Y, Yates CM, Kreimer A, Yan Z, Yosef N, Zhao H, Wei Z, Yao Z, Zhou F, Folkman L, Zhou Y, Daneshjou R, Altman RB, Inoue F, Ahituv N, Arkin AP, Lovisa F, Bonvini P, Bowdin S, Gianni S, Mantuano E, Minicozzi V, Novak L, Pasquo A, Pastore A, Petrosino M, Puglisi R, Toto A, Veneziano L, Chiaraluce R, Ball MP, Bobe JR, Church GM, Consalvi V, Cooper DN, Buckley BA, Sheridan MB, Cutting GR, Scaini MC, Cygan KJ, Fredericks AM, Glidden DT, Neil C, Rhine CL, Fairbrother WG, Alontaga AY, Fenton AW, Matreyek KA, Starita LM, Fowler DM, Löscher BS, Franke A, Adamson SI, Graveley BR, Gray JW, Malloy MJ, Kane JP, Kousi M, Katsanis N, Schubach M, Kircher M, Mak ACY, Tang PLF, Kwok PY, Lathrop RH, Clark WT, Yu GK, LeBowitz JH, Benedicenti F, Bettella E, Bigoni S, Cesca F, Mammi I, Marino-Buslje C, Milani D, Peron A, Polli R, Sartori S, Stanzial F, Toldo I, Turolla L, Aspromonte MC, Bellini M, Leonardi E, Liu X, Marshall C, McCombie WR, Elefanti L, Menin C, Meyn MS, Murgia A, Nadeau KCY, Neuhausen SL, Nussbaum RL, Pirooznia M, Potash JB, Dimster-Denk DF, Rine JD, Sanford JR, Snyder M, Cote AG, Sun S, Verby MW, Weile J, Roth FP, Tewhey R, Sabeti PC, Campagna J, Refaat MM, Wojciak J, Grubb S, Schmitt N, Shendure J, Spurdle AB, 
Stavropoulos DJ, Walton NA, Zandi PP, Ziv E, Burke W, Chen F, Carr LR, Martinez S, Paik J, Harris-Wai J, Yarborough M, Fullerton SM, Koenig BA, McInnes G, Shigaki D, Chandonia JM, Furutsuki M, Kasak L, Yu C, Chen R, Friedberg I, Getz GA, Cong Q, Kinch LN, Zhang J, Grishin NV, Voskanian A, Kann MG, Tran E, Ioannidis NM, Hunter JM, Udani R, Cai B, Morgan AA, Sokolov A, Stuart JM, Minervini G, Monzon AM, Batzoglou S, Butte AJ, Greenblatt MS, Hart RK, Hernandez R, Hubbard TJP, Kahn S, O’Donnell-Luria A, Ng PC, Shon J, Veltman J, Zook JM. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol 2024; 25:53. [PMID: 38389099 PMCID: PMC10882881 DOI: 10.1186/s13059-023-03113-6] [Received: 01/21/2023] [Accepted: 11/17/2023] [Indexed: 02/24/2024]
Abstract
BACKGROUND: The Critical Assessment of Genome Interpretation (CAGI) aims to advance the state-of-the-art for computational prediction of genetic variant impact, particularly where relevant to disease. The five complete editions of the CAGI community experiment comprised 50 challenges, in which participants made blind predictions of phenotypes from genetic data, and these were evaluated by independent assessors. RESULTS: Performance was particularly strong for clinical pathogenic variants, including some difficult-to-diagnose cases, and extends to interpretation of cancer-related variants. Missense variant interpretation methods were able to estimate biochemical effects with increasing accuracy. Assessment of methods for regulatory variants and complex trait disease risk was less definitive and indicates performance potentially suitable for auxiliary use in the clinic. CONCLUSIONS: Results show that while current methods are imperfect, they have major utility for research and clinical applications. Emerging methods and increasingly large, robust datasets for training and assessment promise further progress ahead.
|
2
|
Asatryan B, Murray B, Gasperetti A, McClellan R, Barth AS. Unraveling Complexities in Genetically Elusive Long QT Syndrome. Circ Arrhythm Electrophysiol 2024; 17:e012356. [PMID: 38264885 DOI: 10.1161/circep.123.012356] [Indexed: 01/25/2024]
Abstract
Genetic testing has become standard of care for patients with long QT syndrome (LQTS), providing diagnostic, prognostic, and therapeutic information for both probands and their family members. However, up to a quarter of patients with LQTS do not have identifiable Mendelian pathogenic variants in the currently known LQTS-associated genes. This absence of genetic confirmation, intriguingly, does not lessen the severity of LQTS, with the prognosis in these gene-elusive patients with unequivocal LQTS mirroring genotype-positive patients in the limited data available. Such a conundrum instigates an exploration into the causes of corrected QT interval (QTc) prolongation in these cases, unveiling a broad spectrum of potential scenarios and mechanisms. These include multiple environmental influences on QTc prolongation, exercise-induced repolarization abnormalities, and the profound implications of the constantly evolving nature of genetic testing and variant interpretation. In addition, the rapid advances in genetics have the potential to uncover new causal genes, and polygenic risk factors may aid in the diagnosis of high-risk patients. Navigating this multifaceted landscape requires a systematic approach and expert knowledge, integrating the dynamic nature of genetics and patient-specific influences for accurate diagnosis, management, and counseling of patients. The role of a subspecialized expert cardiogenetic clinic is paramount in evaluation to navigate this complexity. Amid these intricate aspects, this review outlines potential causes of gene-elusive LQTS. It also provides an outline for the evaluation of patients with negative and inconclusive genetic test results and underscores the need for ongoing adaptation and reassessment in our understanding of LQTS, as the complexities of gene-elusive LQTS are increasingly deciphered.
Affiliation(s)
- Babken Asatryan
  - Division of Cardiology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
- Brittney Murray
  - Division of Cardiology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
- Alessio Gasperetti
  - Division of Cardiology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
- Rebecca McClellan
  - Division of Cardiology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
- Andreas S Barth
  - Division of Cardiology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD
|
3
|
Ahmad RM, Ali BR, Al-Jasmi F, Sinnott RO, Al Dhaheri N, Mohamad MS. A review of genetic variant databases and machine learning tools for predicting the pathogenicity of breast cancer. Brief Bioinform 2023; 25:bbad479. [PMID: 38149678 PMCID: PMC10782903 DOI: 10.1093/bib/bbad479] [Received: 06/22/2023] [Revised: 09/22/2023] [Accepted: 12/04/2023] [Indexed: 12/28/2023]
Abstract
Studies continue to uncover contributing risk factors for breast cancer (BC) development including genetic variants. Advances in machine learning and big data generated from genetic sequencing can now be used for predicting BC pathogenicity. However, it is unclear which tool developed for pathogenicity prediction is most suited for predicting the impact and pathogenicity of variant effects. A significant challenge is to determine the most suitable data source for each tool since different tools can yield different prediction results with different data inputs. To this end, this work reviews genetic variant databases and tools used specifically for the prediction of BC pathogenicity. We provide a description of existing genetic variants databases and, where appropriate, the diseases for which they have been established. Through example, we illustrate how they can be used for prediction of BC pathogenicity and discuss their associated advantages and disadvantages. We conclude that the tools that are specialized by training on multiple diverse datasets from different databases for the same disease have enhanced accuracy and specificity and are thereby more helpful to the clinicians in predicting and diagnosing BC as early as possible.
Affiliation(s)
- Rahaf M Ahmad
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Bassam R Ali
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Fatma Al-Jasmi
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
  - Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Richard O Sinnott
  - School of Computing and Information System, Faculty of Engineering and Information Technology, The University of Melbourne, Melbourne, Victoria, Australia
- Noura Al Dhaheri
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
  - Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Mohd Saberi Mohamad
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
|
4
|
Al Eissa MM, Alotibi RS, Alhaddad B, Aloraini T, Samman MS, AlAsiri A, Abouelhoda M, AlQahtani AS. Reclassifying variations of unknown significance in diseases affecting Saudi Arabia's population reveal new associations. Front Genet 2023; 14:1250317. [PMID: 38028588 PMCID: PMC10646566 DOI: 10.3389/fgene.2023.1250317] [Received: 06/29/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023]
Abstract
Introduction: Physicians face diagnostic dilemmas upon reports indicating disease variants of unknown significance (VUS). The most puzzling cases are patients with rare diseases, where finding another matched genotype and phenotype to associate their results is challenging. This study aims to prove the value of updating patient files with new classifications, potentially leading to better assessment and prevention. Methodology: We recruited retrospective phenotypic and genotypic data from King Saud Medical City, Riyadh, Kingdom of Saudi Arabia. Between September 2020 and December 2021, 1,080 patients' genetic profiles were tested in a College of American Pathologists accredited laboratory. We excluded all confirmed pathogenic variants, likely pathogenic variants and copy number variations. Finally, we further reclassified 194 VUS using different local and global databases, employing in silico prediction to justify the phenotype-genotype association. Results: Of the 194 VUS, 90 remained VUS, and the other 104 were reclassified as follows: 16 pathogenic, 49 likely pathogenic, nine benign, and 30 likely benign. Moreover, most of these variants had never been observed in other local or international databases. Conclusion: Reclassifying the VUS adds value to understanding the causality of the phenotype if it has been reported in another family or population. The healthcare system should establish guidelines for re-evaluating VUS, and upgrading VUS should reflect on individual/family risks and management strategies.
Affiliation(s)
- Mariam M. Al Eissa
  - Public Health Authority, Public Health Lab, Molecular Genetics Laboratory, Riyadh, Saudi Arabia
  - Medical School, AlFaisal University, Riyadh, Saudi Arabia
- Raniah S. Alotibi
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, King Saud bin Abdulaziz University for Health Sciences (KSAU-HS), Riyadh, Saudi Arabia
  - King Abdullah International Medical Research Center (KAIMRC), Riyadh, Saudi Arabia
- Bader Alhaddad
  - Laboratory Medicine Department, King Fahd University Hospital, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia
  - Molecular Genetics Department, King Saud Medical City, Riyadh, Saudi Arabia
- Taghrid Aloraini
  - Division of Translational Pathology, Department of Laboratory Medicine, King Abdulaziz Medical City, Riyadh, Saudi Arabia
  - Department of Genetics, King Abdullah Specialized Children Hospital, King Abdulaziz Medical City, MNGHA, Riyadh, Saudi Arabia
- Manar S. Samman
  - Department of Pathology and Clinical Laboratory Medicine Administration, King Fahad Medical City (KFMC), Riyadh, Saudi Arabia
- Abdulrahman AlAsiri
  - Medical Genomics Research Department, King Abdullah International Medical Research Center, King Saud Bin Abdulaziz University for Health Sciences, Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
  - Department of Cardiology, Division of Heart and Lungs, University Medical Center Utrecht, University of Utrecht, Utrecht, Netherlands
- Mohamed Abouelhoda
  - Chairman Computational Science Department at King Faisal Specialised Hospital and Research Center, KFSHRC, Riyadh, Saudi Arabia
- Amerh S. AlQahtani
  - Medical Genetics Department, King Saud Medical City, Riyadh, Saudi Arabia
|
5
|
Asatryan B, Bleijendaal H, Wilde AAM. Toward advanced diagnosis and management of inherited arrhythmia syndromes: Harnessing the capabilities of artificial intelligence and machine learning. Heart Rhythm 2023; 20:1399-1407. [PMID: 37442407 DOI: 10.1016/j.hrthm.2023.07.001] [Received: 05/02/2023] [Revised: 06/20/2023] [Accepted: 07/02/2023] [Indexed: 07/15/2023]
Abstract
The use of advanced computational technologies, such as artificial intelligence (AI), is now exerting a significant influence on various aspects of life, including health care and science. AI has garnered remarkable public notice with the release of deep learning models that can model anything from artwork to academic papers with minimal human intervention. Machine learning, a method that uses algorithms to extract information from raw data and represent it in a model, and deep learning, a method that uses multiple layers to progressively extract higher-level features from the raw input with minimal human intervention, are increasingly leveraged to tackle problems in the health sector, including utilization for clinical decision support in cardiovascular medicine. Inherited arrhythmia syndromes are a clinical domain where multiple unanswered questions remain despite unprecedented progress over the past 2 decades with the introduction of large panel genetic testing and the first steps in precision medicine. In particular, AI tools can help address gaps in clinical diagnosis by identifying individuals with concealed or transient phenotypes; enhance risk stratification by elevating recognition of underlying risk burden beyond widely recognized risk factors; improve prediction of response to therapy, and further prognostication. In this contemporary review, we provide a summary of the AI models developed to solve challenges in inherited arrhythmia syndromes and also outline gaps that can be filled with the development of intelligent AI models.
Affiliation(s)
- Babken Asatryan
  - Division of Cardiology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
- Hidde Bleijendaal
  - University of Amsterdam, Heart Center; Department of Clinical and Experimental Cardiology, Amsterdam Cardiovascular Sciences, Heart Failure and Arrhythmias, Amsterdam, The Netherlands; Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, The Netherlands
- Arthur A M Wilde
  - University of Amsterdam, Heart Center; Department of Clinical and Experimental Cardiology, Amsterdam Cardiovascular Sciences, Heart Failure and Arrhythmias, Amsterdam, The Netherlands; Department of Clinical Epidemiology, Biostatistics and Bioinformatics, University of Amsterdam, Amsterdam, The Netherlands; European Reference Network for Rare and Low Prevalence Complex Diseases of the Heart (ERN GUARD-Heart)
|
6
|
Kang M, Kim S, Lee DB, Hong C, Hwang KB. Gene-specific machine learning for pathogenicity prediction of rare BRCA1 and BRCA2 missense variants. Sci Rep 2023; 13:10478. [PMID: 37380723 DOI: 10.1038/s41598-023-37698-6] [Received: 03/21/2023] [Accepted: 06/26/2023] [Indexed: 06/30/2023]
Abstract
Machine learning-based pathogenicity prediction helps interpret rare missense variants of BRCA1 and BRCA2, which are associated with hereditary cancers. Recent studies have shown that classifiers trained using variants of a specific gene or a set of genes related to a particular disease perform better than those trained using all variants, due to their higher specificity, despite the smaller training dataset size. In this study, we further investigated the advantages of "gene-specific" machine learning compared to "disease-specific" machine learning. We used 1068 rare (gnomAD minor allele frequency (MAF) < 0.005) missense variants of 28 genes associated with hereditary cancers for our investigation. Popular machine learning classifiers were employed: regularized logistic regression, extreme gradient boosting, random forests, support vector machines, and deep neural networks. As features, we used MAFs from multiple populations, functional prediction and conservation scores, and positions of variants. The disease-specific training dataset included the gene-specific training dataset and was more than 7 times larger. However, we observed that gene-specific training variants were sufficient to produce the optimal pathogenicity predictor if a suitable machine learning classifier was employed. Therefore, we recommend gene-specific over disease-specific machine learning as an efficient and effective method for predicting the pathogenicity of rare BRCA1 and BRCA2 missense variants.
Affiliation(s)
- Moonjong Kang
  - Research Center, Software Division, NGeneBio, Seoul, 08390, Korea
- Seonhwa Kim
  - Research Center, Software Division, NGeneBio, Seoul, 08390, Korea
- Da-Bin Lee
  - Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, 06978, Korea
- Changbum Hong
  - Research Center, Software Division, NGeneBio, Seoul, 08390, Korea.
- Kyu-Baek Hwang
  - Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, 06978, Korea.
|
7
|
Walter W, Pohlkamp C, Meggendorfer M, Nadarajah N, Kern W, Haferlach C, Haferlach T. Artificial intelligence in hematological diagnostics: Game changer or gadget? Blood Rev 2023; 58:101019. [PMID: 36241586 DOI: 10.1016/j.blre.2022.101019] [Received: 02/07/2022] [Revised: 09/21/2022] [Accepted: 10/03/2022] [Indexed: 11/30/2022]
Abstract
The future of clinical diagnosis and treatment of hematologic diseases will inevitably involve the integration of artificial intelligence (AI)-based systems into routine practice to support the hematologists' decision making. Several studies have shown that AI-based models can already be used to automatically differentiate cells, reliably detect malignant cell populations, support chromosome banding analysis, and interpret clinical variants, contributing to early disease detection and prognosis. However, even the best tool can become useless if it is misapplied or the results are misinterpreted. Therefore, in order to comprehensively judge and correctly apply newly developed AI-based systems, the hematologist must have a basic understanding of the general concepts of machine learning. In this review, we provide the hematologist with a comprehensive overview of various machine learning techniques, their current implementations and approaches in different diagnostic subfields (e.g., cytogenetics, molecular genetics), and the limitations and unresolved challenges of the systems.
Affiliation(s)
- Wencke Walter
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Christian Pohlkamp
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Manja Meggendorfer
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Niroshan Nadarajah
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Wolfgang Kern
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Claudia Haferlach
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
- Torsten Haferlach
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 München, Germany.
|
8
|
Karalidou V, Kalfakakou D, Papathanasiou A, Fostira F, Matsopoulos GK. MARGINAL: An Automatic Classification of Variants in BRCA1 and BRCA2 Genes Using a Machine Learning Model. Biomolecules 2022; 12:biom12111552. [PMID: 36358902 PMCID: PMC9687470 DOI: 10.3390/biom12111552] [Received: 08/11/2022] [Revised: 10/10/2022] [Accepted: 10/20/2022] [Indexed: 12/29/2022]
Abstract
Implementation of next-generation sequencing (NGS) for the genetic analysis of hereditary diseases has resulted in a vast number of genetic variants identified daily, leading to inadequate variant interpretation and, consequently, a lack of useful clinical information for treatment decisions. Herein, we present MARGINAL 1.0.0, a machine learning (ML)-based software for the interpretation of rare BRCA1 and BRCA2 germline variants. MARGINAL software classifies variants into three categories, namely, (likely) pathogenic, of uncertain significance and (likely) benign, implementing the criteria established by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP). We first annotated BRCA1 and BRCA2 variants using various sources. Then, we automatically implemented the ACMG-AMP criteria, and we finally constructed the ML model for variant classification. To maximize accuracy, we compared the performance of eight different ML algorithms in a classification scheme based on a serial combination of two classifiers. The model showed high predictive abilities with maximum accuracy of 92% and 98%, recall of 92% and 98% and specificity of 90% and 98% for the first and second classifiers, respectively. Our results indicate that using a gene and disease-specific ML automated software for clinical variant evaluation can minimize conflicting interpretations.
Affiliation(s)
- Vasiliki Karalidou
  - School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece
  - Corresponding author
- Despoina Kalfakakou
  - Molecular Diagnostics Laboratory, INRaSTES, National Center for Scientific Research NCSR Demokritos, 15341 Athens, Greece
- Athanasios Papathanasiou
  - Molecular Diagnostics Laboratory, INRaSTES, National Center for Scientific Research NCSR Demokritos, 15341 Athens, Greece
- Florentia Fostira
  - Molecular Diagnostics Laboratory, INRaSTES, National Center for Scientific Research NCSR Demokritos, 15341 Athens, Greece
- George K. Matsopoulos
  - School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece
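The accuracy, recall, and specificity quoted in the MARGINAL abstract above are standard confusion-matrix quantities. As a minimal sketch of how such figures are derived, the following Python snippet computes them for a single binary classifier. The function name and the counts are hypothetical placeholders chosen for illustration; they are not data or code from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, recall and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all calls that are correct
        "recall": tp / (tp + fn),        # fraction of true positives recovered
        "specificity": tn / (tn + fp),   # fraction of true negatives recovered
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = binary_metrics(tp=46, fp=5, tn=45, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a serial two-classifier scheme like the one the abstract describes, the same calculation would presumably be applied to each classifier separately, on the variants that reach it.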
|
9
|
Liu Y, Yeung WSB, Chiu PCN, Cao D. Computational approaches for predicting variant impact: An overview from resources, principles to applications. Front Genet 2022; 13:981005. [PMID: 36246661 PMCID: PMC9559863 DOI: 10.3389/fgene.2022.981005] [Received: 06/29/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022]
Abstract
One objective of human genetics is to identify the variants that contribute to human diseases. With the rapid development and wide use of next-generation sequencing (NGS), massive genomic sequence data have been generated, making personal genetic information available. Conventional experimental evidence is critical for establishing the relationship between sequence variants and phenotype, but it is obtained with low efficiency. Because comprehensive databases and resources presenting clinical and experimental evidence on genotype-phenotype relationships are lacking, and variants found by NGS keep accumulating, many computational tools that predict the impact of variants on phenotype have been developed to bridge the gap. In this review, we present a brief introduction to and discussion of the computational approaches for variant impact prediction. Taking a novel approach, we mainly focus on methods for predicting the impact of non-synonymous variants (nsSNVs) and categorize them into six classes. Their underlying rationale and constraints, together with the concerns and remedies raised by comparative studies, are discussed. We also present how the predictive approaches are employed in different research settings. Although diverse constraints exist, computational predictive approaches are indispensable in exploring genotype-phenotype relationships.
Affiliation(s)
- Ye Liu
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- William S. B. Yeung
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Philip C. N. Chiu
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Dandan Cao
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- *Correspondence: Philip C. N. Chiu; Dandan Cao
|
10
|
A hybrid approach for lung cancer diagnosis using optimized random forest classification and K-means visualization algorithm. Health and Technology 2022. [DOI: 10.1007/s12553-022-00679-2] [Indexed: 12/11/2022]
|
11
|
Lyon E, Temple-Smolkin RL, Hegde M, Gastier-Foster JM, Palomaki GE, Richards CS. An Educational Assessment of Evidence Used for Variant Classification: A Report of the Association for Molecular Pathology. J Mol Diagn 2022; 24:555-565. [PMID: 35429647 DOI: 10.1016/j.jmoldx.2021.12.014] [Received: 09/20/2021] [Revised: 11/12/2021] [Accepted: 12/10/2021] [Indexed: 11/25/2022]
Abstract
The Association for Molecular Pathology Variant Interpretation Testing Among Laboratories (VITAL) Working Group convened to evaluate the Standards and Guidelines for the Interpretation of Sequence Variants implementation into clinical practice, identify problematic classification rules, and define implementation challenges. Variants and associated clinical information were provided to volunteer respondents. Participant variant classifications were compared with intended consensus-derived classifications of the Working Group. The 24 variant challenges received 1379 responses; 1119 agreed with the intended response (81%; 95% CI, 79% to 83%). Agreement ranged from 44% to 100%, with 16 challenges (67%; 47% to 82%) reaching consensus (≥80% agreement). Participant classifications were also compared to a calculated interpretation of the ACMG Guidelines using the participant-reported criteria as input. The 24 variant challenges had 1368 responses with specific evidence provided and 1121 (82%; 80% to 84%) agreed with the calculated interpretation. Agreement for challenges ranged from 63% to 98%; 15 (63%; 43% to 79%) reaching consensus. Among 81 individual participants, 32 (40%; 30% to 50%) reached agreement with at least 80% of the intended classifications and 42 (52%; 41% to 62%) with the calculated classifications. This study demonstrated that although variant classification remains challenging, published guidelines are being utilized and adapted to improve variant calling consensus. This study identified situations where clarifications are warranted and provides a model for competency assessment.
Affiliation(s)
- Elaine Lyon
  - The Variant Interpretation Testing Among Laboratories (VITAL) Working Group of the Clinical Practice Committee, Association for Molecular Pathology (AMP), Rockville, Maryland; HudsonAlpha Institute for Biotechnology, Huntsville, Alabama
- Madhuri Hegde
  - The Variant Interpretation Testing Among Laboratories (VITAL) Working Group of the Clinical Practice Committee, Association for Molecular Pathology (AMP), Rockville, Maryland; Global Genetics Laboratory, PerkinElmer Genomics, Pittsburgh, Pennsylvania
- Julie M Gastier-Foster
  - The Variant Interpretation Testing Among Laboratories (VITAL) Working Group of the Clinical Practice Committee, Association for Molecular Pathology (AMP), Rockville, Maryland; Departments of Pediatrics and Pathology/Immunology, Baylor College of Medicine, Houston, Texas; Pathology Department, Texas Children's Hospital, Houston, Texas; Department of Pathology, The Ohio State University College of Medicine, Columbus, Ohio
- Glenn E Palomaki
  - The Variant Interpretation Testing Among Laboratories (VITAL) Working Group of the Clinical Practice Committee, Association for Molecular Pathology (AMP), Rockville, Maryland; Department of Pathology and Laboratory Medicine, Women & Infants Hospital and the Alpert Medical School at Brown University, Providence, Rhode Island
- C Sue Richards
  - The Variant Interpretation Testing Among Laboratories (VITAL) Working Group of the Clinical Practice Committee, Association for Molecular Pathology (AMP), Rockville, Maryland; Department of Molecular and Medical Genetics and Knight Diagnostic Laboratories, Oregon Health & Science University, Portland, Oregon.
|
12
|
Aloraini T, Aljouie A, Alniwaider R, Alharbi W, Alsubaie L, AlTuraif W, Qureshi W, Alswaid A, Eyiad W, Al Mutairi F, Ababneh F, Alfadhel M, Alfares A. The variant artificial intelligence easy scoring (VARIES) system. Comput Biol Med 2022; 145:105492. [PMID: 35585733 DOI: 10.1016/j.compbiomed.2022.105492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2021] [Revised: 03/30/2022] [Accepted: 04/02/2022] [Indexed: 11/03/2022]
|
13
|
A machine learning approach based on ACMG/AMP guidelines for genomic variant classification and prioritization. Sci Rep 2022; 12:2517. [PMID: 35169226 PMCID: PMC8847497 DOI: 10.1038/s41598-022-06547-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/03/2021] [Accepted: 01/07/2022] [Indexed: 01/19/2023]
Abstract
Genomic variant interpretation is a critical step of the diagnostic procedure, often supported by the application of tools that may predict the damaging impact of each variant or provide a guidelines-based classification. We propose the application of Machine Learning methodologies, in particular Penalized Logistic Regression, to support variant classification and prioritization. Our approach combines ACMG/AMP guidelines for germline variant interpretation as well as variant annotation features and provides a probabilistic score of pathogenicity, thus supporting the prioritization and classification of variants that would be interpreted as uncertain by the ACMG/AMP guidelines. We compared different approaches in terms of variant prioritization and classification on different datasets, showing that our data-driven approach is able to solve more variant of uncertain significance (VUS) cases in comparison with guidelines-based approaches and in silico prediction tools.
|
14
|
Brooks-Warburton J, Ashton J, Dhar A, Tham T, Allen PB, Hoque S, Lovat LB, Sebastian S. Artificial intelligence and inflammatory bowel disease: practicalities and future prospects. Frontline Gastroenterol 2021; 13:325-331. [PMID: 35722596 PMCID: PMC9186028 DOI: 10.1136/flgastro-2021-102003] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/07/2021] [Accepted: 11/16/2021] [Indexed: 02/04/2023]
Abstract
Artificial intelligence (AI) is an emerging technology predicted to have significant applications in healthcare. This review highlights AI applications that impact the patient journey in inflammatory bowel disease (IBD), from genomics to endoscopic applications in disease classification, stratification and self-monitoring to risk stratification for personalised management. We discuss the practical AI applications currently in use while giving a balanced view of concerns and pitfalls and look to the future with the potential of where AI can provide significant value to the care of the patient with IBD.
Affiliation(s)
- Johanne Brooks-Warburton
  - Department of Clinical Pharmacology and Biological Sciences, University of Hertfordshire, Hatfield, UK; Gastroenterology Department, Lister Hospital, Stevenage, UK
- James Ashton
  - Paediatric Gastroenterology, Southampton University Hospitals NHS Trust, Southampton, UK
- Anjan Dhar
  - Gastroenterology, County Durham & Darlington NHS Foundation Trust, Bishop Auckland, UK
- Tony Tham
  - Department of Gastroenterology, Ulster Hospital, Dundonald, UK
- Patrick B Allen
  - Department of Gastroenterology, Ulster Hospital, Dundonald, UK
- Sami Hoque
  - Department of Gastroenterology, Barts Health NHS Trust, London, UK
- Laurence B Lovat
  - Division of Surgery & Interventional Science, University College London, London, UK
- Shaji Sebastian
  - Department of Gastroenterology, Hull University Teaching Hospitals NHS Trust, Hull, UK; Hull York Medical School, Hull, UK
|
15
|
Machine learning random forest for predicting oncosomatic variant NGS analysis. Sci Rep 2021; 11:21820. [PMID: 34750410 PMCID: PMC8575902 DOI: 10.1038/s41598-021-01253-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 04/15/2021] [Accepted: 10/21/2021] [Indexed: 12/02/2022]
Abstract
Since 2017, we have used IonTorrent NGS platform in our hospital to diagnose and treat cancer. Analyzing variants at each run requires considerable time, and we are still struggling with some variants that appear correct on the metrics at first, but are found to be negative upon further investigation. Can any machine learning algorithm (ML) help us classify NGS variants? This has led us to investigate which ML can fit our NGS data and to develop a tool that can be routinely implemented to help biologists. Currently, one of the greatest challenges in medicine is processing a significant quantity of data. This is particularly true in molecular biology with the advantage of next-generation sequencing (NGS) for profiling and identifying molecular tumors and their treatment. In addition to bioinformatics pipelines, artificial intelligence (AI) can be valuable in helping to analyze mutation variants. Generating sequencing data from patient DNA samples has become easy to perform in clinical trials. However, analyzing the massive quantities of genomic or transcriptomic data and extracting the key biomarkers associated with a clinical response to a specific therapy requires a formidable combination of scientific expertise, biomolecular skills and a panel of bioinformatic and biostatistic tools, in which artificial intelligence is now successful in developing future routine diagnostics. However, cancer genome complexity and technical artifacts make identifying real variants challenging. We present a machine learning method for classifying pathogenic single nucleotide variants (SNVs), single nucleotide polymorphisms (SNPs), multiple nucleotide variants (MNVs), insertions, and deletions detected by NGS from different types of tumor specimens, such as: colorectal, melanoma, lung and glioma cancer. 
We compared our NGS data to different machine learning algorithms using the k-fold cross-validation method and to neural networks (deep learning) to measure the performance of the different ML algorithms and determine which one is a valid model for confirming NGS variant calls in cancer diagnosis. We trained our machine learning with 70% of our data samples, extracted from our local database (our data structure had 7 parameters: chromosome, position, exon, variant allele frequency, minor allele frequency, coverage and protein description) and validated it with the 30% remaining data. The model offering the best accuracy was chosen and implemented in the NGS analysis routine. Artificial intelligence was developed with the R script language version 3.6.0. We trained our model on 70% of 102,011 variants. Our best error rate (0.22%) was found with random forest machine learning (ntree = 500 and mtry = 4), with an AUC of 0.99. Neural networks achieved some good scores. The final trained model with the neural network achieved an accuracy of 98% and an ROC-AUC of 0.99 with validation data. We tested our RF model to interpret more than 2000 variants from our NGS database: 20 variants were misclassified (error rate < 1%). The errors were nomenclature problems and false positives. After adding false positives to our training database and implementing our RF model routinely, our error rate was always < 0.5%. The RF model shows excellent results for oncosomatic NGS interpretation and can easily be implemented in other molecular biology laboratories. AI is becoming increasingly important in molecular biomedical analysis and can be very helpful in processing medical data. Neural networks show a good capacity in variant classification, and in the future, they may be useful in predicting more complex variants.
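The training setup described in this abstract (a 70%/30% split, a random forest with ntree = 500 and mtry = 4, and an error rate and AUC measured on held-out data) can be sketched as follows. This is a minimal illustration in Python with scikit-learn, not the authors' R code: the data are synthetic, the labelling rule is invented, and mapping `ntree` to `n_estimators` and `mtry` to `max_features` is an assumption about the closest scikit-learn equivalents. Real inputs such as chromosome or protein description would need encoding first.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_variants = 2000

# Seven synthetic numeric columns standing in for the seven parameters
# listed in the abstract (hypothetical data, not the hospital database).
X = rng.normal(size=(n_variants, 7))
# Invented labelling rule so that the toy problem is learnable.
noise = rng.normal(scale=0.5, size=n_variants)
y = (X[:, 3] + 0.5 * X[:, 5] + noise > 0).astype(int)

# 70% of the variants for training, 30% held out for validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

# 500 trees, 4 candidate features per split.
model = RandomForestClassifier(n_estimators=500, max_features=4, random_state=0)
model.fit(X_train, y_train)

error_rate = 1.0 - model.score(X_test, y_test)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"held-out error rate: {error_rate:.3f}, ROC-AUC: {auc:.3f}")
```

The numbers printed for this toy data say nothing about the performance reported in the abstract; the sketch only shows how the split, the two hyperparameters, and the two metrics fit together.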
|
16
|
Chen HC, Wang J, Liu Q, Shyr Y. A domain damage index to prioritizing the pathogenicity of missense variants. Hum Mutat 2021; 42:1503-1517. [PMID: 34350656 PMCID: PMC8511099 DOI: 10.1002/humu.24269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2020] [Revised: 07/08/2021] [Accepted: 07/30/2021] [Indexed: 11/09/2022]
Abstract
Prioritizing causal variants is one major challenge for the clinical application of sequencing data. Prompted by the observation that 74.3% of missense pathogenic variants locate in protein domains, we developed an approach named domain damage index (DDI). DDI identifies protein domains depleted of rare missense variations in the general population, which can be further used as a metric to prioritize variants. DDI is significantly correlated with phylogenetic conservation, variant-level metrics, and reported pathogenicity. DDI achieved great performance for distinguishing pathogenic variants from benign ones in three benchmark datasets. The combination of DDI with the other two best approaches improved the performance of each individual method considerably, suggesting DDI provides a powerful and complementary way of variant prioritization.
Affiliation(s)
- Hua-Chang Chen
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Jing Wang
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Qi Liu
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Yu Shyr
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
  - Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232, USA
|
17
|
Herman DS, Rhoads DD, Schulz WL, Durant TJS. Artificial Intelligence and Mapping a New Direction in Laboratory Medicine: A Review. Clin Chem 2021; 67:1466-1482. [PMID: 34557917 DOI: 10.1093/clinchem/hvab165] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 03/22/2021] [Accepted: 07/26/2021] [Indexed: 12/21/2022]
Abstract
BACKGROUND: Modern artificial intelligence (AI) and machine learning (ML) methods are now capable of completing tasks with performance characteristics that are comparable to those of expert human operators. As a result, many areas throughout healthcare are incorporating these technologies, including in vitro diagnostics and, more broadly, laboratory medicine. However, there are limited literature reviews of the landscape, likely future, and challenges of the application of AI/ML in laboratory medicine. CONTENT: In this review, we begin with a brief introduction to AI and its subfield of ML. The ensuing sections describe ML systems that are currently in clinical laboratory practice or are being proposed for such use in recent literature, ML systems that use laboratory data outside the clinical laboratory, challenges to the adoption of ML, and future opportunities for ML in laboratory medicine. SUMMARY: AI and ML have and will continue to influence the practice and scope of laboratory medicine dramatically. This has been made possible by advancements in modern computing and the widespread digitization of health information. These technologies are being rapidly developed and described, but in comparison, their implementation thus far has been modest. To spur the implementation of reliable and sophisticated ML-based technologies, we need to establish best practices further and improve our information system and communication infrastructure. The participation of the clinical laboratory community is essential to ensure that laboratory data are sufficiently available and incorporated conscientiously into robust, safe, and clinically effective ML-supported clinical diagnostics.
Affiliation(s)
- Daniel S Herman
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Daniel D Rhoads
  - Department of Laboratory Medicine, Cleveland Clinic, Cleveland, OH, USA; Department of Pathology, Cleveland Clinic Lerner College of Medicine, Case Western Reserve University, Cleveland, OH, USA
- Wade L Schulz
  - Department of Laboratory Medicine, Yale University, New Haven, CT, USA
- Thomas J S Durant
  - Department of Laboratory Medicine, Yale University, New Haven, CT, USA
|
18
|
Walter W, Haferlach C, Nadarajah N, Schmidts I, Kühn C, Kern W, Haferlach T. How artificial intelligence might disrupt diagnostics in hematology in the near future. Oncogene 2021; 40:4271-4280. [PMID: 34103684 PMCID: PMC8225509 DOI: 10.1038/s41388-021-01861-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 03/12/2021] [Revised: 05/11/2021] [Accepted: 05/24/2021] [Indexed: 02/07/2023]
Abstract
Artificial intelligence (AI) is about to make itself indispensable in the health care sector. Examples of successful applications or promising approaches range from the application of pattern recognition software to pre-process and analyze digital medical images, to deep learning algorithms for subtype or disease classification, and digital twin technology and in silico clinical trials. Moreover, machine-learning techniques are used to identify patterns and anomalies in electronic health records and to perform ad-hoc evaluations of gathered data from wearable health tracking devices for deep longitudinal phenotyping. In the last years, substantial progress has been made in automated image classification, reaching even superhuman level in some instances. Despite the increasing awareness of the importance of the genetic context, the diagnosis in hematology is still mainly based on the evaluation of the phenotype. Either by the analysis of microscopic images of cells in cytomorphology or by the analysis of cell populations in bidimensional plots obtained by flow cytometry. Here, AI algorithms not only spot details that might escape the human eye, but might also identify entirely new ways of interpreting these images. With the introduction of high-throughput next-generation sequencing in molecular genetics, the amount of available information is increasing exponentially, priming the field for the application of machine learning approaches. The goal of all the approaches is to allow personalized and informed interventions, to enhance treatment success, to improve the timeliness and accuracy of diagnoses, and to minimize technically induced misclassifications. The potential of AI-based applications is virtually endless but where do we stand in hematology and how far can we go?
|
19
|
Favalli V, Tini G, Bonetti E, Vozza G, Guida A, Gandini S, Pelicci PG, Mazzarella L. Machine learning-based reclassification of germline variants of unknown significance: The RENOVO algorithm. Am J Hum Genet 2021; 108:682-695. [PMID: 33761318 DOI: 10.1016/j.ajhg.2021.03.010] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 11/30/2020] [Accepted: 03/01/2021] [Indexed: 01/20/2023]
Abstract
The increasing scope of genetic testing allowed by next-generation sequencing (NGS) dramatically increased the number of genetic variants to be interpreted as pathogenic or benign for adequate patient management. Still, the interpretation process often fails to deliver a clear classification, resulting in either variants of unknown significance (VUSs) or variants with conflicting interpretation of pathogenicity (CIP); these represent a major clinical problem because they do not provide useful information for decision-making, causing a large fraction of genetically determined disease to remain undertreated. We developed a machine learning (random forest)-based tool, RENOVO, that classifies variants as pathogenic or benign on the basis of publicly available information and provides a pathogenicity likelihood score (PLS). Using the same feature classes recommended by guidelines, we trained RENOVO on established pathogenic/benign variants in ClinVar (training set accuracy = 99%) and tested its performance on variants whose interpretation has changed over time (test set accuracy = 95%). We further validated the algorithm on additional datasets including unreported variants validated either through expert consensus (ENIGMA) or laboratory-based functional techniques (on BRCA1/2 and SCN5A). On all datasets, RENOVO outperformed existing automated interpretation tools. On the basis of the above validation metrics, we assigned a defined PLS to all existing ClinVar VUSs, proposing a reclassification for 67% with >90% estimated precision. RENOVO provides a validated tool to reduce the fraction of uninterpreted or misinterpreted variants, tackling an area of unmet need in modern clinical genetics.
Affiliation(s)
- Valentina Favalli
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy
- Giulia Tini
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy
- Emanuele Bonetti
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy
- Gianluca Vozza
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy
- Alessandro Guida
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy; Biomedical Translational Imaging Centre, Nova Scotia Health Authority and IWK Health Centre, Halifax, NS B3K 6R8, Canada
- Sara Gandini
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy
- Pier Giuseppe Pelicci
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy
- Luca Mazzarella
  - Department of Experimental Oncology, IEO, European Institute of Oncology IRCCS, Milan, Italy.
|
20
|
Lai C, Zimmer AD, O'Connor R, Kim S, Chan R, van den Akker J, Zhou AY, Topper S, Mishne G. LEAP: Using machine learning to support variant classification in a clinical setting. Hum Mutat 2020; 41:1079-1090. [PMID: 32176384 PMCID: PMC7317941 DOI: 10.1002/humu.24011] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/06/2019] [Revised: 02/03/2020] [Accepted: 03/03/2020] [Indexed: 01/14/2023]
Abstract
Advances in genome sequencing have led to a tremendous increase in the discovery of novel missense variants, but evidence for determining clinical significance can be limited or conflicting. Here, we present Learning from Evidence to Assess Pathogenicity (LEAP), a machine learning model that utilizes a variety of feature categories to classify variants, and achieves high performance in multiple genes and different health conditions. Feature categories include functional predictions, splice predictions, population frequencies, conservation scores, protein domain data, and clinical observation data such as personal and family history and covariant information. L2‐regularized logistic regression and random forest classification models were trained on missense variants detected and classified during the course of routine clinical testing at Color Genomics (14,226 variants from 24 cancer‐related genes and 5,398 variants from 30 cardiovascular‐related genes). Using 10‐fold cross‐validated predictions, the logistic regression model achieved an area under the receiver operating characteristic curve (AUROC) of 97.8% (cancer) and 98.8% (cardiovascular), while the random forest model achieved 98.3% (cancer) and 98.6% (cardiovascular). We demonstrate generalizability to different genes by validating predictions on genes withheld from training (96.8% AUROC). High accuracy and broad applicability make LEAP effective in the clinical setting as a high‐throughput quality control layer.
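The AUROC values quoted in this abstract have a simple pairwise reading: the AUROC is the probability that a randomly chosen positive (here, pathogenic) variant receives a higher score than a randomly chosen negative (benign) one, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auroc` and the toy scores are hypothetical and are not taken from LEAP.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels use 1 for the positive class and 0 for the negative class;
    tied scores contribute half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for four variants (two benign, two pathogenic).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auroc(toy_scores, toy_labels))  # 0.75: three of the four pairs are ordered correctly
```

With cross-validated predictions, as in the abstract, the scores passed to such a function would be the out-of-fold predictions, so each variant is scored by a model that did not see it during training.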
Affiliation(s)
- Carmen Lai
  - Data Science, Color Genomics, Burlingame, California
- Serra Kim
  - Variant Science, Color Genomics, Burlingame, California
- Ray Chan
  - Variant Science, Color Genomics, Burlingame, California
- Alicia Y Zhou
  - Scientific Affairs, Color Genomics, Burlingame, California
- Scott Topper
  - Clinical Genomics, Color Genomics, Burlingame, California
- Gilad Mishne
  - Data Science, Color Genomics, Burlingame, California
|