1. Lotter W, Hassett MJ, Schultz N, Kehl KL, Van Allen EM, Cerami E. Artificial Intelligence in Oncology: Current Landscape, Challenges, and Future Directions. Cancer Discov 2024; 14:711-726. [PMID: 38597966 PMCID: PMC11131133 DOI: 10.1158/2159-8290.cd-23-1199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 01/29/2024] [Accepted: 02/28/2024] [Indexed: 04/11/2024]
Abstract
Artificial intelligence (AI) in oncology is advancing beyond algorithm development to integration into clinical practice. This review describes the current state of the field, with a specific focus on clinical integration. AI applications are structured according to cancer type and clinical domain, focusing on the four most common cancers and tasks of detection, diagnosis, and treatment. These applications encompass various data modalities, including imaging, genomics, and medical records. We conclude with a summary of existing challenges, evolving solutions, and potential future directions for the field.
SIGNIFICANCE: AI is increasingly being applied to all aspects of oncology, where several applications are maturing beyond research and development to direct clinical integration. This review summarizes the current state of the field through the lens of clinical translation along the clinical care continuum. Emerging areas are also highlighted, along with common challenges, evolving solutions, and potential future directions for the field.
Affiliation(s)
- William Lotter
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Michael J. Hassett
  - Harvard Medical School, Boston, MA, USA
  - Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Nikolaus Schultz
  - Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
  - Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Kenneth L. Kehl
  - Harvard Medical School, Boston, MA, USA
  - Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Eliezer M. Van Allen
  - Harvard Medical School, Boston, MA, USA
  - Division of Population Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ethan Cerami
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2. Ergun MA, Cinal O, Bakışlı B, Emül AA, Baysan M. COSAP: Comparative Sequencing Analysis Platform. BMC Bioinformatics 2024; 25:130. [PMID: 38532317 DOI: 10.1186/s12859-024-05756-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/20/2024] [Indexed: 03/28/2024] [Open Access]
Abstract
BACKGROUND: Recent improvements in sequencing technologies have enabled detailed profiling of genomic features. These technologies mostly rely on short reads, which are merged and compared to a reference genome for variant identification. These operations must be done computationally due to the size and complexity of the data. The need for analysis software has resulted in many programs for the mapping, variant calling, and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code, which makes access and verification very difficult, or open-access programs that are mostly based on command-line operations without user interfaces or extensive documentation. Moreover, multiple studies have observed a high level of disagreement among popular mapping and variant calling algorithms, which makes relying on a single algorithm unreliable. User-friendly open-source software tools that offer comparative analysis are an important need considering the growth of sequencing technologies.
RESULTS: Here, we propose the Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel, and structural variant calling, copy number variation, microsatellite instability, and fusion analysis, together with their annotations. COSAP comes with a fully functional, user-friendly web interface and a backend server, which allows fully independent deployment at both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/. The source code of the backend and frontend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/, respectively. All services are also packaged as Docker containers. Pipelines that combine algorithms can be customized, and new algorithms can be added with minimal coding thanks to the modular structure.
CONCLUSIONS: COSAP simplifies and speeds up DNA sequencing analyses by providing commonly used algorithms for SNV, indel, and structural variant calling, copy number variation, microsatellite instability, and fusion analysis, as well as their annotations. COSAP comes with a fully functional, user-friendly web interface and a backend server, which allows fully independent deployment at both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make it much easier to compare alternative pipelines and assess their impact, which is crucial in establishing the reproducibility of sequencing analyses.
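The disagreement among variant callers that motivates comparative analysis can be illustrated with a minimal sketch. This is not COSAP code and does not use the COSAP API; the caller names, the `Variant` tuple layout, and the `min_support` threshold are all assumptions made for illustration. It computes pairwise Jaccard overlap between call sets and a simple consensus set.

```python
from collections import Counter
from itertools import combinations

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def pairwise_jaccard(calls: dict[str, set[Variant]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap |A & B| / |A | B| for every pair of callers."""
    result = {}
    for a, b in combinations(sorted(calls), 2):
        union = calls[a] | calls[b]
        # Two empty call sets are treated as fully concordant.
        result[(a, b)] = len(calls[a] & calls[b]) / len(union) if union else 1.0
    return result


def consensus(calls: dict[str, set[Variant]], min_support: int = 2) -> set[Variant]:
    """Variants reported by at least `min_support` callers."""
    support = Counter(v for variants in calls.values() for v in variants)
    return {v for v, n in support.items() if n >= min_support}


# Made-up call sets from three hypothetical callers.
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "T", "G")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr2", 50, "T", "G")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr3", 10, "C", "A")},
}

overlap = pairwise_jaccard(calls)
agreed = consensus(calls, min_support=2)
```

In practice the call sets would be parsed from each pipeline's VCF output after normalizing variant representation; exact-tuple matching as used here is a simplification.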
Affiliation(s)
- Mehmet Arif Ergun
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Omer Cinal
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Berkant Bakışlı
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Abdullah Asım Emül
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Mehmet Baysan
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
3. Tomlinson B, Black RW, Patterson DJ, Torrance AW. The carbon emissions of writing and illustrating are lower for AI than for humans. Sci Rep 2024; 14:3732. [PMID: 38355820 PMCID: PMC10867074 DOI: 10.1038/s41598-024-54271-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Accepted: 02/10/2024] [Indexed: 02/16/2024] [Open Access]
Abstract
As AI systems proliferate, their greenhouse gas emissions are an increasingly important concern for human societies. In this article, we present a comparative analysis of the carbon emissions associated with AI systems (ChatGPT, BLOOM, DALL-E2, Midjourney) and human individuals performing equivalent writing and illustrating tasks. Our findings reveal that AI systems emit between 130 and 1500 times less CO2e per page of text generated compared to human writers, while AI illustration systems emit between 310 and 2900 times less CO2e per image than their human counterparts. Emissions analyses do not account for social impacts such as professional displacement, legality, and rebound effects. In addition, AI is not a substitute for all human tasks. Nevertheless, at present, the use of AI holds the potential to carry out several major activities at much lower emission levels than can humans.
Affiliation(s)
- Bill Tomlinson
  - Department of Informatics, University of California, Irvine, Irvine, CA, 92697, USA
  - School of Information Management, Victoria University of Wellington-Te Herenga Waka, Wellington, 6140, New Zealand
- Rebecca W Black
  - Department of Informatics, University of California, Irvine, Irvine, CA, 92697, USA
- Donald J Patterson
  - Department of Informatics, University of California, Irvine, Irvine, CA, 92697, USA
  - Department of Mathematics and Computer Science, Westmont College, Santa Barbara, CA, 93108, USA
- Andrew W Torrance
  - School of Law, University of Kansas, Lawrence, KS, 66045, USA
  - Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, 02142, USA
4. Capasso M, Brignole C, Lasorsa VA, Bensa V, Cantalupo S, Sebastiani E, Quattrone A, Ciampi E, Avitabile M, Sementa AR, Mazzocco K, Cafferata B, Gaggero G, Vellone VG, Cilli M, Calarco E, Giusto E, Perri P, Aveic S, Fruci D, Tondo A, Luksch R, Mura R, Rabusin M, De Leonardis F, Cellini M, Coccia P, Iolascon A, Corrias MV, Conte M, Garaventa A, Amoroso L, Ponzoni M, Pastorino F. From the identification of actionable molecular targets to the generation of faithful neuroblastoma patient-derived preclinical models. J Transl Med 2024; 22:151. [PMID: 38351008 PMCID: PMC10863144 DOI: 10.1186/s12967-024-04954-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 02/03/2024] [Indexed: 02/16/2024] [Open Access]
Abstract
BACKGROUND: Neuroblastoma (NB) is the most frequent and aggressive extracranial solid tumor of infants. Although the overall survival of patients with NB has improved in recent years, more than 50% of high-risk patients still relapse. Thus, in the era of precision/personalized medicine, the need for high-risk NB patient-specific therapies is urgent.
METHODS: Within the PeRsonalizEd Medicine (PREME) program, patient-derived NB tumors and bone marrow (BM)-infiltrating NB cells, derived from either iliac crests or tumor bone lesions, underwent histological analysis and flow cytometry immunophenotyping, respectively. BM samples with an NB cell infiltration of 1 to 50 percent subsequently underwent NB cell enrichment using immunomagnetic manipulation. NB samples were then used for the identification of actionable targets and for the generation of 3D/tumor-sphere, Patient-Derived Xenograft (PDX), and Cell PDX (CPDX) preclinical models.
RESULTS: Eighty-four percent of NB patients showed potentially therapeutically targetable somatic alterations (including point mutations, copy number variations, and mRNA over-expression). Sixty-six percent of samples showed alterations graded as "very high priority", that is, validated to be directly targetable by an approved drug or an investigational agent. Molecular targeted therapy was applied in four patients, while genetic counseling was suggested for two patients carrying one pathogenic germline variant in known cancer predisposition genes. Of eleven samples implanted in mice, five gave rise to (C)PDX, all preserved in a local PDX biobank. Interestingly, a comparison of all molecular alterations and histological and immunophenotypic features between the original patients' tumors and PDX/CPDX up to the second generation showed a high degree of similarity. Notably, 3D models also conserved the immunophenotypic features and molecular alterations of the original tumors.
CONCLUSIONS: PREME confirms the possibility of identifying targetable genomic alterations in NB; indeed, molecular targeted therapy was applied to four NB patients. PREME paves the way for the creation of clinically relevant repositories of faithful patient-derived (C)PDX and 3D models on which to test precision, NB standard-of-care, and experimental medicines.
Affiliation(s)
- Mario Capasso
  - Department of Medical Biotechnology, University of Naples Federico II, 80138, Naples, Italy
  - CEINGE Advanced Biotechnology, 80138, Naples, Italy
- Chiara Brignole
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
- Veronica Bensa
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
- Sueva Cantalupo
  - Department of Medical Biotechnology, University of Naples Federico II, 80138, Naples, Italy
  - CEINGE Advanced Biotechnology, 80138, Naples, Italy
- Eleonora Ciampi
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
- Marianna Avitabile
  - Department of Medical Biotechnology, University of Naples Federico II, 80138, Naples, Italy
  - CEINGE Advanced Biotechnology, 80138, Naples, Italy
- Angela R Sementa
  - Pathological Anatomy, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Katia Mazzocco
  - Pathological Anatomy, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Barbara Cafferata
  - Pathological Anatomy, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Gabriele Gaggero
  - Pathological Anatomy, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Valerio G Vellone
  - Pathological Anatomy, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Michele Cilli
  - Animal Facility, IRCCS Policlinico San Martino, 16100, Genoa, Italy
- Enzo Calarco
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
- Elena Giusto
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
- Patrizia Perri
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
- Sanja Aveic
  - Pediatric Research Institute Città Della Speranza, 35127, Padua, Italy
- Doriana Fruci
  - Department of Emato-Oncology, Bambino Gesù Children's Hospital, 00146, Rome, Italy
- Annalisa Tondo
  - Department of Emato-Oncology, Anna Meyer Children's Hospital, 50139, Florence, Italy
- Roberto Luksch
  - Emato-Oncology Unit, Fondazione IRCCS Istituto Nazionale Dei Tumori, 20133, Milan, Italy
- Rossella Mura
  - Emato-Oncology Unit, Azienda Ospedaliera Brotzu, 09047, Cagliari, Italy
- Marco Rabusin
  - Pediatric Department, Institute for Maternal and Child Health, IRCCS Burlo Garofolo, 34137, Trieste, Italy
- Monica Cellini
  - Emato-Oncology Unit, University-Hospital Polyclinic of Modena, 41124, Modena, Italy
- Paola Coccia
  - University-Hospital of Marche, Presidio Ospedaliero "G. Salesi", 60126, Ancona, Italy
- Achille Iolascon
  - Department of Medical Biotechnology, University of Naples Federico II, 80138, Naples, Italy
  - CEINGE Advanced Biotechnology, 80138, Naples, Italy
- Maria V Corrias
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
- Massimo Conte
  - Clinical Oncology Unit, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Alberto Garaventa
  - Clinical Oncology Unit, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Loredana Amoroso
  - Clinical Oncology Unit, IRCCS Istituto Giannina Gaslini, 16147, Genoa, Italy
- Mirco Ponzoni
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
- Fabio Pastorino
  - Laboratory of Experimental Therapies in Oncology, IRCCS Istituto Giannina Gaslini, Via G. Gaslini 5, 16147, Genoa, Italy
5. Muench A, Teichmann D, Spille D, Kuzman P, Perez E, May SA, Mueller WC, Kombos T, Nazari-Dehkordi S, Onken J, Vajkoczy P, Ntoulias G, Bettencourt C, von Deimling A, Paulus W, Heppner FL, Koch A, Capper D, Kaul D, Thomas C, Schweizer L. A Novel Type of IDH-wildtype Glioma Characterized by Gliomatosis Cerebri-like Growth Pattern, TERT Promoter Mutation, and Distinct Epigenetic Profile. Am J Surg Pathol 2023; 47:1364-1375. [PMID: 37737691 DOI: 10.1097/pas.0000000000002118] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 09/23/2023]
Abstract
Diffuse gliomas in adults encompass a heterogeneous group of central nervous system neoplasms. In recent years, extensive (epi-)genomic profiling has identified several glioma subgroups characterized by distinct molecular characteristics, most importantly IDH1/2 and histone H3 mutations. A group of 16 diffuse gliomas classified as "adult-type diffuse high-grade glioma, IDH-wildtype, subtype F (HGG-F)" was identified by the DKFZ v12.5 Brain Tumor Classifier. Histopathologic characterization, exome sequencing, and review of clinical data were performed in all cases. Based on unsupervised t-distributed stochastic neighbor embedding and clustering analysis of genome-wide DNA methylation data, HGG-F shows distinct epigenetic profiles separate from established central nervous system tumors. Exome sequencing demonstrated frequent TERT promoter (12/15 cases), PIK3R1 (11/16), and TP53 mutations (5/16). Radiologic characteristics were reminiscent of gliomatosis cerebri in 9/14 cases (64%). Histopathologically, most cases were classified as diffuse gliomas (7/16, 44%) or were suspicious for the infiltration zone of a diffuse glioma (5/16, 31%). None of the cases demonstrated microvascular proliferation or necrosis. Outcome of 14 patients with follow-up data was better compared to IDH-wildtype glioblastomas, with a median progression-free survival of 58 months and overall survival of 74 months (both P < 0.0001). Our series represents a novel type of adult-type diffuse glioma with distinct molecular and clinical features. Importantly, we provide evidence that TERT promoter mutations in diffuse gliomas without further morphologic or molecular signs of high-grade glioma should be interpreted in the context of the clinicoradiologic presentation as well as epigenetic profile and may not be suitable as a standalone marker for glioblastoma, IDH-wildtype.
Affiliation(s)
- Amos Muench
  - Edinger Institute, Institute of Neurology, University of Frankfurt am Main
- Peter Kuzman
  - Institute of Neuropathology, University Hospital Leipzig, Leipzig
- Sven-Axel May
  - Department of Neurosurgery, Klinikum Chemnitz, Chemnitz
- Wolf C Mueller
  - Institute of Neuropathology, University Hospital Leipzig, Leipzig
- Georgios Ntoulias
  - Department of Neurosurgery, Schlosspark-Klinik Charlottenburg, Berlin
- Conceição Bettencourt
  - Queen Square Brain Bank, UCL Queen Square Institute of Neurology, University College London, London, UK
- Werner Paulus
  - Institute of Neuropathology, University Hospital Münster, Münster
- Frank L Heppner
  - Departments of Neuropathology
  - Cluster of Excellence, NeuroCure
  - German Center for Neurodegenerative Diseases (DZNE)
  - German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ)
- Arend Koch
  - Departments of Neuropathology
  - German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ)
- David Capper
  - Departments of Neuropathology
  - German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ)
- David Kaul
  - Radiation Oncology and Radiotherapy, Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin
- Christian Thomas
  - Institute of Neuropathology, University Hospital Münster, Münster
- Leonille Schweizer
  - Edinger Institute, Institute of Neurology, University of Frankfurt am Main
  - Frankfurt Cancer Institute (FCI), Frankfurt am Main
  - Departments of Neuropathology
  - German Cancer Consortium (DKTK), Partner Site Berlin, German Cancer Research Center (DKFZ)
  - German Cancer Consortium (DKTK), Partner Site Frankfurt/Mainz, German Cancer Research Center (DKFZ), Heidelberg, Germany
6. El Naqa I, Karolak A, Luo Y, Folio L, Tarhini AA, Rollison D, Parodi K. Translation of AI into oncology clinical practice. Oncogene 2023; 42:3089-3097. [PMID: 37684407 DOI: 10.1038/s41388-023-02826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 08/23/2023] [Accepted: 08/25/2023] [Indexed: 09/10/2023]
Abstract
Artificial intelligence (AI) is a transformative technology that is capturing popular imagination and can revolutionize biomedicine. AI and machine learning (ML) algorithms have the potential to break through existing barriers in oncology research and practice such as automating workflow processes, personalizing care, and reducing healthcare disparities. Emerging applications of AI/ML in the literature include screening and early detection of cancer, disease diagnosis, response prediction, prognosis, and accelerated drug discovery. Despite this excitement, only a few AI/ML models have been properly validated and fewer have become regulated products for routine clinical use. In this review, we highlight the main challenges impeding AI/ML clinical translation. We present different clinical use cases from the domains of radiology, radiation oncology, immunotherapy, and drug discovery in oncology. We dissect the unique challenges and opportunities associated with each of these cases. Finally, we summarize the general requirements for successful AI/ML implementation in the clinic, highlighting specific examples and points of emphasis including the importance of multidisciplinary collaboration of stakeholders, the role of domain experts in AI augmentation, transparency of AI/ML models, and the establishment of a comprehensive quality assurance program to mitigate risks of training bias and data drifts, all culminating toward safer and more beneficial AI/ML applications in oncology labs and clinics.
Affiliation(s)
- Issam El Naqa
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, 33612, USA
- Aleksandra Karolak
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, 33612, USA
- Yi Luo
  - Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, 33612, USA
- Les Folio
  - Diagnostic Imaging & Interventional Radiology, Moffitt Cancer Center, Tampa, FL, 33612, USA
- Ahmad A Tarhini
  - Cutaneous Oncology and Immunology, Moffitt Cancer Center, Tampa, FL, 33612, USA
- Dana Rollison
  - Department of Cancer Epidemiology, Moffitt Cancer Center, Tampa, FL, 33612, USA
- Katia Parodi
  - Department of Medical Physics, Ludwig-Maximilians-Universität München, Munich, Germany
7. Yao H, Li H, Wang J, Wu T, Ning W, Diao K, Wu C, Wang G, Tao Z, Zhao X, Chen J, Sun X, Liu XS. Copy number alteration features in pan-cancer homologous recombination deficiency prediction and biology. Commun Biol 2023; 6:527. [PMID: 37193789 DOI: 10.1038/s42003-023-04901-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 05/02/2023] [Indexed: 05/18/2023] [Open Access]
Abstract
Homologous recombination deficiency (HRD) renders cancer cells vulnerable to unrepaired double-strand breaks and is an important therapeutic target as exemplified by the clinical efficacy of poly ADP-ribose polymerase (PARP) inhibitors as well as the platinum chemotherapy drugs applied to HRD patients. However, it remains a challenge to predict HRD status precisely and economically. Copy number alteration (CNA), as a pervasive trait of human cancers, can be extracted from a variety of data sources, including whole genome sequencing (WGS), SNP array, and panel sequencing, and thus can be easily applied clinically. Here we systematically evaluate the predictive performance of various CNA features and signatures in HRD prediction and build a gradient boosting machine model (HRDCNA) for pan-cancer HRD prediction based on these CNA features. CNA features BP10MB[1] (the number of breakpoints per 10 Mb of DNA is 1) and SS[>7 & <=8] (the log10-based size of segments is greater than 7 and less than or equal to 8) are identified as the most important features in HRD prediction. HRDCNA suggests the biallelic inactivation of BRCA1, BRCA2, PALB2, RAD51C, RAD51D, and BARD1 as the major genetic basis for human HRD, and may also be applied to effectively validate the pathogenicity of BRCA1/2 variants of uncertain significance (VUS). Together, this study provides a robust tool for cost-effective HRD prediction and also demonstrates the applicability of CNA features and signatures in cancer precision medicine.
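To make the two feature definitions concrete, the following is a simplified sketch of how such counts could be tallied from a segmented copy number profile. It is not the HRDCNA implementation: the `(chrom, start, end)` segment layout, the fixed 10 Mb windowing from position 0, and the treatment of a breakpoint as the junction between consecutive segments on the same chromosome are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical segment layout: (chromosome, start, end) in base pairs, end > start.
Segment = tuple[str, int, int]

WINDOW = 10_000_000  # 10 Mb


def ss_bin_count(segments: list[Segment], low: float = 7.0, high: float = 8.0) -> int:
    """Count segments whose log10(size in bp) lies in the interval (low, high]."""
    return sum(1 for _, start, end in segments if low < math.log10(end - start) <= high)


def bp10mb_count(segments: list[Segment], n_breakpoints: int = 1) -> int:
    """Count 10 Mb windows that contain exactly `n_breakpoints` breakpoints."""
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for chrom, start, end in sorted(segments):
        by_chrom.setdefault(chrom, []).append((start, end))

    per_window: Counter = Counter()
    for chrom, segs in by_chrom.items():
        # Assumed definition: one breakpoint at each junction of consecutive segments.
        for (_, prev_end), _next_seg in zip(segs, segs[1:]):
            per_window[(chrom, prev_end // WINDOW)] += 1
    return sum(1 for n in per_window.values() if n == n_breakpoints)


# Made-up segmentation of two chromosomes.
segments = [
    ("chr1", 0, 5_000_000),
    ("chr1", 5_000_000, 30_000_000),
    ("chr1", 30_000_000, 32_000_000),
    ("chr1", 32_000_000, 150_000_000),
    ("chr2", 0, 40_000_000),
    ("chr2", 40_000_000, 90_000_000),
]

ss_7_8 = ss_bin_count(segments)   # segments between 10 Mb and 100 Mb in size
bp10mb_1 = bp10mb_count(segments)  # windows with exactly one breakpoint
```

In a model such as the one described, counts like these would form part of a per-sample feature vector; the classifier itself is outside the scope of this sketch.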
Affiliation(s)
- Huizi Yao
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Huimin Li
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Jinyu Wang
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
  - University of Chinese Academy of Sciences, Beijing, China
- Tao Wu
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Wei Ning
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Kaixuan Diao
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Chenxu Wu
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Guangshuai Wang
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Ziyu Tao
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Xiangyu Zhao
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Jing Chen
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Xiaoqin Sun
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
- Xue-Song Liu
  - School of Life Science and Technology, ShanghaiTech University, Shanghai, China
  - Shanghai Clinical Research and Trial Center, Shanghai, China
8. Ren Z, Li Q, Cao K, Li MM, Zhou Y, Wang K. Model performance and interpretability of semi-supervised generative adversarial networks to predict oncogenic variants with unlabeled data. BMC Bioinformatics 2023; 24:43. [PMID: 36759776 PMCID: PMC9909865 DOI: 10.1186/s12859-023-05141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Accepted: 01/05/2023] [Indexed: 02/11/2023] [Open Access]
Abstract
BACKGROUND: It remains an important challenge to predict the functional consequences or clinical impacts of genetic variants in human diseases, such as cancer. An increasing number of genetic variants in cancer have been discovered and documented in public databases such as COSMIC, but the vast majority of them have no functional or clinical annotations. Some databases, such as CIViC, provide manual annotation of functional mutations, but they are small because they rely on human annotation. Since unlabeled data (millions of variants) typically outnumber labeled data (thousands of variants), computational tools that take advantage of unlabeled data may improve prediction accuracy.
RESULTS: To leverage unlabeled data to predict the functional importance of genetic variants, we introduced a method using semi-supervised generative adversarial networks (SGAN), incorporating features from both labeled and unlabeled data. Our SGAN model incorporated features from clinical guidelines and predictive scores from other computational tools. We also performed comparative analysis to study factors that influence prediction accuracy, such as the choice of algorithm, types of features, and training sample size, to provide more insights into variant prioritization. We found that SGAN can achieve competitive performance with small labeled training samples by incorporating unlabeled samples, which is a unique advantage compared to traditional machine learning methods. We also found that manually curated samples can achieve a more stable predictive performance than publicly available datasets.
CONCLUSIONS: By incorporating much larger samples of unlabeled data, the SGAN method can improve the ability to detect novel oncogenic variants, compared to other machine-learning algorithms that use only labeled datasets. SGAN can potentially be used to predict the pathogenicity of more complex variants, such as structural variants or non-coding variants, given the availability of more training samples and informative features.
Affiliation(s)
- Zilin Ren
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Quan Li
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, ON M5G2C1, Canada
- Kajia Cao
  - Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Marilyn M. Li
  - Division of Genomic Diagnostics, Department of Pathology and Laboratory Medicine, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yunyun Zhou
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Kai Wang
  - Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
  - Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
9. Anne A, Kumar L, Salavadi RK, Anand PS, Nuguri S, Bindra S, Reddy KVR, Gummanur MR, Mohan KN. Somatic Variants and Exon-Level Copy Number Changes in Five Hyperplastic Oral Leukoplakias. Cytogenet Genome Res 2023; 162:560-569. [PMID: 36630923 DOI: 10.1159/000528890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 12/29/2022] [Indexed: 01/12/2023] [Open Access]
Abstract
Oral leukoplakia (OL), an oral potentially malignant disorder, begins with a hyperplastic/hyperkeratotic stage at which no genome-scale somatic single nucleotide variant profiles have been described so far. We performed exome sequencing of five cases at this stage with no evidence of dysplasia to identify genetic alterations (exon-level copy number alterations, indels, and single nucleotide variants), their association with transcript levels, and relationship with oral cancer susceptibility. Pathway enrichment analysis of genes associated with tobacco chewing and age-related mutation signatures, transcripts with variants predicted to be functionally damaging and those with significantly altered levels all indicated the involvement of focal adhesion, ECM-receptor interactions, regulation of cytoskeleton, and DNA repair. Two novel mutations identified in FAT1 tumor suppressor gene were associated with decreased transcript levels. In addition, 16 expressed cancer driver genes contained functionally damaging variants. Many of the affected genes were also reported in dysplastic OL lesions. The presence of variants in cancer driver genes and those shared with oral dysplasias possibly provides a basis for further progression and increased susceptibility to oral cancer.
Affiliation(s)
- Anuhya Anne
- Molecular Biology and Genetics Laboratory, BITS Pilani Hyderabad Campus, Hyderabad, India
- Centre for Human Disease Research, BITS Pilani Hyderabad Campus, Hyderabad, India
| | - Lov Kumar
- Computer Science and Information Systems, BITS Pilani Hyderabad Campus, Hyderabad, India
| | - Kommu N Mohan
- Molecular Biology and Genetics Laboratory, BITS Pilani Hyderabad Campus, Hyderabad, India
- Centre for Human Disease Research, BITS Pilani Hyderabad Campus, Hyderabad, India
|
10
|
Parallel functional annotation of cancer-associated missense mutations in histone methyltransferases. Sci Rep 2022; 12:18487. [PMID: 36323913 PMCID: PMC9630446 DOI: 10.1038/s41598-022-23229-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2022] [Accepted: 10/27/2022] [Indexed: 12/03/2022]
Abstract
Using exome sequencing for biomarker discovery and precision medicine requires connecting nucleotide-level variation with functional changes in encoded proteins. However, for functionally annotating the thousands of cancer-associated missense mutations, or variants of uncertain significance (VUS), purifying variant proteins for biochemical and functional analysis is cost-prohibitive and inefficient. We describe parallel functional annotation (PFA) of large numbers of VUS using small cultures and crude extracts in 96-well plates. Using members of a histone methyltransferase family, we demonstrate high-throughput structural and functional annotation of cancer-associated mutations. By combining functional annotation of paralogs, we discovered two phylogenetic and clustering parameters that improve the accuracy of sequence-based functional predictions to over 90%. Our results demonstrate the value of PFA for defining oncogenic/tumor suppressor functions of histone methyltransferases as well as enhancing the accuracy of sequence-based algorithms in predicting the effects of cancer-associated mutations.
|
11
|
Liu Y, Yeung WSB, Chiu PCN, Cao D. Computational approaches for predicting variant impact: An overview from resources, principles to applications. Front Genet 2022; 13:981005. [PMID: 36246661 PMCID: PMC9559863 DOI: 10.3389/fgene.2022.981005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022]
Abstract
One objective of human genetics is to identify the variants that contribute to human diseases. With the rapid development and wide use of next-generation sequencing (NGS), massive amounts of genomic sequence data have been generated, making personal genetic information available. Conventional experimental evidence is critical for establishing the relationship between sequence variants and phenotype, but obtaining it is inefficient. Because comprehensive databases and resources presenting clinical and experimental evidence on genotype-phenotype relationships are lacking, and variants discovered by NGS continue to accumulate, many computational tools that predict the impact of variants on phenotype have been developed to bridge the gap. In this review, we briefly introduce and discuss computational approaches for variant impact prediction. We focus mainly on approaches for predicting the impact of non-synonymous variants (nsSNVs) and categorize them into six classes. We discuss their underlying rationale and constraints, together with the concerns and remedies raised by comparative studies. We also describe how these predictive approaches have been employed in different research settings. Although diverse constraints exist, computational predictive approaches are indispensable for exploring genotype-phenotype relationships.
Affiliation(s)
- Ye Liu
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
| | - William S. B. Yeung
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Philip C. N. Chiu
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- *Correspondence: Philip C. N. Chiu; Dandan Cao
| | - Dandan Cao
- Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- *Correspondence: Philip C. N. Chiu; Dandan Cao
|