1
Yates J, Van Allen EM. New horizons at the interface of artificial intelligence and translational cancer research. Cancer Cell 2025; 43:708-727. [PMID: 40233719 PMCID: PMC12007700 DOI: 10.1016/j.ccell.2025.03.018] [Received: 01/24/2025] [Revised: 03/04/2025] [Accepted: 03/12/2025] [Indexed: 04/17/2025]
Abstract
Artificial intelligence (AI) is increasingly being utilized in cancer research as a computational strategy for analyzing multiomics datasets. Advances in single-cell and spatial profiling technologies have contributed significantly to our understanding of tumor biology, and AI methodologies are now being applied to accelerate translational efforts, including target discovery, biomarker identification, patient stratification, and therapeutic response prediction. Despite these advancements, the integration of AI into clinical workflows remains limited, presenting both challenges and opportunities. This review discusses AI applications in multiomics analysis and translational oncology, emphasizing their role in advancing biological discoveries and informing clinical decision-making. Key areas of focus include cellular heterogeneity, tumor microenvironment interactions, and AI-aided diagnostics. Challenges such as reproducibility, interpretability of AI models, and clinical integration are explored, with attention to strategies for addressing these hurdles. Together, these developments underscore the potential of AI and multiomics to enhance precision oncology and contribute to advancements in cancer care.
Affiliation(s)
- Josephine Yates
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Institute for Machine Learning, Department of Computer Science, ETH Zürich, Zurich, Switzerland; ETH AI Center, ETH Zurich, Zurich, Switzerland; Swiss Institute for Bioinformatics (SIB), Lausanne, Switzerland
- Eliezer M Van Allen
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA; Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA; Division of Medical Sciences, Harvard University, Boston, MA, USA; Parker Institute for Cancer Immunotherapy, Dana-Farber Cancer Institute, Boston, MA, USA.
2
Arnal Segura M, Bini G, Krithara A, Paliouras G, Tartaglia GG. Machine Learning Methods for Classifying Multiple Sclerosis and Alzheimer's Disease Using Genomic Data. Int J Mol Sci 2025; 26:2085. [PMID: 40076709 PMCID: PMC11900513 DOI: 10.3390/ijms26052085] [Received: 12/20/2024] [Revised: 02/22/2025] [Accepted: 02/24/2025] [Indexed: 03/14/2025]
Abstract
Complex diseases pose challenges in prediction due to their multifactorial and polygenic nature. This study employed machine learning (ML) to analyze genomic data from the UK Biobank, aiming to predict the genomic predisposition to complex diseases like multiple sclerosis (MS) and Alzheimer's disease (AD). We tested logistic regression (LR), ensemble tree methods, and deep learning models for this purpose. LR displayed remarkable stability across various subsets of data, outshining deep learning approaches, which showed greater variability in performance. Additionally, ML methods demonstrated an ability to maintain optimal performance despite correlated genomic features due to linkage disequilibrium. When comparing the performance of polygenic risk score (PRS) with ML methods, PRS consistently performed at an average level. By employing explainability tools in the ML models of MS, we found that the results confirmed the polygenicity of this disease. The highest-prioritized genomic variants in MS were identified as expression or splicing quantitative trait loci located in non-coding regions within or near genes associated with the immune response, with a prevalence of human leukocyte antigen (HLA) gene annotations. Our findings shed light on both the potential and the challenges of employing ML to capture complex genomic patterns, paving the way for improved predictive models.
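The abstract contrasts a polygenic risk score (PRS) with logistic regression (LR) trained on the same genotypes. As a rough sketch only, assuming the textbook additive definitions rather than the authors' actual pipeline: a PRS is a fixed weighted sum of effect-allele dosages (0, 1 or 2 copies) using externally estimated effect sizes, whereas LR re-estimates its weights on the target cohort. The function names, toy genotypes and training settings (`learning_rate`, `epochs`) below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def polygenic_risk_score(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Additive PRS: effect-allele dosages weighted by fixed, externally estimated effect sizes."""
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(d * b for d, b in zip(dosages, betas))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    learning_rate: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear predictor is (p_hat - y).
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - learning_rate * g / n for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy data: two SNPs (dosages 0/1/2); cases tend to carry more effect alleles.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [1, 2], [2, 2], [2, 0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
```

Both models stay additive in the dosages; the difference sketched here is only that the PRS weights are fixed in advance, while `fit_logistic_regression(X, y)` learns them from the cohort at hand.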
Affiliation(s)
- Magdalena Arnal Segura
- Centre for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen, 83, 16152 Genova, Italy
- Department of Biology ‘Charles Darwin’, Sapienza University of Rome, P.le A. Moro 5, 00185 Rome, Italy
- Giorgio Bini
- Centre for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen, 83, 16152 Genova, Italy
- Department of Physics, University of Genova, Via Dodecaneso 33, 16146 Genova, Italy
- Anastasia Krithara
- Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, 15341 Athens, Greece
- Georgios Paliouras
- Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, 15341 Athens, Greece
- Gian Gaetano Tartaglia
- Centre for Human Technologies, Istituto Italiano di Tecnologia, Via Enrico Melen, 83, 16152 Genova, Italy
- Department of Biology ‘Charles Darwin’, Sapienza University of Rome, P.le A. Moro 5, 00185 Rome, Italy
3
Wang J, Chai J, Chen L, Zhang T, Long X, Diao S, Chen D, Guo Z, Tang G, Wu P. Enhancing Genomic Prediction Accuracy of Reproduction Traits in Rongchang Pigs Through Machine Learning. Animals (Basel) 2025; 15:525. [PMID: 40003007 PMCID: PMC11852217 DOI: 10.3390/ani15040525] [Received: 01/17/2025] [Revised: 02/02/2025] [Accepted: 02/10/2025] [Indexed: 02/27/2025]
Abstract
The increasing volume of genome sequencing data presents challenges for traditional genome-wide prediction methods in handling large datasets. Machine learning (ML) techniques, which can process high-dimensional data, offer promising solutions. This study aimed to find a genome-wide prediction method for local pig breeds, using 10 datasets with varying SNP densities derived from imputed sequencing data of 515 Rongchang pigs and the Pig QTL database. Three reproduction traits (litter weight, total number of piglets born, and number of piglets born alive) were predicted using six traditional methods and five ML methods, including kernel ridge regression, random forest, Gradient Boosting Decision Tree (GBDT), Light Gradient Boosting Machine, and Adaboost. The methods' efficacy was evaluated using fivefold cross-validation and independent tests. The predictive performance of both traditional and ML methods initially increased with SNP density, peaking at 800-900 k SNPs. ML methods outperformed traditional ones, showing improvements of 0.4-4.1%. The integration of GWAS and the Pig QTL database enhanced ML robustness. ML models exhibited superior generalizability, with high correlation coefficients (0.935-0.998) between cross-validation and independent test results. GBDT and random forest showed high computational efficiency, making them promising methods for genomic prediction in livestock breeding.
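Kernel ridge regression (KRR) is one of the five ML methods named above, and fivefold cross-validation is the evaluation scheme. The following is a minimal pure-Python sketch of both, under assumptions the abstract does not state: an RBF kernel, a mean-centred response, and Pearson correlation between predicted and observed values as the per-fold accuracy. The kernel choice, `gamma`, `lam` and all helper names are hypothetical, not the authors' settings.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, gamma: float) -> float:
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(X: Sequence[Vector], y: Sequence[float],
            lam: float = 1e-3, gamma: float = 1.0) -> Callable[[Vector], float]:
    """KRR: alpha = (K + lam*I)^-1 (y - mean(y)); prediction = mean(y) + sum_i alpha_i k(x, x_i)."""
    y_mean = sum(y) / len(y)
    K = [[rbf_kernel(a, b, gamma) + (lam if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = solve(K, [v - y_mean for v in y])

    def predict(x: Vector) -> float:
        return y_mean + sum(a * rbf_kernel(x, xi, gamma) for a, xi in zip(alpha, X))

    return predict


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices once, then deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X: Sequence[Vector], y: Sequence[float], k: int = 5, **params) -> List[float]:
    """Per-fold Pearson correlation between predicted and observed held-out values."""
    scores = []
    for test in kfold_indices(len(X), k):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_krr([X[i] for i in train], [y[i] for i in train], **params)
        scores.append(pearson([model(X[i]) for i in test], [y[i] for i in test]))
    return scores
```

With real data, each row of `X` would be one animal's SNP dosage vector and `y` its phenotype; `cross_validate(X, y, lam=..., gamma=...)` then returns one correlation per fold. A hand-written solver is used only to keep the sketch dependency-free.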
Affiliation(s)
- Junge Wang
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Jie Chai
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Li Chen
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Tinghuan Zhang
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Xi Long
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Shuqi Diao
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Dong Chen
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Zongyi Guo
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
- Guoqing Tang
- Farm Animal Genetic Resources Exploration and Innovation Key Laboratory of Sichuan Province, Sichuan Agricultural University, Chengdu 611130, China
- Pingxian Wu
- Chongqing Academy of Animal Sciences, Chongqing 402460, China
- National Center of Technology Innovation for Pigs, Chongqing 402460, China
4
Beccaria R, Lazzeri A, Tiana G. Predicting the Binding of Small Molecules to Proteins through Invariant Representation of the Molecular Structure. J Chem Inf Model 2024; 64:6758-6767. [PMID: 39197011 DOI: 10.1021/acs.jcim.4c00752] [Indexed: 08/30/2024]
Abstract
We present a computational scheme for predicting the ligands that bind to a pocket of a known structure. It is based on the generation of a general abstract representation of the molecules, which is invariant to rotations, translations, and permutations of atoms, and has some degree of isometry with the space of conformations. We use these representations to train a nondeep machine learning algorithm to classify the binding between pockets and molecule pairs and show that this approach has a better generalization capability than existing methods.
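The abstract requires a molecular representation that does not change under rotations, translations or permutations of atoms. The paper's own construction is not described here, so the following uses a different, much simpler stand-in that satisfies the same three invariances: sort all pairwise interatomic distances, or bin them into a fixed-length histogram that a non-deep classifier could take as input. The function names, `bin_width`, `n_bins` and the toy geometry are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sorted_distances(coords: Sequence[Point]) -> List[float]:
    """Pairwise distances ignore rotation and translation; sorting removes the atom order."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def distance_histogram(coords: Sequence[Point], bin_width: float = 0.5, n_bins: int = 20) -> List[int]:
    """Fixed-length count vector of pairwise distances; the last bin also collects longer distances."""
    counts = [0] * n_bins
    for d in sorted_distances(coords):
        counts[min(int(d / bin_width), n_bins - 1)] += 1
    return counts


def rotate_z(p: Point, theta: float) -> Point:
    """Rotate a point by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])


# Hypothetical 4-atom toy geometry (arbitrary units).
mol = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (1.4, 1.2, 0.0), (0.3, 0.4, 1.1)]
# Same molecule rotated, translated and with its atoms listed in reverse order.
moved = [tuple(x + t for x, t in zip(rotate_z(p, 0.7), (2.0, -1.0, 3.0))) for p in reversed(mol)]
```

`sorted_distances(mol)` and `sorted_distances(moved)` agree up to floating-point rounding. One design caveat: distance-based descriptors are also unchanged by mirror reflection, so they cannot distinguish enantiomers, and in general distinct geometries can share the same distance set.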
Affiliation(s)
- R Beccaria
- Department of Physics, University of Milano, via Celoria 16, 20133 Milano, Italy
- Heidelberg Institute for Theoretical Studies, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
- Faculty of Physics, Heidelberg University, Im Neuenheimer Feld 227, 69120 Heidelberg, Germany
- A Lazzeri
- Department of Physics, University of Milano, via Celoria 16, 20133 Milano, Italy
- G Tiana
- Department of Physics, University of Milano, via Celoria 16, 20133 Milano, Italy
- INFN, via Celoria 16, 20133 Milano, Italy
5
Wang H, Zeng J, Dai R, Wang Z. Understanding Rejection Mechanisms of Trace Organic Contaminants by Polyamide Membranes via Data-Knowledge Codriven Machine Learning. Environ Sci Technol 2024; 58:5878-5888. [PMID: 38498471 DOI: 10.1021/acs.est.3c08523] [Indexed: 03/20/2024]
Abstract
Data-driven machine learning (ML) provides a promising approach to understanding and predicting the rejection of trace organic contaminants (TrOCs) by polyamide (PA). However, various confounding variables, coupled with data scarcity, restrict the direct application of data-driven ML. In this study, we developed a data-knowledge codriven ML model via domain-knowledge embedding and explored its application in comprehending TrOC rejection by PA membranes. Domain-knowledge embedding enhanced both the predictive performance and the interpretability of the ML model. The contribution of key mechanisms, including size exclusion, charge effect, hydrophobic interaction, etc., that dominate the rejections of the three TrOC categories (neutral hydrophilic, neutral hydrophobic, and charged TrOCs) was quantified. Log D and molecular charge emerge as key factors contributing to the discernible variations in the rejection among the three TrOC categories. Furthermore, we quantitatively compared the TrOC rejection mechanisms between nanofiltration (NF) and reverse osmosis (RO) PA membranes. The charge effect and hydrophobic interactions possessed higher weights for NF to reject TrOCs, while the size exclusion in RO played a more important role. This study demonstrated the effectiveness of the data-knowledge codriven ML method in understanding TrOC rejection by PA membranes, providing a methodology to formulate a strategy for targeted TrOC removal.
Affiliation(s)
- Hejia Wang
- State Key Laboratory of Pollution Control and Resource Reuse, Shanghai Institute of Pollution Control and Ecological Security, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Jin Zeng
- School of Software Engineering, Tongji University, Shanghai 201804, China
- Ruobin Dai
- State Key Laboratory of Pollution Control and Resource Reuse, Shanghai Institute of Pollution Control and Ecological Security, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China
- Zhiwei Wang
- State Key Laboratory of Pollution Control and Resource Reuse, Shanghai Institute of Pollution Control and Ecological Security, School of Environmental Science and Engineering, Tongji University, Shanghai 200092, China