1. Felici A, Peduzzi G, Pellungrini R, Campa D. Artificial intelligence to predict cancer risk, are we there yet? A comprehensive review across cancer types. Eur J Cancer 2025; 222:115440. [PMID: 40273730 DOI: 10.1016/j.ejca.2025.115440] [Received: 03/10/2025] [Accepted: 03/25/2025] [Indexed: 04/26/2025]
Abstract
Cancer remains the second leading cause of death worldwide, representing a substantial challenge to global health. Although traditional risk prediction models have played a crucial role in the epidemiology of several cancer types, they are limited, especially in their ability to process complex, multidimensional data. Artificial intelligence (AI) approaches offer a promising way to overcome this limitation. AI techniques have the potential to identify complex patterns and relationships in data that traditional methods might overlook, making them especially useful for handling the large and heterogeneous datasets analysed in cancer research. This review first examines the current state of the art of AI techniques, highlighting their differences and suitability for various data types. It then offers a comprehensive analysis of the literature, focusing on the application of AI approaches in nineteen cancer types (bladder cancer, breast cancer, cervical cancer, colorectal cancer, endometrial cancer, esophageal cancer, gastric cancer, gynaecological cancers, head and neck cancer, haematological cancers, kidney cancer, liver cancer, lung cancer, melanoma, ovarian cancer, pancreatic cancer, prostate cancer, thyroid cancer and overall cancer), evaluating the models, metrics, and exposure variables used. Finally, the review discusses the application of AI in clinical practice, along with an assessment of its potential limitations and future directions.
Affiliation(s)
- Alessio Felici
  - Department of Biology, University of Pisa, Via Luca Ghini, 13, Pisa 56126, Italy
- Giulia Peduzzi
  - Department of Biology, University of Pisa, Via Luca Ghini, 13, Pisa 56126, Italy
- Roberto Pellungrini
  - Classe di scienze, Scuola Normale Superiore, Piazza dei Cavalieri, 7, Pisa 56126, Italy
- Daniele Campa
  - Department of Biology, University of Pisa, Via Luca Ghini, 13, Pisa 56126, Italy
2. Vaida M, Arumalla KK, Tatikonda PK, Popuri B, Bux RA, Tappia PS, Huang G, Haince JF, Ford WR. Identification of a Novel Biomarker Panel for Breast Cancer Screening. Int J Mol Sci 2024; 25:11835. [PMID: 39519384 PMCID: PMC11546995 DOI: 10.3390/ijms252111835] [Received: 08/05/2024] [Revised: 10/25/2024] [Accepted: 10/27/2024] [Indexed: 11/16/2024]
Abstract
Breast cancer remains a major public health concern, and early detection is crucial for improving survival rates. Metabolomics offers the potential to develop non-invasive screening and diagnostic tools based on metabolic biomarkers. However, the inherent complexity of metabolomic datasets and the high dimensionality of biomarkers complicate the identification of diagnostically relevant features, with multiple studies demonstrating limited consensus on the specific metabolites involved. Unlike previous studies that rely on singular feature selection techniques such as Partial Least Squares (PLS) or LASSO regression, this research combines supervised and unsupervised machine learning methods with random sampling strategies, offering a more robust and interpretable approach to feature selection. This study aimed to identify a parsimonious and robust set of biomarkers for breast cancer diagnosis using metabolomics data. Plasma samples from 185 breast cancer patients and 53 controls (from the Cooperative Human Tissue Network, USA) were analyzed. This study also overcomes the common issue of dataset imbalance by using propensity score matching (PSM), which ensures reliable comparisons between cancer and control groups. We employed Univariate Naïve Bayes, L2-regularized Support Vector Classifier (SVC), Principal Component Analysis (PCA), and feature engineering techniques to refine and select the most informative features. Our best-performing feature set comprised 11 biomarkers, including 9 metabolites (SM(OH) C22:2, SM C18:0, C0, C3OH, C14:2OH, C16:2OH, LysoPC a C18:1, PC aa C36:0 and Asparagine), a metabolite ratio (Kynurenine-to-Tryptophan), and 1 demographic variable (Age), achieving an area under the ROC curve (AUC) of 98%. These results demonstrate the potential for a robust, cost-effective, and non-invasive breast cancer screening and diagnostic tool, offering significant clinical value for early detection and personalized patient management.
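The abstract reports an AUC of 98% without describing how the metric was computed. As a reading aid, the following minimal sketch shows one standard way to obtain an AUC: the fraction of (case, control) pairs in which the case receives the higher score, with ties counted as half. The function name and the scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores, not data from the study.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.5, 0.3, 0.2]
auc = roc_auc(cases, controls)
print(round(auc, 4))  # 11 of the 12 pairs are ranked correctly
```

The pairwise form is quadratic in the sample size, which is acceptable for a cohort of a few hundred samples; a rank-based formulation would typically be used for larger datasets.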
Affiliation(s)
- Maria Vaida
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
- Kamala K. Arumalla
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
- Pavan Kumar Tatikonda
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
- Bharadwaj Popuri
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
- Rashid A. Bux
  - BioMark Diagnostics Inc., Richmond, BC V6X 2W2, Canada
- Guoyu Huang
  - BioMark Diagnostic Solutions Inc., Quebec City, QC G1P 4P5, Canada
- Jean-François Haince
  - BioMark Diagnostic Solutions Inc., Quebec City, QC G1P 4P5, Canada
- W. Randolph Ford
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
3. Park JW, Rhee JK. Integrative Analysis of ATAC-Seq and RNA-Seq through Machine Learning Identifies 10 Signature Genes for Breast Cancer Intrinsic Subtypes. Biology 2024; 13:799. [PMID: 39452108 PMCID: PMC11505269 DOI: 10.3390/biology13100799] [Received: 08/23/2024] [Revised: 09/28/2024] [Accepted: 10/05/2024] [Indexed: 10/26/2024]
Abstract
Breast cancer is a heterogeneous disease composed of various biologically distinct subtypes, each characterized by unique molecular features. Its formation and progression involve a complex, multistep process that includes the accumulation of numerous genetic and epigenetic alterations. Although integrating RNA-seq transcriptome data with ATAC-seq epigenetic information provides a more comprehensive understanding of gene regulation and its impact across different conditions, no classification model has yet been developed for breast cancer intrinsic subtypes based on such integrative analyses. In this study, we employed machine learning algorithms to predict intrinsic subtypes through the integrative analysis of ATAC-seq and RNA-seq data. We identified 10 signature genes (CDH3, ERBB2, TYMS, GREB1, OSR1, MYBL2, FAM83D, ESR1, FOXC1, and NAT1) using recursive feature elimination with cross-validation (RFECV) and a support vector machine (SVM) based on SHAP (SHapley Additive exPlanations) feature importance. Furthermore, we found that these genes were primarily associated with immune responses, hormone signaling, cancer progression, and cellular proliferation.
Affiliation(s)
- Je-Keun Rhee
  - Department of Bioinformatics & Life Science, Soongsil University, Seoul 06987, Republic of Korea
4. Bumrungthai S, Duangjit S, Passorn S, Pongpakdeesakul S, Butsri S, Janyakhantikul S. Comprehensive breast cancer risk analysis with whole exome sequencing and the prevalence of BRCA1 and ABCG2 mutations and oncogenic HPV. Biomed Rep 2024; 21:144. [PMID: 39170756 PMCID: PMC11337157 DOI: 10.3892/br.2024.1832] [Received: 04/11/2024] [Accepted: 07/02/2024] [Indexed: 08/23/2024]
Abstract
Breast cancer is the most prevalent cancer and also the leading cause of cancer death in women worldwide. A comprehensive understanding of breast cancer risk factors and their incidences is useful information for breast cancer prevention and control planning. The present study aimed to provide information on single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) in breast cancer, the allele frequency of two SNPs in breast cancer-related genes BRCA1 DNA repair associated (BRCA1; rs799917) and ATP binding cassette subfamily G member 2 (ABCG2; rs2231142), and the prevalence of human papillomavirus (HPV) infections in a normal population living in Phayao Province, Northern Thailand. One breast cancer and 10 healthy samples were investigated by whole exome sequencing (WES) and compared for genetic variation. The WES data contained SNPs in genes previously implicated in breast cancer and provided data on CNVs. The allele frequencies for SNPs rs799917 and rs2231142 were also examined. The SNP genotype frequencies were 35.88% CC, 46.54% CT, and 17.58% TT for rs799917 and 33.20% CC, 46.88% CA, and 19.92% AA for rs2231142. A total of 825 human whole blood samples were examined for HPV infection by PCR, and the pooled DNA was tested for HPV infection using metagenomic sequencing. No HPV infections were detected among all 825 samples or the pooled blood samples. The incidence of breast cancer among the tested samples was estimated based on acceptable breast cancer risk factors and demographic data and was 1.47%. The present study provided data on SNPs and CNVs in breast cancer-related genes. The associations between SNPs rs2231142 and rs799917 and breast cancer should be further investigated in a case-control study since heterozygous and homozygous variants are more common. Based on the detection of HPV infection in the blood samples, HPV may not be associated with breast cancer, at least in the Northern Thai population.
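The genotype frequencies quoted in the abstract determine the allele frequencies directly: each homozygote contributes two copies of its allele and each heterozygote one copy of each. The short sketch below performs that arithmetic on the reported percentages; the function name is an illustrative choice, and the derived allele frequencies are computed here rather than quoted from the paper.

```python
def allele_frequencies(hom_a, het, hom_b):
    """Return (freq_a, freq_b) from genotype frequencies given on any common scale."""
    total = hom_a + het + hom_b
    freq_a = (hom_a + het / 2) / total
    return freq_a, 1 - freq_a


# Genotype percentages as reported in the abstract.
rs799917 = allele_frequencies(35.88, 46.54, 17.58)   # CC, CT, TT
rs2231142 = allele_frequencies(33.20, 46.88, 19.92)  # CC, CA, AA

print([round(f, 4) for f in rs799917])   # C ~ 0.5915, T ~ 0.4085
print([round(f, 4) for f in rs2231142])  # C ~ 0.5664, A ~ 0.4336
```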
Affiliation(s)
- Sureewan Bumrungthai
  - Division of Biopharmaceutical Sciences, Faculty of Pharmaceutical Sciences, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
  - Division of Microbiology and Parasitology, School of Medical Sciences, University of Phayao, Phayao 56000, Thailand
  - Center for Pharmacogenomics and Clinical Translational Research, Faculty of Pharmaceutical Sciences, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
- Sureewan Duangjit
  - Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmaceutical Sciences, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
- Supaporn Passorn
  - Division of Biotechnology, School of Agriculture and Natural Resources, University of Phayao, Phayao 56000, Thailand
- Sutida Pongpakdeesakul
  - Division of Biotechnology, School of Agriculture and Natural Resources, University of Phayao, Phayao 56000, Thailand
- Siriwoot Butsri
  - Division of Biopharmaceutical Sciences, Faculty of Pharmaceutical Sciences, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
  - Center for Pharmacogenomics and Clinical Translational Research, Faculty of Pharmaceutical Sciences, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
- Somwang Janyakhantikul
  - Division of Biopharmaceutical Sciences, Faculty of Pharmaceutical Sciences, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
  - Center for Pharmacogenomics and Clinical Translational Research, Faculty of Pharmaceutical Sciences, Ubon Ratchathani University, Ubon Ratchathani 34190, Thailand
5. Valencia-Moreno JM, Gonzalez-Fraga JA, Gutierrez-Lopez E, Estrada-Senti V, Cantero-Ronquillo HA, Kober V. Breast cancer risk estimation with intelligent algorithms and risk factors for Cuban women. Comput Biol Med 2024; 179:108818. [PMID: 38991318 DOI: 10.1016/j.compbiomed.2024.108818] [Received: 10/31/2023] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/13/2024]
Abstract
Breast cancer is the most common malignant neoplasm and the leading cause of cancer mortality among women globally. Current prediction models based on risk factors are inefficient in specific populations, so an appropriate and calibrated breast cancer prediction model for Cuban women is essential. This article proposes a conceptual model for breast cancer risk estimation for Cuban women using machine learning algorithms and risk factors. The model has three main components: knowledge representation, risk estimation modeling, and risk predictor evaluation. Nine of the most common machine learning algorithms were used to generate risk predictors using the proposed model. Two data sources served as case studies: the first comprised data collected from Cuban women, and the second included data from US Hispanic women obtained from the Breast Cancer Surveillance Consortium dataset. The results show that the model effectively estimates breast cancer risk and could be a valuable tool for early detection of breast cancer and identification of patients at risk. According to the first experiment results, the best predictor of breast cancer risk for the Cuban female population corresponds to the Random Forest algorithm with a weighted score of 5.981, a training accuracy of 0.996 and a training AUC of 0.997. In a second experiment, it was demonstrated that the risk predictors generated by the proposed model using data from Cuban women obtained better AUC and accuracy values compared to the predictors generated by using the US Hispanic population, potentially generalizable to other Hispanic populations. Implementing this model could be an economically viable alternative to reduce the mortality rate of this type of cancer in Latin American countries such as Cuba.
Affiliation(s)
- Jose Manuel Valencia-Moreno
  - Universidad Autónoma de Baja California, Ensenada, Baja California, Mexico; Universidad de las Ciencias Informáticas, La Habana, Cuba
- Jose Angel Gonzalez-Fraga
  - Universidad Autónoma de Baja California, Ensenada, Baja California, Mexico; Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, Mexico
- Vitaly Kober
  - Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada, Baja California, Mexico; Department of Mathematics, Chelyabinsk State University, Russian Federation
6. Manir SB, Deshpande P. Critical Risk Assessment, Diagnosis, and Survival Analysis of Breast Cancer. Diagnostics (Basel) 2024; 14:984. [PMID: 38786282 PMCID: PMC11119540 DOI: 10.3390/diagnostics14100984] [Received: 04/04/2024] [Accepted: 04/17/2024] [Indexed: 05/25/2024]
Abstract
Breast cancer is the most prevalent type of cancer in women. Risk factor assessment can aid in directing counseling regarding risk reduction and breast cancer surveillance. This research aims to (1) investigate the relationship between various risk factors and breast cancer incidence using the BCSC (Breast Cancer Surveillance Consortium) Risk Factor Dataset and create a prediction model for assessing the risk of developing breast cancer; (2) diagnose breast cancer using the Breast Cancer Wisconsin diagnostic dataset; and (3) analyze breast cancer survivability using the SEER (Surveillance, Epidemiology, and End Results) Breast Cancer Dataset. Applying resampling techniques on the training dataset before using various machine learning techniques can affect the performance of the classifiers. The three breast cancer datasets were examined using a variety of pre-processing approaches and classification models to assess their performance in terms of accuracy, precision, F-1 scores, etc. The PCA (principal component analysis) and resampling strategies produced remarkable results. For the BCSC Dataset, the Random Forest algorithm exhibited the best performance out of the applied classifiers, with an accuracy of 87.53%. Out of the different resampling techniques applied to the training dataset for training the Random Forest classifier, the Tomek Link exhibited the best test accuracy, at 87.47%. We compared all the models used with previously used techniques. After applying the resampling techniques, the accuracy scores of the test data decreased even if the training data accuracy increased. For the Breast Cancer Wisconsin diagnostic dataset, the K-Nearest Neighbor algorithm had the best accuracy with the original dataset test set, at 94.71%, and the PCA dataset test set exhibited 95.29% accuracy for detecting breast cancer. 
Using the SEER Dataset, this study also explores survival analysis, employing supervised and unsupervised learning approaches to offer insights into the variables affecting breast cancer survivability. This study emphasizes the significance of individualized approaches in the management and treatment of breast cancer by incorporating phenotypic variations and recognizing the heterogeneity of the disease. Through data-driven insights and advanced machine learning, this study contributes significantly to the ongoing efforts in breast cancer research, diagnostics, and personalized medicine.
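The abstract names Tomek links as the resampling technique with the best test accuracy but does not state how it was implemented. The sketch below illustrates the general idea under stated assumptions: a Tomek link is a pair of samples from different classes that are each other's nearest neighbour, and the undersampling variant shown here drops the majority-class member of each pair. Function names and the toy data are hypothetical; Euclidean distance is assumed.

```python
import math


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class sample of every Tomek link (undersampling variant)."""
    drop = {k for pair in tomek_links(X, y) for k in pair if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy 2-D data, purely illustrative: label 0 is the majority class.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.2, 5.0)]
y = [0, 1, 0, 0, 0, 0]

links = tomek_links(X, y)
X_res, y_res = remove_majority_in_links(X, y, majority_label=0)
print(links, len(X_res))
```

Applying such cleaning only to the training split, as the abstract describes, keeps the test set representative of the original class balance.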
7. Nicolis O, De Los Angeles D, Taramasco C. A contemporary review of breast cancer risk factors and the role of artificial intelligence. Front Oncol 2024; 14:1356014. [PMID: 38699635 PMCID: PMC11063273 DOI: 10.3389/fonc.2024.1356014] [Received: 12/14/2023] [Accepted: 03/25/2024] [Indexed: 05/05/2024]
Abstract
Background: Breast cancer continues to be a significant global health issue, necessitating advancements in prevention and early detection strategies. This review aims to assess and synthesize research conducted from 2020 to the present, focusing on breast cancer risk factors, including genetic, lifestyle, and environmental aspects, as well as the innovative role of artificial intelligence (AI) in prediction and diagnostics. Methods: A comprehensive literature search, covering studies from 2020 to the present, was conducted to evaluate the diversity of breast cancer risk factors and the latest advances in Artificial Intelligence (AI) in this field. The review prioritized high-quality peer-reviewed research articles and meta-analyses. Results: Our analysis reveals a complex interplay of genetic, lifestyle, and environmental risk factors for breast cancer, with significant variability across different populations. Furthermore, AI has emerged as a promising tool in enhancing the accuracy of breast cancer risk prediction and the personalization of prevention strategies. Conclusion: The review highlights the necessity for personalized breast cancer prevention and detection approaches that account for individual risk factor profiles. It underscores the potential of AI to revolutionize these strategies, offering clear recommendations for future research directions and clinical practice improvements.
Affiliation(s)
- Orietta Nicolis
  - Engineering Faculty, Universidad Andres Bello, Viña del Mar, Chile
  - Centro para la Prevención y Control del Cáncer (CECAN), Santiago, Chile
- Denisse De Los Angeles
  - Engineering Faculty, Universidad Andres Bello, Viña del Mar, Chile
  - Centro para la Prevención y Control del Cáncer (CECAN), Santiago, Chile
- Carla Taramasco
  - Engineering Faculty, Universidad Andres Bello, Viña del Mar, Chile
  - Centro para la Prevención y Control del Cáncer (CECAN), Santiago, Chile
8. Torad AA, Ahmed MM, Elabd OM, El-Shamy FF, Alajam RA, Amin WM, Alfaifi BH, Elabd AM. Identifying Predictors of Neck Disability in Patients with Cervical Pain Using Machine Learning Algorithms: A Cross-Sectional Correlational Study. J Clin Med 2024; 13:1967. [PMID: 38610732 PMCID: PMC11012682 DOI: 10.3390/jcm13071967] [Received: 02/29/2024] [Revised: 03/23/2024] [Accepted: 03/25/2024] [Indexed: 04/14/2024]
Abstract
(1) Background: Neck pain intensity, psychosocial factors, and physical function have been identified as potential predictors of neck disability. Machine learning algorithms have shown promise in classifying patients based on their neck disability status. The current study was therefore conducted to identify predictors of neck disability in patients with neck pain based on clinical findings using machine learning algorithms. (2) Methods: Ninety participants with chronic neck pain took part in the study. Demographic characteristics in addition to neck pain intensity, the neck disability index, cervical spine contour, and surface electromyographic characteristics of the axioscapular muscles were measured. Participants were categorised into high disability and low disability groups based on the median value (22.2) of their neck disability index scores. Several regression and classification machine learning models were trained and assessed using a 10-fold cross-validation method; MANCOVA was also used to compare the two groups. (3) Results: The multilayer perceptron (MLP) achieved the highest adjusted R² of 0.768, while linear discriminant analysis showed the highest area under the receiver operating characteristic (ROC) curve, at 0.91. Pain intensity was the most important feature in both models, with the highest effect size of 0.568 (p < 0.001). (4) Conclusions: The study findings provide valuable insights into pain as the most important predictor of neck disability in patients with cervical pain. Tailoring interventions based on pain can improve patient outcomes and potentially prevent or reduce neck disability.
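The Methods describe two generic steps: a median split into high and low disability groups, and 10-fold cross-validation. The sketch below illustrates both with the standard library only. The scores are randomly generated placeholders, not study data, and the treatment of scores exactly equal to the median (assigned here to the low group) is an assumption, since the abstract does not specify it.

```python
import random
import statistics


def median_split(scores):
    """Label each score 1 (high) if strictly above the median, else 0 (low)."""
    cut = statistics.median(scores)
    return [1 if s > cut else 0 for s in scores], cut


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k roughly equal folds over n shuffled samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


# Hypothetical neck disability index scores for 90 participants.
rng = random.Random(1)
ndi = [round(rng.uniform(5, 50), 1) for _ in range(90)]

labels, cut = median_split(ndi)
splits = list(k_fold_indices(len(ndi), k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 folds: 81 train, 9 test
```

With 90 participants each test fold holds only 9 samples, so per-fold metrics are noisy and are usually averaged across the 10 folds.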
Affiliation(s)
- Ahmed A. Torad
  - Basic Science Department, Faculty of Physical Therapy, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
- Mohamed M. Ahmed
  - Department of Physical Therapy, College of Applied Medical Sciences, Jazan University, Jizan 45142, Saudi Arabia
  - Department of Basic Sciences, Faculty of Physical Therapy, Beni-Suef University, Beni-Suef 62521, Egypt
- Omar M. Elabd
  - Department of Orthopedics and Its Surgery, Faculty of Physical Therapy, Delta University for Science and Technology, Gamasa 35712, Egypt
  - Department of Physical Therapy, Aqaba University of Technology, Aqaba 11191, Jordan
- Fayiz F. El-Shamy
  - Department of Physical Therapy for Women Health, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
- Ramzi A. Alajam
  - Department of Physical Therapy, College of Applied Medical Sciences, Jazan University, Jizan 45142, Saudi Arabia
- Wafaa Mahmoud Amin
  - Department of Physical Therapy, College of Applied Medical Sciences, Jazan University, Jizan 45142, Saudi Arabia
  - Department of Basic Sciences of Physical Therapy, Faculty of Physical Therapy, Cairo University, Giza 12613, Egypt
- Bsmah H. Alfaifi
  - Department of Physical Therapy, College of Applied Medical Sciences, Jazan University, Jizan 45142, Saudi Arabia
- Aliaa M. Elabd
  - Department of Basic Sciences, Faculty of Physical Therapy, Benha University, Benha 13511, Egypt
9. Hussain S, Ali M, Naseem U, Nezhadmoghadam F, Jatoi MA, Gulliver TA, Tamez-Peña JG. Breast cancer risk prediction using machine learning: a systematic review. Front Oncol 2024; 14:1343627. [PMID: 38571502 PMCID: PMC10987819 DOI: 10.3389/fonc.2024.1343627] [Received: 11/23/2023] [Accepted: 02/26/2024] [Indexed: 04/05/2024]
Abstract
Background: Breast cancer is the leading cause of cancer-related fatalities among women worldwide. Conventional screening and risk prediction models primarily rely on demographic and patient clinical history to devise policies and estimate likelihood. However, recent advancements in artificial intelligence (AI) techniques, particularly deep learning (DL), have shown promise in the development of personalized risk models. These models leverage individual patient information obtained from medical imaging and associated reports. In this systematic review, we thoroughly investigated the existing literature on the application of DL to digital mammography, radiomics, genomics, and clinical information for breast cancer risk assessment. We critically analyzed these studies and discussed their findings, highlighting the promising prospects of DL techniques for breast cancer risk prediction. Additionally, we explored ongoing research initiatives and potential future applications of AI-driven approaches to further improve breast cancer risk prediction, thereby facilitating more effective screening and personalized risk management strategies. Objective and methods: This study presents a comprehensive overview of imaging and non-imaging features used in breast cancer risk prediction using traditional and AI models. The features reviewed in this study included imaging, radiomics, genomics, and clinical features. Furthermore, this survey systematically presented DL methods developed for breast cancer risk prediction, aiming to be useful for both beginners and advanced-level researchers. Results: A total of 600 articles were identified, 20 of which met the set criteria and were selected. Parallel benchmarking of DL models, along with natural language processing (NLP) applied to imaging and non-imaging features, could allow clinicians and researchers to gain greater awareness as they consider the clinical deployment or development of new models.
This review provides a comprehensive guide for understanding the current status of breast cancer risk assessment using AI. Conclusion: This study offers investigators a different perspective on the use of AI for breast cancer risk prediction, incorporating numerous imaging and non-imaging features.
Affiliation(s)
- Sadam Hussain
  - School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico
  - Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada
- Mansoor Ali
  - School of Engineering and Sciences, Tecnologico de Monterrey, Monterrey, Mexico
- Usman Naseem
  - College of Science and Engineering, James Cook University, Cairns, QLD, Australia
- Munsif Ali Jatoi
  - Department of Biomedical Engineering, Salim Habib University, Karachi, Pakistan
- T. Aaron Gulliver
  - Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada
10. Eremici I, Borlea A, Dumitru C, Stoian D. Breast Cancer Risk Factors among Women with Solid Breast Lesions. Clin Pract 2024; 14:473-485. [PMID: 38525715 PMCID: PMC10961805 DOI: 10.3390/clinpract14020036] [Received: 02/01/2024] [Revised: 03/10/2024] [Accepted: 03/12/2024] [Indexed: 03/26/2024]
Abstract
BACKGROUND: Breast cancer is the most frequent malignancy in women worldwide and one of the most curable cancers if diagnosed at an early stage. Female patients presenting solid breast lesions are greatly predisposed to breast cancer development, and as such, effective screening of high-risk patients is valuable in early-stage breast cancer detection. OBJECTIVES: The aim of our study was to identify the most relevant demographic, reproductive and lifestyle risk factors for breast cancer among women with solid breast lesions living in western Romania, namely the urban region consisting of Timisoara and the rural surrounding regions. METHODS: From January 2017 to December 2021, 1161 patients with solid breast lesions, as detected by sonoelastography, were divided into two groups: patients with benign lesions (1019, 87.77%) and patients with malignant nodules (142, 12.23%). The malignancy group was confirmed by a histopathological result. Variables including age, BMI, menarche, menopause, years of exposure to estrogen, number of births, breastfeeding period, use of oral combined contraceptives, smoker status, family medical history and living area (rural-urban) were recorded. RESULTS: Our study showed that the main risk factors for malignancy were elevated age (OR = 1.07, 95% CI 1.05-1.08), BMI (OR = 1.06, 95% CI 1.02-1.10), living area (rural) (OR = 1.86, 95% CI 1.13-2.85) and family medical history (negative) (OR = 3.13, 95% CI 1.43-8.29). The other proposed risk factors were not found to be statistically significant. CONCLUSIONS: Age and BMI were observed to be the most significant factors for increased breast cancer risk, followed by living in a rural area. A family history of breast cancer was shown to be inversely correlated with increased cancer risk.
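The Results are expressed as odds ratios with 95% confidence intervals, but the abstract does not say how they were estimated. For readers unfamiliar with the quantity, the sketch below computes an odds ratio and a Wald interval from a 2x2 table; this is a simplification for a binary exposure and is not claimed to be the authors' method. The counts are toy values and the function name is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Toy counts, not taken from the study.
odds_ratio, lower, upper = odds_ratio_ci(a=30, b=70, c=20, d=80)
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```

An interval that excludes 1 corresponds to statistical significance at the matching level, which is how the intervals quoted in the abstract can be read.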
Affiliation(s)
- Ivana Eremici
  - PhD School, Victor Babes University of Medicine and Pharmacy, 300041 Timisoara, Romania
- Andreea Borlea
  - Department of Internal Medicine II, Victor Babes University of Medicine and Pharmacy, 300041 Timisoara, Romania
- Catalin Dumitru
  - Obstetrics and Gynecology Department, Victor Babes University of Medicine and Pharmacy, 300041 Timisoara, Romania
- Dana Stoian
  - Department of Internal Medicine II, Victor Babes University of Medicine and Pharmacy, 300041 Timisoara, Romania
11. Fong WJ, Tan HM, Garg R, Teh AL, Pan H, Gupta V, Krishna B, Chen ZH, Purwanto NY, Yap F, Tan KH, Chan KYJ, Chan SY, Goh N, Rane N, Tan ESE, Jiang Y, Han M, Meaney M, Wang D, Keppo J, Tan GCY. Comparing feature selection and machine learning approaches for predicting CYP2D6 methylation from genetic variation. Front Neuroinform 2024; 17:1244336. [PMID: 38449836 PMCID: PMC10915285 DOI: 10.3389/fninf.2023.1244336] [Received: 06/22/2023] [Accepted: 10/18/2023] [Indexed: 03/08/2024]
Abstract
Introduction: Pharmacogenetics currently supports clinical decision-making on the basis of a limited number of variants in a few genes and may benefit paediatric prescribing where there is a need for more precise dosing. Integrating genomic information such as methylation into pharmacogenetic models holds the potential to improve their accuracy and consequently prescribing decisions. Cytochrome P450 2D6 (CYP2D6) is a highly polymorphic gene conventionally associated with the metabolism of commonly used drugs and endogenous substrates. We thus sought to predict epigenetic loci from single nucleotide polymorphisms (SNPs) related to CYP2D6 in children from the GUSTO cohort. Methods: Buffy coat DNA methylation was quantified using the Illumina Infinium Methylation EPIC beadchip. CpG sites associated with CYP2D6 were used as outcome variables in Linear Regression, Elastic Net and XGBoost models. We compared feature selection of SNPs from GWAS mQTLs, GTEx eQTLs and SNPs within 2 Mb of the CYP2D6 gene, and the impact of adding demographic data. The samples were split into training (75%) and test (25%) sets for validation. In the Elastic Net and XGBoost models, the optimal hyperparameter search was done using 10-fold cross-validation. Root Mean Square Error and R-squared values were obtained to investigate each model's performance. When GWAS was performed to determine SNPs associated with CpG sites, a total of 15 SNPs were identified, several of which appeared to influence multiple CpG sites. Results: Overall, Elastic Net models of genetic features appeared to perform marginally better than heritability estimates and substantially better than Linear Regression and XGBoost models. The addition of nongenetic features appeared to improve performance for some but not all feature sets and probes. The best feature set and Machine Learning (ML) approach differed substantially between CpG sites, and a number of top variables were identified for each model. Discussion: The development of SNP-based prediction models for CYP2D6 CpG methylation in Singaporean children of varying ethnicities in this study has clinical application. With further validation, they may add to the set of tools available to improve precision medicine and pharmacogenetics-based dosing.
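The evaluation steps named in this abstract (a 75%/25% split, Root Mean Square Error, R-squared) can be sketched in a few lines. This is only an illustrative sketch using the Python standard library, not the authors' pipeline; the function names and the toy sample indices are assumptions introduced here.

```python
import math
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when y_true is constant.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical usage: 100 made-up sample indices split 75/25.
train, test = split_train_test(range(100))
print(len(train), len(test))  # 75 25
```

A model that always predicts the mean of the observed values has an R-squared of 0 under this definition, which gives a floor against which the reported models can be read.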
Affiliation(s)
- Wei Jing Fong
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Hong Ming Tan
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Rishabh Garg
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Ai Ling Teh
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Hong Pan
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Varsha Gupta
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Bernadus Krishna
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Zou Hui Chen
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Fabian Yap
  - KK Women's and Children's Hospital, Singapore, Singapore
- Kok Hian Tan
  - KK Women's and Children's Hospital, Singapore, Singapore
  - Duke NUS Medical School, Singapore, Singapore
- Kok Yen Jerry Chan
  - KK Women's and Children's Hospital, Singapore, Singapore
  - Duke NUS Medical School, Singapore, Singapore
- Shiao-Yng Chan
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - National University Hospital, Singapore, Singapore
- Nikita Rane
  - Institute of Mental Health, Singapore, Singapore
- Mei Han
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Michael Meaney
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Dennis Wang
  - Singapore Institute for Clinical Sciences (SICS), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - National Heart and Lung Institute, Imperial College London, London, United Kingdom
- Jussi Keppo
  - Computational Biology, National University of Singapore, Singapore, Singapore
- Geoffrey Chern-Yee Tan
  - Computational Biology, National University of Singapore, Singapore, Singapore
  - Institute of Mental Health, Singapore, Singapore
12
Kaur C, Madaan R. Risk Factor Analysis in Breast Cancer Using Principal Component Analysis. 2023 International Conference on Advances in Computation, Communication and Information Technology (ICAICCIT) 2023:482-487. [DOI: 10.1109/icaiccit60255.2023.10465837] [Indexed: 01/03/2025]
Affiliation(s)
- Rosy Madaan
  - MRIIRS Faridabad, Department of FET, Haryana, India
13
Nguyen AA, McCarthy AM, Kontos D. Combining Molecular and Radiomic Features for Risk Assessment in Breast Cancer. Annu Rev Biomed Data Sci 2023; 6:299-311. [PMID: 37159874 DOI: 10.1146/annurev-biodatasci-020722-092748] [Indexed: 05/11/2023]
Abstract
Breast cancer risk is highly variable within the population and current research is leading the shift toward personalized medicine. By accurately assessing an individual woman's risk, we can reduce the risk of over/undertreatment by preventing unnecessary procedures or by elevating screening procedures. Breast density measured from conventional mammography has been established as one of the most dominant risk factors for breast cancer; however, it is currently limited by its ability to characterize more complex breast parenchymal patterns that have been shown to provide additional information to strengthen cancer risk models. Molecular factors ranging from high penetrance, or high likelihood that a mutation will show signs and symptoms of the disease, to combinations of gene mutations with low penetrance have shown promise for augmenting risk assessment. Although imaging biomarkers and molecular biomarkers have both individually demonstrated improved performance in risk assessment, few studies have evaluated them together. This review aims to highlight the current state of the art in breast cancer risk assessment using imaging and genetic biomarkers.
Affiliation(s)
- Alex A Nguyen
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Anne Marie McCarthy
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Despina Kontos
  - Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
14
Paige JS, Lee CI, Wang PC, Hsu W, Brentnall AR, Hoyt AC, Naeim A, Elmore JG. Variability Among Breast Cancer Risk Classification Models When Applied at the Level of the Individual Woman. J Gen Intern Med 2023; 38:2584-2592. [PMID: 36749434 PMCID: PMC10465429 DOI: 10.1007/s11606-023-08043-4] [Received: 12/06/2022] [Accepted: 01/13/2023] [Indexed: 02/08/2023]
Abstract
BACKGROUND Breast cancer risk models guide screening and chemoprevention decisions, but the extent and effect of variability among models, particularly at the individual level, is uncertain. OBJECTIVE To quantify the accuracy and disagreement between commonly used risk models in categorizing individual women as average vs. high risk for developing invasive breast cancer. DESIGN Comparison of three risk prediction models: Breast Cancer Risk Assessment Tool (BCRAT), Breast Cancer Surveillance Consortium (BCSC) model, and International Breast Intervention Study (IBIS) model. SUBJECTS Women 40 to 74 years of age presenting for screening mammography at a multisite health system between 2011 and 2015, with 5-year follow-up for cancer outcome. MAIN MEASURES Comparison of model discrimination and calibration at the population level and inter-model agreement for 5-year breast cancer risk at the individual level using two cutoffs (≥ 1.67% and ≥ 3.0%). KEY RESULTS A total of 31,115 women were included. When using the ≥ 1.67% threshold, more than 21% of women were classified as high risk for developing breast cancer in the next 5 years by one model, but average risk by another model. When using the ≥ 3.0% threshold, more than 5% of women had disagreements in risk severity between models. Almost half of the women (46.6%) were classified as high risk by at least one of the three models (e.g., if all three models were applied) for the threshold of ≥ 1.67%, and 11.1% were classified as high risk for ≥ 3.0%. All three models had similar accuracy at the population level. CONCLUSIONS Breast cancer risk estimates for individual women vary substantially, depending on which risk assessment model is used. The choice of cutoff used to define high risk can lead to adverse effects for screening, preventive care, and quality of life for misidentified individuals. Clinicians need to be aware of the high false-positive and false-negative rates and variation between models when talking with patients.
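The individual-level comparison described in this abstract (classifying each woman as high risk when a model's 5-year estimate meets a cutoff, then counting women on whom the models disagree) can be illustrated with a short sketch. The model labels and risk values below are hypothetical and are not taken from the study; only the two cutoffs (1.67% and 3.0%) come from the abstract.

```python
def disagreement_summary(risks_by_model, cutoff):
    """Summarise inter-model agreement at a given high-risk cutoff.

    risks_by_model maps a model label to a list of 5-year risk
    estimates (in percent), one per woman, in the same order.
    Returns the fraction flagged high risk by at least one model and
    the fraction flagged by some models but not all (discordant).
    """
    per_woman = list(zip(*risks_by_model.values()))
    flags = [[risk >= cutoff for risk in woman] for woman in per_woman]
    n = len(flags)
    high_by_any = sum(any(f) for f in flags) / n
    discordant = sum(any(f) and not all(f) for f in flags) / n
    return {"high_by_any": high_by_any, "discordant": discordant}


# Hypothetical 5-year risk estimates (%) for four women from three models.
toy_risks = {
    "model_a": [1.0, 2.0, 3.5, 0.5],
    "model_b": [1.8, 2.1, 2.0, 0.4],
    "model_c": [1.2, 1.9, 3.1, 0.6],
}

for cutoff in (1.67, 3.0):
    print(cutoff, disagreement_summary(toy_risks, cutoff))
```

In the toy data the first woman is flagged by only one model at the 1.67% cutoff, and the third woman is flagged by two of three models at the 3.0% cutoff, so each cutoff yields one discordant case out of four.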
Affiliation(s)
- Jeremy S Paige
  - Department of Radiology, University of California, Los Angeles, CA, USA
- Christoph I Lee
  - Department of Radiology, University of Washington School of Medicine, Seattle, WA, USA
- Pin-Chieh Wang
  - Department of Medicine, Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, and Office of Health Informatics and Analytics, University of California, Los Angeles, Los Angeles, USA
- William Hsu
  - Department of Radiology, University of California, Los Angeles, CA, USA
- Adam R Brentnall
  - Centre for Evaluation and Methods, Wolfson Institute of Population Health, Charterhouse Square, Queen Mary University of London, London, UK
- Anne C Hoyt
  - Department of Radiology, University of California, Los Angeles, CA, USA
- Arash Naeim
  - Division of Hematology and Oncology, Department of Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Joann G Elmore
  - Department of Medicine, Division of General Internal Medicine and Health Services Research and the National Clinician Scholars Program, David Geffen School of Medicine, University of California, Los Angeles, 1100 Glendon Ave, Ste. 900, Los Angeles, CA, 90024, USA
15
Ortiz MMO, Andrechek ER. Molecular Characterization and Landscape of Breast cancer Models from a multi-omics Perspective. J Mammary Gland Biol Neoplasia 2023; 28:12. [PMID: 37269418 DOI: 10.1007/s10911-023-09540-2] [Received: 03/09/2023] [Accepted: 05/25/2023] [Indexed: 06/05/2023] [Open access]
Abstract
Breast cancer is well-known to be a highly heterogenous disease. This facet of cancer makes finding a research model that mirrors the disparate intrinsic features challenging. With advances in multi-omics technologies, establishing parallels between the various models and human tumors is increasingly intricate. Here we review the various model systems and their relation to primary breast tumors using available omics data platforms. Among the research models reviewed here, breast cancer cell lines have the least resemblance to human tumors since they have accumulated many mutations and copy number alterations during their long use. Moreover, individual proteomic and metabolomic profiles do not overlap with the molecular landscape of breast cancer. Interestingly, omics analysis revealed that the initial subtype classification of some breast cancer cell lines was inappropriate. In cell lines the major subtypes are all well represented and share some features with primary tumors. In contrast, patient-derived xenografts (PDX) and patient-derived organoids (PDO) are superior in mirroring human breast cancers at many levels, making them suitable models for drug screening and molecular analysis. While patient derived organoids are spread across luminal, basal- and normal-like subtypes, the PDX samples were initially largely basal but other subtypes have been increasingly described. Murine models offer heterogenous tumor landscapes, inter and intra-model heterogeneity, and give rise to tumors of different phenotypes and histology. Murine models have a reduced mutational burden compared to human breast cancer but share some transcriptomic resemblance, and representation of many breast cancer subtypes can be found among the variety subtypes. To date, while mammospheres and three-dimensional cultures lack comprehensive omics data, these are excellent models for the study of stem cells, cell fate decision and differentiation, and have also been used for drug screening. Therefore, this review explores the molecular landscapes and characterization of breast cancer research models by comparing recent published multi-omics data and analysis.
Affiliation(s)
- Mylena M O Ortiz
  - Genetics and Genomics Science Program, Michigan State University, East Lansing, MI, USA
- Eran R Andrechek
  - Department of Physiology, Michigan State University, 2194 BPS Building 567 Wilson Road, East Lansing, MI, 48824, USA
16
Alzoubi H, Alzubi R, Ramzan N. Deep Learning Framework for Complex Disease Risk Prediction Using Genomic Variations. Sensors (Basel) 2023; 23:s23094439. [PMID: 37177642 PMCID: PMC10181706 DOI: 10.3390/s23094439] [Received: 02/13/2023] [Revised: 04/05/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023]
Abstract
Genome-wide association studies have proven their ability to improve human health outcomes by identifying genotypes associated with phenotypes. Various works have attempted to predict the risk of diseases for individuals based on genotype data. This prediction can either be considered as an analysis model that can lead to a better understanding of gene functions that underlie human disease or as a black box in order to be used in decision support systems and in early disease detection. Deep learning techniques have gained more popularity recently. In this work, we propose a deep-learning framework for disease risk prediction. The proposed framework employs a multilayer perceptron (MLP) in order to predict individuals' disease status. The proposed framework was applied to the Wellcome Trust Case-Control Consortium (WTCCC), the UK National Blood Service (NBS) Control Group, and the 1958 British Birth Cohort (58C) datasets. The performance comparison of the proposed framework showed that the proposed approach outperformed the other methods in predicting disease risk, achieving an area under the curve (AUC) up to 0.94.
Affiliation(s)
- Hadeel Alzoubi
  - Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Raid Alzubi
  - Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
- Naeem Ramzan
  - School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High Street, Paisley PA1 2BE, UK
17
Learning high-order interactions for polygenic risk prediction. PLoS One 2023; 18:e0281618. [PMID: 36763605 PMCID: PMC9916647 DOI: 10.1371/journal.pone.0281618] [Received: 07/29/2022] [Accepted: 01/27/2023] [Indexed: 02/11/2023] [Open access]
Abstract
Within the framework of precision medicine, the stratification of individual genetic susceptibility based on inherited DNA variation has paramount relevance. However, one of the most relevant pitfalls of traditional Polygenic Risk Score (PRS) approaches is their inability to model complex high-order non-linear SNP-SNP interactions and their effect on the phenotype (e.g. epistasis). Indeed, they incur a computational challenge, as the number of possible interactions grows exponentially with the number of SNPs considered, which also affects the statistical reliability of the model parameters. In this work, we address this issue by proposing a novel PRS approach, called High-order Interactions-aware Polygenic Risk Score (hiPRS), that incorporates high-order interactions in modeling polygenic risk. The approach combines an interaction search routine based on frequent itemset mining and a novel interaction selection algorithm based on Mutual Information, to construct a simple and interpretable weighted model of user-specified dimensionality that can predict a given binary phenotype. Compared to traditional PRS methods, hiPRS does not rely on GWAS summary statistics or any external information. Moreover, hiPRS differs from Machine Learning-based approaches that can include complex interactions in that it provides a readable and interpretable model and is able to control overfitting, even on small samples. In the present work we demonstrate through a comprehensive simulation study the superior performance of hiPRS with respect to state-of-the-art methods, both in terms of scoring performance and interpretability of the resulting model. We also test hiPRS against small sample size, class imbalance and the presence of noise, showcasing its robustness to extreme experimental settings. Finally, we apply hiPRS to a case study on real data from the DACHS cohort, defining an interaction-aware scoring model to predict mortality of stage II-III Colon-Rectal Cancer patients treated with oxaliplatin.
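The abstract states that candidate interactions are selected using Mutual Information with a binary phenotype. The sketch below shows only the basic plug-in estimate of mutual information between a binary interaction indicator and a binary outcome; it is not the hiPRS selection algorithm, and the function names and toy genotypes are assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def interaction_indicator(genotypes, pattern):
    """Return 1 for samples matching every (snp_index, level) pair, else 0."""
    return [int(all(g[i] == level for i, level in pattern)) for g in genotypes]


# Hypothetical genotypes for two SNPs (allele counts) and a binary phenotype.
toy_genotypes = [(0, 1), (1, 1), (1, 0), (1, 1)]
toy_phenotype = [0, 1, 0, 1]

# Candidate interaction: SNP 0 at level 1 together with SNP 1 at level 1.
indicator = interaction_indicator(toy_genotypes, [(0, 1), (1, 1)])
print(indicator, mutual_information(indicator, toy_phenotype))
```

An indicator that is independent of the phenotype scores 0 bits, while one that matches a balanced binary phenotype exactly scores 1 bit, so ranking candidates by this quantity favours interactions that carry information about the outcome.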
18
Furth PA, Wang W, Kang K, Rooney BL, Keegan G, Muralidaran V, Zou X, Flaws JA. Esr1 but Not CYP19A1 Overexpression in Mammary Epithelial Cells during Reproductive Senescence Induces Pregnancy-Like Proliferative Mammary Disease Responsive to Anti-Hormonals. Am J Pathol 2023; 193:84-102. [PMID: 36464512 PMCID: PMC9768685 DOI: 10.1016/j.ajpath.2022.09.007] [Received: 06/10/2022] [Revised: 08/22/2022] [Accepted: 09/16/2022] [Indexed: 12/04/2022]
Abstract
Molecular-level analyses of breast carcinogenesis benefit from in vivo disease models. Estrogen receptor 1 (Esr1) and cytochrome P450 family 19 subfamily A member 1 (CYP19A1) overexpression targeted to mammary epithelial cells in genetically engineered mouse models induces largely similar rates of proliferative mammary disease in prereproductive senescent mice. Herein, with natural reproductive senescence, Esr1 overexpression compared with CYP19A1 overexpression resulted in significantly higher rates of preneoplasia and cancer. Before reproductive senescence, Esr1, but not CYP19A1, overexpressing mice are tamoxifen resistant. However, during reproductive senescence, Esr1 mice exhibited responsiveness. Both Esr1 and CYP19A1 are responsive to letrozole before and after reproductive senescence. Gene Set Enrichment Analyses of RNA-sequencing data sets showed that higher disease rates in Esr1 mice were accompanied by significantly higher expression of cell proliferation genes, including members of prognostic platforms for women with early-stage hormone receptor-positive disease. Tamoxifen and letrozole exposure induced down-regulation of these genes and resolved differences between the two models. Both Esr1 and CYP19A1 overexpression induced abnormal developmental patterns of pregnancy-like gene expression. This resolved with progression through reproductive senescence in CYP19A1 mice, but was more persistent in Esr1 mice, resolving only with tamoxifen and letrozole exposure. In summary, genetically engineered mouse models of Esr1 and CYP19A1 overexpression revealed a diversion of disease processes resulting from the two distinct molecular pathophysiological mammary gland-targeted intrusions into estrogen signaling during reproductive senescence.
Affiliation(s)
- Priscilla A Furth
  - Department of Oncology, Georgetown University, Washington, District of Columbia
  - Department of Medicine, Georgetown University, Washington, District of Columbia
- Weisheng Wang
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Keunsoo Kang
  - Department of Microbiology, College of Science and Technology, Dankook University, Cheonan, Republic of Korea
- Brendan L Rooney
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Grace Keegan
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Vinona Muralidaran
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Xiaojun Zou
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Jodi A Flaws
  - Department of Comparative Biosciences, University of Illinois Urbana-Champaign, Urbana, Illinois
19
Furth PA, Wang W, Kang K, Rooney BL, Keegan G, Muralidaran V, Wong J, Shearer C, Zou X, Flaws JA. Overexpression of Estrogen Receptor α in Mammary Glands of Aging Mice Is Associated with a Proliferative Risk Signature and Generation of Estrogen Receptor α-Positive Mammary Adenocarcinomas. Am J Pathol 2023; 193:103-120. [PMID: 36464513 PMCID: PMC9768686 DOI: 10.1016/j.ajpath.2022.09.008] [Received: 06/28/2022] [Revised: 08/29/2022] [Accepted: 09/28/2022] [Indexed: 12/03/2022]
Abstract
Age is a risk factor for human estrogen receptor-positive breast cancer, with highest prevalence following menopause. While transcriptome risk profiling is available for human breast cancers, it is not yet developed for prognostication for primary or secondary breast cancer development utilizing at-risk breast tissue. Both estrogen receptor α (ER) and aromatase overexpression have been linked to human breast cancer. Herein, conditional genetically engineered mouse models of estrogen receptor 1 (Esr1) and cytochrome P450 family 19 subfamily A member 1 (CYP19A1) were used to show that induction of Esr1 overexpression just before or with reproductive senescence and maintained through age 30 months resulted in significantly higher prevalence of estrogen receptor-positive adenocarcinomas than CYP19A1 overexpression. All adenocarcinomas tested showed high percentages of ER+ cells. Mammary cancer development was preceded by a persistent proliferative transcriptome risk signature initiated within 1 week of transgene induction that showed parallels to the Prosigna/Prediction Analysis of Microarray 50 human prognostic signature for early-stage human ER+ breast cancer. CYP19A1 mice also developed ER+ mammary cancers, but histology was more divided between adenocarcinoma and adenosquamous, with one ER- adenocarcinoma. Results demonstrate that, like humans, generation of ER+ adenocarcinoma in mice was facilitated by aging mice past the age of reproductive senescence. Esr1 overexpression was associated with a proliferative estrogen pathway-linked signature that preceded appearance of ER+ mammary adenocarcinomas.
Affiliation(s)
- Priscilla A Furth
  - Department of Oncology, Georgetown University, Washington, District of Columbia
  - Department of Medicine, Georgetown University, Washington, District of Columbia
- Weisheng Wang
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Keunsoo Kang
  - Department of Microbiology, College of Science and Technology, Dankook University, Cheonan, Republic of Korea
- Brendan L Rooney
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Grace Keegan
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Vinona Muralidaran
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Justin Wong
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Charles Shearer
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Xiaojun Zou
  - Department of Oncology, Georgetown University, Washington, District of Columbia
- Jodi A Flaws
  - Department of Comparative Biosciences, University of Illinois Urbana-Champaign, Urbana, Illinois
20
Lee W, Schwartz N, Bansal A, Khor S, Hammarlund N, Basu A, Devine B. A Scoping Review of the Use of Machine Learning in Health Economics and Outcomes Research: Part 2-Data From Nonwearables. Value Health 2022; 25:2053-2061. [PMID: 35989154 DOI: 10.1016/j.jval.2022.07.011] [Received: 02/07/2022] [Revised: 06/10/2022] [Accepted: 07/10/2022] [Indexed: 06/15/2023]
Abstract
OBJECTIVES Despite the increasing interest in applying machine learning (ML) methods in health economics and outcomes research (HEOR), stakeholders face uncertainties in when and how ML can be used. We reviewed the recent applications of ML in HEOR. METHODS We searched PubMed for studies published between January 2020 and March 2021 and randomly chose 20% of the identified studies for the sake of manageability. Studies that were in HEOR and applied an ML technique were included. Studies related to wearable devices were excluded. We abstracted information on the ML applications, data types, and ML methods and analyzed it using descriptive statistics. RESULTS We retrieved 805 articles, of which 161 (20%) were randomly chosen. Ninety-two of the random sample met the eligibility criteria. We found that ML was primarily used for predicting future events (86%) rather than current events (14%). The most common response variables were clinical events or disease incidence (42%) and treatment outcomes (22%). ML was less used to predict economic outcomes such as health resource utilization (16%) or costs (3%). Although electronic medical records (35%) were frequently used for model development, claims data were used less frequently (9%). Tree-based methods (eg, random forests and boosting) were the most commonly used ML methods (31%). CONCLUSIONS The use of ML techniques in HEOR is growing rapidly, but there remain opportunities to apply them to predict economic outcomes, especially using claims databases, which could inform the development of cost-effectiveness models.
Affiliation(s)
- Woojung Lee
  - The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA, USA
- Naomi Schwartz
  - The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA, USA
- Aasthaa Bansal
  - The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA, USA
- Sara Khor
  - The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA, USA
- Noah Hammarlund
  - Department of Health Services Research, Management & Policy, University of Florida, Gainesville, FL, USA
- Anirban Basu
  - The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA, USA
- Beth Devine
  - The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA, USA
21
Iqbal MS, Ahmad W, Alizadehsani R, Hussain S, Rehman R. Breast Cancer Dataset, Classification and Detection Using Deep Learning. Healthcare (Basel) 2022; 10:2395. [PMID: 36553919 PMCID: PMC9778593 DOI: 10.3390/healthcare10122395] [Received: 11/13/2022] [Revised: 11/24/2022] [Accepted: 11/25/2022] [Indexed: 12/05/2022] [Open access]
Abstract
Incorporating scientific research into clinical practice via clinical informatics, which includes genomics, proteomics, bioinformatics, and biostatistics, improves patients' treatment. Computational pathology is a growing subspecialty with the potential to integrate whole slide images, multi-omics data, and health informatics. Pathology and laboratory medicine are critical to diagnosing cancer. This work will review existing computational and digital pathology methods for breast cancer diagnosis with a special focus on deep learning. The paper starts by reviewing public datasets related to breast cancer diagnosis. Additionally, existing deep learning methods for breast cancer diagnosis are reviewed. The publicly available code repositories are introduced as well. The paper is closed by highlighting challenges and future works for deep learning-based diagnosis.
Affiliation(s)
- Muhammad Shahid Iqbal
  - Department of Computer Science and Information Technology, Women University AJK, Bagh 12500, Pakistan
- Waqas Ahmad
  - Higher Education Department Govt, AJK, Mirpur 10250, Pakistan
- Roohallah Alizadehsani
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, VIC 3216, Australia
- Sadiq Hussain
  - Examination Branch, Dibrugarh University, Dibrugarh 786004, India
- Rizwan Rehman
  - Centre for Computer Science and Applications, Dibrugarh University, Dibrugarh 786004, India
22
Bautista Saiz C, Mora Gómez MM, Polo JF, Gutiérrez Castañeda LD. La proteína 7 unida al receptor del factor de crecimiento (GRB7) en cáncer de mama [Growth factor receptor-bound protein 7 (GRB7) in breast cancer]. Repertorio de Medicina y Cirugía 2022. [DOI: 10.31260/repertmedcir.01217372.1119] [Indexed: 11/16/2022] [Open access]
Abstract
Breast cancer should be considered a public health problem, as it is the leading cause of death in women worldwide. It is known to be multifactorial and heterogeneous, such that each tumor has its own genetic and molecular characteristics, which is reflected in its clinical behavior, response to treatment, and prognosis. Growth factor receptor-bound protein 7 (GRB7) belongs to a group of GRB proteins that mediate the interaction between receptor tyrosine kinases and effector proteins in several signaling pathways involved in signal transduction, cell migration, and angiogenesis. This protein is encoded by the GRB7 gene, located on chromosome 17 at locus 17q11–21, near the ERBB2 gene, which suggests coamplification and coexpression of these two genes in cancer development. The GRB7 protein by itself has been shown to take part in the underlying molecular biology of breast cancer, participating in cell proliferation and migration and thereby facilitating invasion and possible metastasis. It is considered a poor prognostic factor in this disease.
23
Elgart M, Lyons G, Romero-Brufau S, Kurniansyah N, Brody JA, Guo X, Lin HJ, Raffield L, Gao Y, Chen H, de Vries P, Lloyd-Jones DM, Lange LA, Peloso GM, Fornage M, Rotter JI, Rich SS, Morrison AC, Psaty BM, Levy D, Redline S, Sofer T. Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations. Commun Biol 2022; 5:856. [PMID: 35995843 PMCID: PMC9395509 DOI: 10.1038/s42003-022-03812-z] [Received: 07/19/2021] [Accepted: 08/05/2022] [Indexed: 01/03/2023] [Open access]
Abstract
Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.
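The improvements quoted in this abstract are relative increases in the percentage of variance explained. As a reading aid, the sketch below shows how such a relative increase is computed from two variance-explained values; the numbers are invented for illustration and are not results from the paper, and the function name is an assumption.

```python
import math


def relative_increase(baseline, improved):
    """Relative change of `improved` over `baseline`, as a fraction.

    Both arguments are variance-explained values (for example R-squared)
    measured on the same scale; the baseline must be positive.
    """
    if baseline <= 0:
        raise ValueError("baseline variance explained must be positive")
    return (improved - baseline) / baseline


# Hypothetical values: a linear PRS explaining 10% of variance versus a
# combined model explaining 12% corresponds to a 20% relative increase.
print(f"{relative_increase(0.10, 0.12):.0%}")
# A relative increase of 100% means the variance explained has doubled.
print(math.isclose(relative_increase(0.05, 0.10), 1.0))  # True
```

Because the measure is relative, a large percentage can correspond to a small absolute gain when the baseline variance explained is low.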
Affiliation(s)
- Michael Elgart
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Genevieve Lyons
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Santiago Romero-Brufau
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Medicine, Mayo Clinic, Rochester, MN, USA
- Nuzulul Kurniansyah
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Jennifer A Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Henry J Lin
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Laura Raffield
- Department of Genetics, University of North Carolina, Chapel Hill, NC, USA
- Yan Gao
- The Jackson Heart Study, University of Mississippi Medical Center, Jackson, MS, USA
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Paul de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Leslie A Lange
- Department of Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO, USA
- Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
- Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Stephen S Rich
- Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
- Daniel Levy
- The Population Sciences Branch of the National Heart, Lung and Blood Institute, Bethesda, MD, USA
- The Framingham Heart Study, Framingham, MA, USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Tamar Sofer
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, USA.
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
24
Moorthie S, Babb de Villiers C, Burton H, Kroese M, Antoniou AC, Bhattacharjee P, Garcia-Closas M, Hall P, Schmidt MK. Towards implementation of comprehensive breast cancer risk prediction tools in health care for personalised prevention. Prev Med 2022; 159:107075. [PMID: 35526672 DOI: 10.1016/j.ypmed.2022.107075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/07/2021] [Revised: 04/05/2022] [Accepted: 05/02/2022] [Indexed: 12/24/2022]
Abstract
Advances in knowledge about breast cancer risk factors have led to the development of more comprehensive risk models. These integrate information on a variety of risk factors such as lifestyle, genetics, family history, and breast density. These risk models have the potential to deliver more personalised breast cancer prevention. This is through improving accuracy of risk estimates, enabling more effective targeting of preventive options and creating novel prevention pathways through enabling risk estimation in a wider variety of populations than currently possible. The systematic use of risk tools as part of population screening programmes is one such example. A clear understanding of how such tools can contribute to the goal of personalised prevention can aid in understanding and addressing barriers to implementation. In this paper we describe how emerging models, and their associated tools can contribute to the goal of personalised healthcare for breast cancer through health promotion, early disease detection (screening) and improved management of women at higher risk of disease. We outline how addressing specific challenges on the level of communication, evidence, evaluation, regulation, and acceptance, can facilitate implementation and uptake.
Affiliation(s)
- Sowmiya Moorthie
- PHG Foundation, University of Cambridge, Cambridge, UK; Cambridge Public Health, University of Cambridge School of Clinical Medicine, Forvie Site, Cambridge Biomedical Campus, Cambridge CB2 0SR, United Kingdom.
- Hilary Burton
- PHG Foundation, University of Cambridge, Cambridge, UK
- Mark Kroese
- PHG Foundation, University of Cambridge, Cambridge, UK
- Antonis C Antoniou
- Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
- Proteeti Bhattacharjee
- Division of Molecular Pathology, The Netherlands Cancer Institute - Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands
- Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health (NIH), Bethesda, USA
- Per Hall
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden; Department of Oncology, Södersjukhuset, Stockholm, Sweden
- Marjanka K Schmidt
- Division of Molecular Pathology, The Netherlands Cancer Institute - Antoni van Leeuwenhoek Hospital, Amsterdam, the Netherlands; Department of Clinical Genetics, Leiden University Medical Center, Leiden, the Netherlands
25
Rabiei R, Ayyoubzadeh SM, Sohrabei S, Esmaeili M, Atashi A. Prediction of Breast Cancer using Machine Learning Approaches. J Biomed Phys Eng 2022; 12:297-308. [PMID: 35698545 PMCID: PMC9175124 DOI: 10.31661/jbpe.v0i0.2109-1403] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/25/2021] [Accepted: 03/05/2022] [Indexed: 05/27/2023]
Abstract
BACKGROUND Breast cancer is considered one of the most common cancers in women caused by various clinical, lifestyle, social, and economic factors. Machine learning has the potential to predict breast cancer based on features hidden in data. OBJECTIVE This study aimed to predict breast cancer using different machine-learning approaches applying demographic, laboratory, and mammographic data. MATERIAL AND METHODS In this analytical study, the database, including 5,178 independent records, 25% of which belonged to breast cancer patients, with 24 attributes in each record, was obtained from Motamed Cancer Institute (ACECR), Tehran, Iran. The random forest (RF), neural network (MLP), gradient boosting trees (GBT), and genetic algorithms (GA) were used in this study. Models were initially trained with demographic and laboratory features (20 features). The models were then trained with all demographic, laboratory, and mammographic features (24 features) to measure the effectiveness of mammography features in predicting breast cancer. RESULTS RF presented higher performance compared to other techniques (accuracy 80%, sensitivity 95%, specificity 80%, and the area under the curve (AUC) 0.56). Gradient boosting (AUC=0.59) showed a stronger performance compared to the neural network. CONCLUSION Combining multiple risk factors in modeling for breast cancer prediction could help the early diagnosis of the disease with necessary care plans. Collection, storage, and management of different data and intelligent systems based on multiple factors for predicting breast cancer are effective in disease management.
Affiliation(s)
- Reza Rabiei
- PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Mohammad Ayyoubzadeh
- PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Tehran University of Medical Science, Tehran, Iran
- Solmaz Sohrabei
- MSc, Department Deputy of Development, Management and Resources, Office of Statistic and Information Technology Management, Zanjan University of Medical Sciences, Zanjan, Iran
- Marzieh Esmaeili
- PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Tehran University of Medical Science, Tehran, Iran
- Alireza Atashi
- PhD, Department of E-Health, Virtual School, Tehran University of Medical Sciences, Medical Informatics Research Group, Clinical Research Department, Breast Cancer Research Center, Motamed Cancer Institute, ACECR, Tehran, Iran
26
Characterization of transcriptome diversity and in vitro behavior of primary human high-risk breast cells. Sci Rep 2022; 12:6159. [PMID: 35459280 PMCID: PMC9033878 DOI: 10.1038/s41598-022-10246-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2021] [Accepted: 04/01/2022] [Indexed: 11/09/2022] Open
Abstract
Biology and transcriptomes of non-cancerous human mammary epithelial cells at risk for breast cancer development were explored following primary isolation utilizing conditional reprogramming cell technology from mastectomy tissue ipsilateral to invasive breast cancer. Cultures demonstrated consistent categorizable behaviors. Relative viability and mammosphere formation differed between samples but were stable across three different mammary-specific media. Expression levels of E2F cell cycle target genes were positively correlated with viability, whereas advancing age was inversely associated with it. Estrogen growth response was associated with tumor necrosis factor signaling and interferon alpha response gene enrichment. Neoadjuvant chemotherapy exposure significantly altered transcriptomes, shifting them towards expression of genes linked to mammary stem cell formation. Breast cancer prognostic signature sets include genes that in normal development are limited to specific stages of pregnancy or the menstrual cycle. Sample transcriptomes were queried for stage-specific gene expression patterns. All cancer samples and a portion of high-risk samples showed overlapping stages reflective of abnormal gene expression patterns, while other high-risk samples exhibited more stage-specific patterns. In conclusion, at-risk cells preserve behavioral and transcriptome diversity that could reflect different risk profiles. It is possible that prognostic platforms analogous to those used for breast cancer could be developed for high-risk mammary cells.
27
Hou C, Xu B, Hao Y, Yang D, Song H, Li J. Development and validation of polygenic risk scores for prediction of breast cancer and breast cancer subtypes in Chinese women. BMC Cancer 2022; 22:374. [PMID: 35395775 PMCID: PMC8991589 DOI: 10.1186/s12885-022-09425-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/24/2021] [Accepted: 03/15/2022] [Indexed: 02/08/2023] Open
Abstract
Background: Studies investigating breast cancer polygenic risk score (PRS) in Chinese women are scarce. The objectives of this study were to develop and validate PRSs that could be used to stratify risk for overall and subtype-specific breast cancer in Chinese women, and to evaluate the performance of a newly proposed Artificial Neural Network (ANN) based approach for PRS construction. Methods: The PRSs were constructed using the dataset from a genome-wide association study (GWAS) and validated in an independent case-control study. Three approaches, including repeated logistic regression (RLR), logistic ridge regression (LRR) and ANN based approach, were used to build the PRSs for overall and subtype-specific breast cancer based on 24 selected single nucleotide polymorphisms (SNPs). Predictive performance and calibration of the PRSs were evaluated unadjusted and adjusted for Gail-2 model 5-year risk or classical breast cancer risk factors. Results: The primary PRS_ANN and PRS_LRR both showed modest predictive ability for overall breast cancer (odds ratio per interquartile range increase of the PRS in controls [IQ-OR] 1.76 vs 1.58; area under the receiver operator characteristic curve [AUC] 0.601 vs 0.598) and remained predictive after adjustment. Although estrogen receptor negative (ER−) breast cancer was poorly predicted by the primary PRSs, the ER− PRSs trained solely on ER− breast cancer cases saw a substantial improvement in predictions of ER− breast cancer. Conclusions: The 24 SNPs based PRSs can provide additional risk information to help breast cancer risk stratification in the general population of China. The newly proposed ANN approach for PRS construction has potential to replace the traditional approaches, but more studies are needed to validate and investigate its performance. Supplementary Information: The online version contains supplementary material available at 10.1186/s12885-022-09425-3.
Affiliation(s)
- Can Hou
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, No. 37 Guo Xue Xiang, Chengdu, 610047, Sichuan, China
- Department of Epidemiology and Biostatistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, No.16 Ren Min Nan Lu, Chengdu, 610041, Sichuan, China
- Med-X Center for Informatics, Sichuan University, Chengdu, China
- Bin Xu
- Department of Epidemiology and Biostatistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, No.16 Ren Min Nan Lu, Chengdu, 610041, Sichuan, China
- Yu Hao
- Department of Epidemiology and Biostatistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, No.16 Ren Min Nan Lu, Chengdu, 610041, Sichuan, China
- Daowen Yang
- Robot Perception and Control Joint Lab, Sichuan University & Aisono, Chengdu, China
- Huan Song
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, No. 37 Guo Xue Xiang, Chengdu, 610047, Sichuan, China.
- Med-X Center for Informatics, Sichuan University, Chengdu, China.
- Jiayuan Li
- Department of Epidemiology and Biostatistics, West China School of Public Health and West China Fourth Hospital, Sichuan University, No.16 Ren Min Nan Lu, Chengdu, 610041, Sichuan, China.
28
Chung CW, Hsiao TH, Huang CJ, Chen YJ, Chen HH, Lin CH, Chou SC, Chen TS, Chung YF, Yang HI, Chen YM. Machine learning approaches for the genomic prediction of rheumatoid arthritis and systemic lupus erythematosus. BioData Min 2021; 14:52. [PMID: 34895289 PMCID: PMC8666017 DOI: 10.1186/s13040-021-00284-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/13/2021] [Accepted: 11/21/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE) are autoimmune rheumatic diseases that share a complex genetic background and common clinical features. This study's purpose was to construct machine learning (ML) models for the genomic prediction of RA and SLE. METHODS A total of 2,094 patients with RA and 2,190 patients with SLE were enrolled from the Taichung Veterans General Hospital cohort of the Taiwan Precision Medicine Initiative. Genome-wide single nucleotide polymorphism (SNP) data were obtained using the Taiwan Biobank version 2 array. The ML methods used were logistic regression (LR), random forest (RF), support vector machine (SVM), gradient tree boosting (GTB), and extreme gradient boosting (XGB). SHapley Additive exPlanation (SHAP) values were calculated to clarify the contribution of each SNP. Human leukocyte antigen (HLA) imputation was performed using the HLA Genotype Imputation with Attribute Bagging package. RESULTS Compared with LR (area under the curve [AUC] = 0.8247), the RF approach (AUC = 0.9844), SVM (AUC = 0.9828), GTB (AUC = 0.9932), and XGB (AUC = 0.9919) exhibited significantly better prediction performance. The top 20 genes by feature importance and SHAP values included HLA class II alleles. We found that imputed HLA-DQA1*05:01, DQB1*0201 and DRB1*0301 were associated with SLE; HLA-DQA1*03:03, DQB1*0401 and DRB1*0405 were more frequently observed in patients with RA. CONCLUSIONS We established ML methods for genomic prediction of RA and SLE. Genetic variations at HLA-DQA1, HLA-DQB1, and HLA-DRB1 were crucial for differentiating RA from SLE. Future studies are required to verify our results and explore their mechanistic explanation.
Affiliation(s)
- Chih-Wei Chung
- Department of Information Management, National Taiwan University, Taipei, Taiwan
- Tzu-Hung Hsiao
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Chih-Jen Huang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Yen-Ju Chen
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan
- Hsin-Hua Chen
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan
- Rong Hsing Research Center for Translational Medicine & Ph.D. Program in Translational Medicine, National Chung Hsing University, Taichung, Taiwan
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Ching-Heng Lin
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan
- Seng-Cho Chou
- Department of Information Management, National Taiwan University, Taipei, Taiwan
- Tzer-Shyong Chen
- Department of Information Management, Tunghai University, Taichung, Taiwan
- Yu-Fang Chung
- Department of Electrical Engineering, Tunghai University, Taichung, Taiwan
- Hwai-I Yang
- Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Yi-Ming Chen
- Department of Medical Research, Taichung Veterans General Hospital, Taichung, Taiwan.
- Division of Allergy, Immunology and Rheumatology, Taichung Veterans General Hospital, Taichung, Taiwan.
- Rong Hsing Research Center for Translational Medicine & Ph.D. Program in Translational Medicine, National Chung Hsing University, Taichung, Taiwan.
- School of Medicine, College of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan.
- College of Medicine, National Chung Hsing University, 40227, Taichung City, Taiwan.
29
Jalbrzikowski M. Polygenic Scores for Psychiatric Disorders: One Important Piece of the Risk Prediction Puzzle. Biol Psychiatry 2021; 90:e41-e42. [PMID: 34620379 DOI: 10.1016/j.biopsych.2021.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2021] [Accepted: 08/26/2021] [Indexed: 11/24/2022]
Affiliation(s)
- Maria Jalbrzikowski
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania.
30
Tehranifar P, Wei Y, Terry MB. Less Is More-Ways to Move Forward for Improved Breast Cancer Risk Stratification. Cancer Epidemiol Biomarkers Prev 2021; 30:587-589. [PMID: 33811169 DOI: 10.1158/1055-9965.epi-20-1627] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/16/2020] [Revised: 01/12/2021] [Accepted: 01/14/2021] [Indexed: 11/16/2022] Open
Abstract
Breast cancer risk models increasingly are including mammographic density (MD) and polygenic risk scores (PRS) to improve identification of higher-risk women who may benefit from genetic screening, earlier and supplemental breast screening, chemoprevention, and other targeted interventions. Here, we present additional considerations for improved clinical use of risk prediction models with MD, PRS, and questionnaire-based risk factors. These considerations include whether changing risk factor patterns, including MD, can improve risk prediction and management, and whether PRS could help inform breast cancer screening without MD measures and prior to the age at initiation of population-based mammography. We further argue that it may be time to reconsider issues around breast cancer risk models that may warrant a more comprehensive head-to-head comparison with other methods for risk factor assessment and risk prediction, including emerging artificial intelligence methods. With the increasing recognition of limitations of any single mathematical model, no matter how simplified, we are at an important juncture for consideration of these different approaches for improved risk stratification in geographically and ethnically diverse populations. See related article by Rosner et al., p. 600.
Affiliation(s)
- Parisa Tehranifar
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York, New York.
- Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York, New York
- Ying Wei
- Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York, New York.
- Department of Biostatistics, Columbia University Mailman School of Public Health, New York, New York
- Mary Beth Terry
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York, New York.
- Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York, New York
31
Malherbe K. Tumor Microenvironment and the Role of Artificial Intelligence in Breast Cancer Detection and Prognosis. Am J Pathol 2021; 191:1364-1373. [PMID: 33639101 DOI: 10.1016/j.ajpath.2021.01.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/11/2020] [Revised: 01/02/2021] [Accepted: 01/28/2021] [Indexed: 12/21/2022]
Abstract
A critical knowledge gap has been noted in breast cancer detection, prognosis, and evaluation between tumor microenvironment and associated neoplasm. Artificial intelligence (AI) has multiple subsets or methods for data extraction and evaluation, including artificial neural networking, which allows computational foundations, similar to neurons, to make connections and new neural pathways during data set training. Deep machine learning and AI hold great potential to accurately assess tumor microenvironment models employing vast data management techniques. Despite the significant potential AI holds, there is still much debate surrounding the appropriate and ethical curation of medical data from picture archiving and communication systems. AI output's clinical significance depends on its human predecessor's data training sets. Integration between biomarkers, risk factors, and imaging data will allow the best predictor models for patient-based outcomes.
Affiliation(s)
- Kathryn Malherbe
- Department Radiography, Faculty Health Sciences, University of Pretoria, Pretoria, South Africa.
32
Lin YH, Liao KYK, Sung KB. Automatic detection and characterization of quantitative phase images of thalassemic red blood cells using a mask region-based convolutional neural network. J Biomed Opt 2020; 25:JBO-200187R. [PMID: 33188571 PMCID: PMC7665881 DOI: 10.1117/1.jbo.25.11.116502] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/30/2020] [Accepted: 10/26/2020] [Indexed: 05/07/2023]
Abstract
SIGNIFICANCE Label-free quantitative phase imaging is a promising technique for the automatic detection of abnormal red blood cells (RBCs) in real time. Although deep-learning techniques can accurately detect abnormal RBCs from quantitative phase images efficiently, their applications in diagnostic testing are limited by the lack of transparency. More interpretable results such as morphological and biochemical characteristics of individual RBCs are highly desirable. AIM An end-to-end deep-learning model was developed to efficiently discriminate thalassemic RBCs (tRBCs) from healthy RBCs (hRBCs) in quantitative phase images and segment RBCs for single-cell characterization. APPROACH Two-dimensional quantitative phase images of hRBCs and tRBCs were acquired using digital holographic microscopy. A mask region-based convolutional neural network (Mask R-CNN) model was trained to discriminate tRBCs and segment individual RBCs. Characterization of tRBCs was achieved utilizing SHapley Additive exPlanation analysis and canonical correlation analysis on automatically segmented RBC phase images. RESULTS The implemented model achieved 97.8% accuracy in detecting tRBCs. Phase-shift statistics showed the highest influence on the correct classification of tRBCs. Associations between the phase-shift features and three-dimensional morphological features were revealed. CONCLUSIONS The implemented Mask R-CNN model accurately identified tRBCs and segmented RBCs to provide single-RBC characterization, which has the potential to aid clinical decision-making.
Affiliation(s)
- Yang-Hsien Lin
- National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics, Taipei, Taiwan
- Ken Y.-K. Liao
- Feng Chia University, College of Information and Electrical Engineering, Taichung, Taiwan
- Kung-Bin Sung
- National Taiwan University, Graduate Institute of Biomedical Electronics and Bioinformatics, Taipei, Taiwan
- National Taiwan University, Department of Electrical Engineering, Taipei, Taiwan
- National Taiwan University, Molecular Imaging Center, Taipei, Taiwan