1
|
Kuo JC, Chan W, Leon-Novelo L, Lairson DR, Brown A, Fujimoto K. Latent classification model for censored longitudinal binary outcome. Stat Med 2024; 43:3943-3957. [PMID: 38951953 DOI: 10.1002/sim.10156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Revised: 04/23/2024] [Accepted: 06/10/2024] [Indexed: 07/03/2024]
Abstract
Latent classification models are a class of statistical methods for identifying unobserved class membership among study samples using observed data. In this study, we proposed a latent classification model that takes a censored longitudinal binary outcome variable and uses its changing pattern over time to predict individuals' latent class membership. Assuming the time-dependent outcome variables follow a continuous-time Markov chain, the proposed method has two primary goals: (1) estimate the distribution of the latent classes and predict individuals' class membership, and (2) estimate the class-specific transition rates and rate ratios. To assess the model's performance, we conducted a simulation study and verified that our algorithm produces accurate model estimates (ie, small bias) with reasonable confidence intervals (ie, achieving approximately 95% coverage probability). Furthermore, we compared our model to four other existing latent class models and demonstrated that our approach yields higher prediction accuracies for latent classes. We applied our proposed method to analyze COVID-19 data from Houston, Texas, US, collected between January 1, 2021 and December 31, 2021. Early reports on the COVID-19 pandemic showed that the severity of a SARS-CoV-2 infection tends to vary greatly across cases. We found that while demographic characteristics explain some of the differences in individuals' experience with COVID-19, some unaccounted-for latent variables were associated with the disease.
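As an illustration of the continuous-time Markov chain assumption mentioned in this abstract, the following minimal sketch computes the transition probability matrix of a generic two-state (binary) chain from its two transition rates, using the standard closed-form solution. It is not the authors' estimation algorithm; the function name, parameter names, and the example rates are hypothetical.

```python
import math


def two_state_transition_matrix(rate_01, rate_10, t):
    """Transition probabilities P(t) for a two-state continuous-time Markov chain.

    rate_01: rate of moving from state 0 to state 1 (hypothetical name)
    rate_10: rate of moving from state 1 to state 0 (hypothetical name)
    t:       elapsed time between two observations
    """
    total = rate_01 + rate_10
    if total == 0:
        # No transitions are possible, so the chain stays where it is.
        return [[1.0, 0.0], [0.0, 1.0]]
    decay = math.exp(-total * t)
    p01 = rate_01 / total * (1.0 - decay)  # P(state 1 at t | state 0 at 0)
    p10 = rate_10 / total * (1.0 - decay)  # P(state 0 at t | state 1 at 0)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Illustrative rates for two latent classes (made-up numbers, not from the paper).
class_a = two_state_transition_matrix(0.2, 0.3, t=1.0)
class_b = two_state_transition_matrix(0.4, 0.3, t=1.0)
rate_ratio_01 = 0.4 / 0.2  # class B versus class A rate ratio for the 0 -> 1 transition
```

In a class-specific setting, each latent class would have its own pair of rates, and a rate ratio compares the same transition across two classes.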
Affiliation(s)
- Jacky C Kuo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Wenyaw Chan
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Luis Leon-Novelo
- Department of Biostatistics and Data Science, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - David R Lairson
- Department of Management, Policy and Community Health, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Armand Brown
- Bureau of Epidemiology, Houston Health Department, Houston, Texas, USA
| | - Kayo Fujimoto
- Department of Health Promotion and Behavioral Sciences, University of Texas Health Science Center at Houston, Houston, Texas, USA
| |
|
2
|
Li S, Yi H, Leng Q, Wu Y, Mao Y. New perspectives on cancer clinical research in the era of big data and machine learning. Surg Oncol 2024; 52:102009. [PMID: 38215544 DOI: 10.1016/j.suronc.2023.102009] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 08/29/2023] [Accepted: 10/16/2023] [Indexed: 01/14/2024]
Abstract
In the 21st century, the development of medical science has entered the era of big data, and machine learning has become an essential tool for mining medical big data. The establishment of the SEER database has provided a wealth of epidemiological data for cancer clinical research, and the number of studies based on SEER and machine learning has been growing in recent years. This article reviews recent research based on SEER and machine learning and finds that the current focus of such studies is primarily on the development and validation of models using machine learning algorithms, with the main directions being lymph node metastasis prediction, distant metastasis prediction, and prognosis-related research. Compared to traditional models, machine learning algorithms have the advantage of stronger adaptability, but also suffer from disadvantages such as overfitting and poor interpretability, which need to be weighed in practical applications. At present, machine learning algorithms, as the foundation of artificial intelligence, have just begun to emerge in the field of cancer clinical research. The future development of oncology will enter a more precise era of cancer research, characterized by larger data, higher dimensions, and more frequent information exchange. Machine learning is bound to shine brightly in this field.
Affiliation(s)
- Shujun Li
- Department of Hematology, Xiangya Hospital, Central South University, Changsha, 410008, China; National Clinical Research Center for Geriatric Diseases (Xiangya Hospital), China; Hunan Hematology Oncology Clinical Medical Research Center, China
| | - Hang Yi
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
| | - Qihao Leng
- Xiangya School of Medicine, Central South University, Changsha, 410013, Hunan Province, China
| | - You Wu
- Institute for Hospital Management, School of Medicine, Tsinghua University, 30 Shuangqing Rd, Haidian District, Beijing, China; Department of Health Policy and Management, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA.
| | - Yousheng Mao
- Department of Thoracic Surgery, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China.
| |
|
3
|
Altuhaifa FA, Win KT, Su G. Predicting lung cancer survival based on clinical data using machine learning: A review. Comput Biol Med 2023; 165:107338. [PMID: 37625260 DOI: 10.1016/j.compbiomed.2023.107338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 07/31/2023] [Accepted: 08/07/2023] [Indexed: 08/27/2023]
Abstract
Machine learning has gained popularity in predicting survival time in the medical field. This review examines studies utilizing machine learning and data-mining techniques to predict lung cancer survival using clinical data. A systematic literature review searched MEDLINE, Scopus, and Google Scholar databases, following reporting guidelines and using the COVIDENCE system. Studies published from 2000 to 2023 employing machine learning for lung cancer survival prediction were included. Risk of bias assessment used the prediction model risk of bias assessment tool. Thirty studies were reviewed, with 13 (43.3%) using the surveillance, epidemiology, and end results database. Missing data handling was addressed in 12 (40%) studies, primarily through data transformation and conversion. Feature selection algorithms were used in 19 (63.3%) studies, with age, sex, and N stage being the most chosen features. Random forest was the predominant machine learning model, used in 17 (56.6%) studies. While the number of lung cancer survival prediction studies is limited, the use of machine learning models based on clinical data has grown since 2012. Consideration of diverse patient cohorts and data pre-processing are crucial. Notably, most studies did not account for missing data, normalization, scaling, or standardized data, potentially introducing bias. Therefore, a comprehensive study on lung cancer survival prediction using clinical data is needed, addressing these challenges.
Affiliation(s)
- Fatimah Abdulazim Altuhaifa
- School of Computing and Information Technology, University of Wollongong, NSW, 2500, Australia; Saudi Arabia Ministry of Higher Education, Riyadh, Saudi Arabia.
| | - Khin Than Win
- School of Computing and Information Technology, University of Wollongong, NSW, 2500, Australia
| | - Guoxin Su
- School of Computing and Information Technology, University of Wollongong, NSW, 2500, Australia
| |
|
4
|
P D, C G. A systematic review on machine learning and deep learning techniques in cancer survival prediction. Prog Biophys Mol Biol 2022; 174:62-71. [PMID: 35933043 DOI: 10.1016/j.pbiomolbio.2022.07.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/07/2022] [Revised: 07/13/2022] [Accepted: 07/19/2022] [Indexed: 06/15/2023]
Abstract
Cancer is a disease characterised by the unusual and uncontrollable growth of body cells. It often develops asymptomatically and spreads to other parts of the body. A major problem in treating cancer is that its progress is not monitored once it is diagnosed. The progress, or prognosis, can be assessed through survival analysis, the branch of statistics that deals with predicting the time until an event occurs. In cancer prognosis, the quantity of interest is the patient's survival time from the onset of the disease, or the time to recurrence of the disease after treatment. This study aims to survey the machine learning and deep learning models used to provide a prognosis for cancer patients.
Affiliation(s)
- Deepa P
- School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
| | - Gunavathi C
- School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
| |
|
5
|
Marzano L, Darwich AS, Tendler S, Dan A, Lewensohn R, De Petris L, Raghothama J, Meijer S. A novel analytical framework for risk stratification of real-world data using machine learning: A small cell lung cancer study. Clin Transl Sci 2022; 15:2437-2447. [PMID: 35856401 PMCID: PMC9579402 DOI: 10.1111/cts.13371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 06/26/2022] [Accepted: 07/08/2022] [Indexed: 01/25/2023] Open
Abstract
In recent studies, small cell lung cancer (SCLC) treatment guidelines have been based on the Veterans' Administration Lung Study Group limited/extensive disease staging, which results in broad and inseparable prognostic subgroups. Evidence suggests that the eighth version of the tumor, node, and metastasis (TNM) staging can play an important role in addressing this issue. The aim of the present study was to improve the detection of prognostic subgroups from a real-world data (RWD) cohort of patients and analyze their patterns using a development pipeline with thoracic oncologists and machine learning methods. The method detected subgroups of patients using unsupervised learning (partition around medoids) informed by the impact of covariates on prognosis (Cox regression and random survival forest). An analysis was carried out using patients with SCLC (n = 636) with stage IIIA-IVB according to TNM classification. The analysis yielded k = 7 compact and well-separated clusters of patients. Performance status (Eastern Cooperative Oncology Group-Performance Status), lactate dehydrogenase, spreading of metastasis, cancer stage, and CRP were the baseline variables that characterized the subgroups. The selected clustering method outperformed standard clustering techniques, which were not capable of detecting meaningful subgroups. From the analysis of cluster treatment decisions, we showed the potential of future RWD applications to understand disease, develop individualized therapies, and improve healthcare decision making.
Affiliation(s)
- Luca Marzano
- Division of Health Informatics and Logistics, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), KTH Royal Institute of Technology, Huddinge, Sweden
| | - Adam S. Darwich
- Division of Health Informatics and Logistics, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), KTH Royal Institute of Technology, Huddinge, Sweden
| | - Salomon Tendler
- Department of Oncology‐Pathology, Karolinska Institutet and the Thoracic Oncology Center, Karolinska University Hospital, Stockholm, Sweden
| | - Asaf Dan
- Department of Oncology‐Pathology, Karolinska Institutet and the Thoracic Oncology Center, Karolinska University Hospital, Stockholm, Sweden
| | - Rolf Lewensohn
- Department of Oncology‐Pathology, Karolinska Institutet and the Thoracic Oncology Center, Karolinska University Hospital, Stockholm, Sweden
| | - Luigi De Petris
- Department of Oncology‐Pathology, Karolinska Institutet and the Thoracic Oncology Center, Karolinska University Hospital, Stockholm, Sweden
| | - Jayanth Raghothama
- Division of Health Informatics and Logistics, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), KTH Royal Institute of Technology, Huddinge, Sweden
| | - Sebastiaan Meijer
- Division of Health Informatics and Logistics, School of Engineering Sciences in Chemistry, Biotechnology and Health (CBH), KTH Royal Institute of Technology, Huddinge, Sweden
| |
|
6
|
Sedighi-Maman Z, Heath JJ. An Interpretable Two-Phase Modeling Approach for Lung Cancer Survivability Prediction. Sensors (Basel) 2022; 22:6783. [PMID: 36146145 PMCID: PMC9503480 DOI: 10.3390/s22186783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Revised: 08/28/2022] [Accepted: 09/05/2022] [Indexed: 06/16/2023]
Abstract
Although lung cancer survival status and survival length predictions have primarily been studied individually, a scheme that leverages both fields in an interpretable way for physicians remains elusive. We propose a two-phase data analytic framework that is capable of classifying survival status for 0.5-, 1-, 1.5-, 2-, 2.5-, and 3-year time-points (phase I) and predicting the number of survival months within 3 years (phase II) using recent Surveillance, Epidemiology, and End Results data from 2010 to 2017. In this study, we employ three analytical models (general linear model, extreme gradient boosting, and artificial neural networks), five data balancing techniques (synthetic minority oversampling technique (SMOTE), relocating safe level SMOTE, borderline SMOTE, adaptive synthetic sampling, and majority weighted minority oversampling technique), two feature selection methods (least absolute shrinkage and selection operator (LASSO) and random forest), and the one-hot encoding approach. By implementing a comprehensive data preparation phase, we demonstrate that a computationally efficient and interpretable method such as GLM performs comparably to more complex models. Moreover, we quantify the effects of individual features in phase I and II by exploiting GLM coefficients. To the best of our knowledge, this study is the first to (a) implement a comprehensive data processing approach to develop performant, computationally efficient, and interpretable methods in comparison to black-box models, (b) visualize top factors impacting survival odds by utilizing the change in odds ratio, and (c) comprehensively explore short-term lung cancer survival using a two-phase approach.
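The abstract above mentions quantifying feature effects through GLM coefficients and the change in odds ratio. As a minimal sketch of that idea, assuming a logistic (logit-link) model for the phase I survival-status classification, a coefficient converts to an odds ratio by exponentiation. The function names and example values below are hypothetical and are not taken from the paper.

```python
import math


def odds_ratio(coef, delta=1.0):
    """Odds ratio implied by a logistic-regression coefficient.

    coef:  fitted coefficient on the log-odds scale (hypothetical input)
    delta: size of the change in the feature (1.0 for a one-hot indicator)
    """
    return math.exp(coef * delta)


def shifted_probability(p, coef, delta=1.0):
    """Probability after changing a feature by delta, starting from baseline p.

    Assumes 0 < p < 1 so that the baseline odds are finite.
    """
    odds = p / (1.0 - p)
    new_odds = odds * odds_ratio(coef, delta)
    return new_odds / (1.0 + new_odds)


# Illustrative only: a coefficient of log(3) triples the odds of the outcome.
example_or = odds_ratio(math.log(3))                  # about 3.0
example_p = shifted_probability(0.5, math.log(3))     # about 0.75
```

Reporting effects on the odds scale is what makes a linear model of this kind readable for clinicians: each one-hot feature maps to a single multiplicative change in the odds.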
Affiliation(s)
- Zahra Sedighi-Maman
- Robert B. Willumstad School of Business, Adelphi University, Garden City, NY 11530, USA
| | - Jonathan J. Heath
- McDonough School of Business, Georgetown University, Washington, DC 20057, USA
| |
|
7
|
Clustering based lung lobe segmentation and optimization based lung cancer classification using CT images. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
|
8
|
Parimbelli E, Wilk S, Cornet R, Sniatala P, Sniatala K, Glaser SLC, Fraterman I, Boekhout AH, Ottaviano M, Peleg M. A review of AI and Data Science support for cancer management. Artif Intell Med 2021; 117:102111. [PMID: 34127240 DOI: 10.1016/j.artmed.2021.102111] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/07/2020] [Revised: 12/23/2020] [Accepted: 05/11/2021] [Indexed: 02/09/2023]
Abstract
INTRODUCTION Thanks to improvement of care, cancer has become a chronic condition. But due to the toxicity of treatment, the importance of supporting the quality of life (QoL) of cancer patients increases. Monitoring and managing QoL relies on data collected by the patient in his/her home environment, its integration, and its analysis, which supports personalization of cancer management recommendations. We review the state-of-the-art of computerized systems that employ AI and Data Science methods to monitor the health status and provide support to cancer patients managed at home. OBJECTIVE Our main objective is to analyze the literature to identify open research challenges that a novel decision support system for cancer patients and clinicians will need to address, point to potential solutions, and provide a list of established best-practices to adopt. METHODS We designed a review study, in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, analyzing studies retrieved from PubMed related to monitoring cancer patients in their home environments via sensors and self-reporting: what data is collected, what are the techniques used to collect data, semantically integrate it, infer the patient's state from it and deliver coaching/behavior change interventions. RESULTS Starting from an initial corpus of 819 unique articles, a total of 180 papers were considered in the full-text analysis and 109 were finally included in the review. Our findings are organized and presented in four main sub-topics consisting of data collection, data integration, predictive modeling and patient coaching. 
CONCLUSION Development of modern decision support systems for cancer needs to utilize best practices like the use of validated electronic questionnaires for quality-of-life assessment, adoption of appropriate information modeling standards supplemented by terminologies/ontologies, adherence to FAIR data principles, external validation, stratification of patients in subgroups for better predictive modeling, and adoption of formal behavior change theories. Open research challenges include supporting emotional and social dimensions of well-being, including PROs in predictive modeling, and providing better customization of behavioral interventions for the specific population of cancer patients.
Affiliation(s)
| | - S Wilk
- Poznan University of Technology, Poland
| | - R Cornet
- Amsterdam University Medical Centre, the Netherlands
| | | | | | - S L C Glaser
- Amsterdam University Medical Centre, the Netherlands
| | - I Fraterman
- Netherlands Cancer Institute, Amsterdam, the Netherlands
| | - A H Boekhout
- Netherlands Cancer Institute, Amsterdam, the Netherlands
| | | | | |
|
9
|
Banerjee A, Chen S, Fatemifar G, Zeina M, Lumbers RT, Mielke J, Gill S, Kotecha D, Freitag DF, Denaxas S, Hemingway H. Machine learning for subtype definition and risk prediction in heart failure, acute coronary syndromes and atrial fibrillation: systematic review of validity and clinical utility. BMC Med 2021; 19:85. [PMID: 33820530 PMCID: PMC8022365 DOI: 10.1186/s12916-021-01940-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 10/01/2020] [Accepted: 02/12/2021] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Machine learning (ML) is increasingly used in research for subtype definition and risk prediction, particularly in cardiovascular diseases. No existing ML models are routinely used for cardiovascular disease management, and their phase of clinical utility is unknown, partly due to a lack of clear criteria. We evaluated ML for subtype definition and risk prediction in heart failure (HF), acute coronary syndromes (ACS) and atrial fibrillation (AF). METHODS For ML studies of subtype definition and risk prediction, we conducted a systematic review in HF, ACS and AF, using PubMed, MEDLINE and Web of Science from January 2000 until December 2019. By adapting published criteria for diagnostic and prognostic studies, we developed a seven-domain, ML-specific checklist. RESULTS Of 5918 studies identified, 97 were included. Across studies for subtype definition (n = 40) and risk prediction (n = 57), there was variation in data source, population size (median 606 and median 6769), clinical setting (outpatient, inpatient, different departments), number of covariates (median 19 and median 48) and ML methods. All studies were single disease, most were North American (n = 61/97) and only 14 studies combined definition and risk prediction. Subtype definition and risk prediction studies respectively had limitations in development (e.g. 15.0% and 78.9% of studies related to patient benefit; 15.0% and 15.8% had low patient selection bias), validation (12.5% and 5.3% externally validated) and impact (32.5% and 91.2% improved outcome prediction; no effectiveness or cost-effectiveness evaluations). CONCLUSIONS Studies of ML in HF, ACS and AF are limited by number and type of included covariates, ML methods, population size, country, clinical setting and focus on single diseases, not overlap or multimorbidity. Clinical utility and implementation rely on improvements in development, validation and impact, facilitated by simple checklists. 
We provide clear steps prior to safe implementation of machine learning in clinical practice for cardiovascular diseases and other disease areas.
Affiliation(s)
- Amitava Banerjee
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK.
- Health Data Research UK, University College London, London, UK.
- University College London Hospitals NHS Trust, 235 Euston Road, London, UK.
- Barts Health NHS Trust, The Royal London Hospital, Whitechapel Rd, London, UK.
| | - Suliang Chen
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Health Data Research UK, University College London, London, UK
| | - Ghazaleh Fatemifar
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Health Data Research UK, University College London, London, UK
| | | | - R Thomas Lumbers
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Health Data Research UK, University College London, London, UK
- University College London Hospitals NHS Trust, 235 Euston Road, London, UK
| | - Johanna Mielke
- Bayer AG, Division Pharmaceuticals, Open Innovation & Digital Technologies, Wuppertal, Germany
| | - Simrat Gill
- University of Birmingham Institute of Cardiovascular Sciences and University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
| | - Dipak Kotecha
- University of Birmingham Institute of Cardiovascular Sciences and University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Department of Cardiology, University Medical Centre Utrecht, Utrecht, the Netherlands
| | - Daniel F Freitag
- Bayer AG, Division Pharmaceuticals, Open Innovation & Digital Technologies, Wuppertal, Germany
| | - Spiros Denaxas
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Health Data Research UK, University College London, London, UK
- The Alan Turing Institute, London, UK
| | - Harry Hemingway
- Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK
- Health Data Research UK, University College London, London, UK
- University College London Hospitals Biomedical Research Centre (UCLH BRC), London, UK
| |
|
10
|
Barragán-Montero A, Javaid U, Valdés G, Nguyen D, Desbordes P, Macq B, Willems S, Vandewinckele L, Holmström M, Löfman F, Michiels S, Souris K, Sterpin E, Lee JA. Artificial intelligence and machine learning for medical imaging: A technology review. Phys Med 2021; 83:242-256. [PMID: 33979715 PMCID: PMC8184621 DOI: 10.1016/j.ejmp.2021.04.016] [Citation(s) in RCA: 134] [Impact Index Per Article: 33.5] [Received: 12/06/2020] [Revised: 04/15/2021] [Accepted: 04/18/2021] [Indexed: 02/08/2023] Open
Abstract
Artificial intelligence (AI) has recently become a very popular buzzword, as a consequence of disruptive technical advances and impressive experimental results, notably in the field of image analysis and processing. In medicine, specialties where images are central, like radiology, pathology or oncology, have seized the opportunity, and considerable research and development efforts have been deployed to transfer the potential of AI to clinical applications. With AI becoming a more mainstream tool for typical medical imaging analysis tasks, such as diagnosis, segmentation, or classification, the safe and efficient use of clinical AI applications relies, in part, on informed practitioners. The aim of this review is to present the basic technological pillars of AI, together with the state-of-the-art machine learning methods and their application to medical imaging. In addition, we discuss the new trends and future research directions. This will help the reader to understand how AI methods are now becoming a ubiquitous tool in any medical image analysis workflow and pave the way for the clinical implementation of AI-based solutions.
Affiliation(s)
- Ana Barragán-Montero
- Molecular Imaging, Radiation and Oncology (MIRO) Laboratory, UCLouvain, Belgium.
| | - Umair Javaid
- Molecular Imaging, Radiation and Oncology (MIRO) Laboratory, UCLouvain, Belgium
| | - Gilmer Valdés
- Department of Radiation Oncology, Department of Epidemiology and Biostatistics, University of California, San Francisco, USA
| | - Dan Nguyen
- Medical Artificial Intelligence and Automation (MAIA) Laboratory, Department of Radiation Oncology, UT Southwestern Medical Center, USA
| | - Paul Desbordes
- Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), UCLouvain, Belgium
| | - Benoit Macq
- Information and Communication Technologies, Electronics and Applied Mathematics (ICTEAM), UCLouvain, Belgium
| | - Siri Willems
- ESAT/PSI, KU Leuven Belgium & MIRC, UZ Leuven, Belgium
| | | | | | | | - Steven Michiels
- Molecular Imaging, Radiation and Oncology (MIRO) Laboratory, UCLouvain, Belgium
| | - Kevin Souris
- Molecular Imaging, Radiation and Oncology (MIRO) Laboratory, UCLouvain, Belgium
| | - Edmond Sterpin
- Molecular Imaging, Radiation and Oncology (MIRO) Laboratory, UCLouvain, Belgium; KU Leuven, Department of Oncology, Laboratory of Experimental Radiotherapy, Belgium
| | - John A Lee
- Molecular Imaging, Radiation and Oncology (MIRO) Laboratory, UCLouvain, Belgium
| |
|
11
|
Deng F, Shen L, Wang H, Zhang L. Classify multicategory outcome in patients with lung adenocarcinoma using clinical, transcriptomic and clinico-transcriptomic data: machine learning versus multinomial models. Am J Cancer Res 2020; 10:4624-4639. [PMID: 33415023 PMCID: PMC7783755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2020] [Accepted: 11/25/2020] [Indexed: 06/12/2023] Open
Abstract
Classification of multicategory survival-outcome is important for precision oncology. Machine learning (ML) algorithms have been used to accurately classify multi-category survival-outcome of some cancer-types, but not yet that of lung adenocarcinoma. Therefore, we compared the performances of 3 ML models (random forests, support vector machine [SVM], multilayer perceptron) and multinomial logistic regression (Mlogit) models for classifying 4-category survival-outcome of lung adenocarcinoma using the TCGA. The Mlogit model overall performed similarly to the SVM and multilayer perceptron models (micro-average area under curve=0.82), while the random forests model was inferior. Surprisingly, transcriptomic data alone and clinico-transcriptomic data appeared sufficient to accurately classify the 4-category survival-outcome in these patients, but no models using clinical data alone performed well. Notably, NDUFS5, P2RY2, PRPF18, CCL24, ZNF813, MYL6, FLJ41941, POU5F1B, and SUV420H1 were the top-ranked genes that were associated with alive without disease and inversely linked to other outcomes. Similarly, BDKRB2, TERC, DNAJA3, MRPL15, SLC16A13, CRHBP and ACSBG2 were associated with alive with progression, and GAL3ST3, AD2, RAB41, HDC, and PLEKHG1 were associated with dead with disease, while also being inversely linked to other outcomes. These cross-linked genes may be used for risk-stratification and future treatment development.
Affiliation(s)
- Fei Deng
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
| | - Lanlan Shen
- Department of Pediatrics, Baylor College of Medicine, USDA/ARS Children’s Nutrition Research Center, Houston, TX, USA
| | - He Wang
- Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
| | - Lanjing Zhang
- Department of Pathology, Princeton Medical Center, Plainsboro, NJ, USA
- Department of Biological Sciences, Rutgers University, Newark, NJ
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
- Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
| |
|
12
|
Bertsimas D, Wiberg H. Machine Learning in Oncology: Methods, Applications, and Challenges. JCO Clin Cancer Inform 2020; 4:885-894. [PMID: 33058693 PMCID: PMC7608565 DOI: 10.1200/cci.20.00072] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Accepted: 08/26/2020] [Indexed: 01/16/2023] Open
Affiliation(s)
- Dimitris Bertsimas
- Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA
| | - Holly Wiberg
- Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA
| |
|
13
|
Abstract
Machine learning is a set of techniques that promise to greatly enhance our data-processing capability. In the field of oncology, ML presents itself with a wealth of possible applications to the research and the clinical context, such as automated diagnosis and precise treatment modulation. In this paper, we will review the principal applications of ML techniques in oncology and explore in detail how they work. This will allow us to discuss the issues and challenges that ML faces in this field, and ultimately gain a greater understanding of ML techniques and how they can improve oncological research and practice.
Affiliation(s)
- Cecilia Nardini
- European School of Molecular Medicine (SEMM), 20139 Milan, Italy
| |
|
14
|
Chamseddine IM, Frieboes HB, Kokkolaras M. Multi-objective optimization of tumor response to drug release from vasculature-bound nanoparticles. Sci Rep 2020; 10:8294. [PMID: 32427977 PMCID: PMC7237449 DOI: 10.1038/s41598-020-65162-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/16/2019] [Accepted: 04/26/2020] [Indexed: 12/31/2022] Open
Abstract
The pharmacokinetics of nanoparticle-borne drugs targeting tumors depends critically on nanoparticle design. Empirical approaches to evaluate such designs in order to maximize treatment efficacy are time- and cost-intensive. We have recently proposed the use of computational modeling of nanoparticle-mediated drug delivery targeting tumor vasculature coupled with numerical optimization to pursue optimal nanoparticle targeting and tumor uptake. Here, we build upon these studies to evaluate the effect of tumor size on optimal nanoparticle design by considering a cohort of heterogeneously-sized tumor lesions, as would be clinically expected. The results indicate that smaller nanoparticles yield higher tumor targeting and lesion regression for larger-sized tumors. We then augment the nanoparticle design optimization problem by considering drug diffusivity, which yields a two-fold tumor size decrease compared to optimizing nanoparticles without this consideration. We quantify the tradeoff between tumor targeting and size decrease using bi-objective optimization, and generate five Pareto-optimal nanoparticle designs. The results provide a spectrum of treatment outcomes - considering tumor targeting vs. antitumor effect - with the goal to enable therapy customization based on clinical need. This approach could be extended to other nanoparticle-based cancer therapies, and support the development of personalized nanomedicine in the longer term.
Affiliation(s)
- Ibrahim M Chamseddine
- Department of Integrated Mathematical Oncology, Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Hermann B Frieboes
- Department of Bioengineering, University of Louisville, Louisville, KY, USA
- James Graham Brown Cancer Center, University of Louisville, Louisville, KY, USA
- Center for Predictive Medicine, University of Louisville, Louisville, KY, USA
- Michael Kokkolaras
- Department of Mechanical Engineering, McGill University, Montreal, Quebec, Canada
- GERAD - Group for Research in Decision Analysis, Montreal, Quebec, Canada
|
15
|
Luo Y, Chen S, Valdes G. Machine learning for radiation outcome modeling and prediction. Med Phys 2020; 47:e178-e184. [DOI: 10.1002/mp.13570] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 01/23/2019] [Revised: 03/26/2019] [Accepted: 04/09/2019] [Indexed: 12/18/2022]
Affiliation(s)
- Yi Luo
- Department of Radiation Oncology, University of Michigan, Ann Arbor, MI 48103, USA
- Shifeng Chen
- Department of Radiation Oncology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Gilmer Valdes
- Department of Radiation Oncology, University of California, San Francisco, CA 94158, USA
|
16
|
Moreau JT, Hankinson TC, Baillet S, Dudley RWR. Individual-patient prediction of meningioma malignancy and survival using the Surveillance, Epidemiology, and End Results database. NPJ Digit Med 2020; 3:12. [PMID: 32025573 PMCID: PMC6992687 DOI: 10.1038/s41746-020-0219-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 11/07/2018] [Accepted: 01/10/2020] [Indexed: 01/17/2023]
Abstract
Meningiomas are known to have relatively lower aggressiveness and better outcomes than other central nervous system (CNS) tumors. However, there is considerable overlap between clinical and radiological features characterizing benign, atypical, and malignant tumors. In this study, we developed methods and a practical app designed to assist with the diagnosis and prognosis of meningiomas. Statistical learning models were trained and validated on 62,844 patients from the Surveillance, Epidemiology, and End Results database. We used balanced logistic regression-random forest ensemble classifiers and proportional hazards models to learn multivariate patterns of association between malignancy, survival, and a series of basic clinical variables, such as tumor size, location, and surgical procedure. We demonstrate that our models are capable of predicting meaningful individual-specific clinical outcome variables and show good generalizability across 16 SEER registries. A free smartphone and web application is provided for readers to access and test the predictive models (www.meningioma.app). Future model improvements and prospective replication will be necessary to demonstrate true clinical utility. Rather than being used in isolation, we expect that the proposed models will be integrated into larger and more comprehensive models that integrate imaging and molecular biomarkers. Whether for meningiomas or other tumors of the CNS, the power of these methods to make individual-patient predictions could lead to improved diagnosis, patient counseling, and outcomes.
Affiliation(s)
- Jeremy T. Moreau
- McConnell Brain Imaging Centre, Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- Department of Pediatric Surgery, Division of Neurosurgery, Montreal Children’s Hospital, Montreal, QC, Canada
- Todd C. Hankinson
- Department of Pediatric Neurosurgery, Children’s Hospital Colorado, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Morgan Adams Foundation Pediatric Brain Tumor Research Program, Aurora, CO, USA
- Sylvain Baillet
- McConnell Brain Imaging Centre, Department of Neurology and Neurosurgery, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- Roy W. R. Dudley
- Department of Pediatric Surgery, Division of Neurosurgery, Montreal Children’s Hospital, Montreal, QC, Canada
|
17
|
A Systematic Review on Supervised and Unsupervised Machine Learning Algorithms for Data Science. UNSUPERVISED AND SEMI-SUPERVISED LEARNING 2020. [DOI: 10.1007/978-3-030-22475-2_1] [Citation(s) in RCA: 110] [Impact Index Per Article: 22.0] [Indexed: 02/06/2023]
|
18
|
Bartholomai JA, Frieboes HB. Lung Cancer Survival Prediction via Machine Learning Regression, Classification, and Statistical Techniques. PROCEEDINGS OF THE ... IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY. IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY 2019; 2018:632-637. [PMID: 31312809 DOI: 10.1109/isspit.2018.8642753] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 11/10/2022]
Abstract
A regression model is developed to predict survival time in months for lung cancer patients. It was previously shown that predictive models perform accurately for short survival times of less than 6 months; however, model accuracy is reduced when attempting to predict longer survival times. This study employs an approach for which regression models are used in combination with a classification model to predict survival time. A set of de-identified lung cancer patient data was obtained from the Surveillance, Epidemiology, and End Results (SEER) database. The models use a subset of factors selected by ANOVA. Model accuracy is measured by a confusion matrix for classification and by Root Mean Square Error (RMSE) for regression. Random Forests are used for classification, while general Linear Regression, Gradient Boosted Machines (GBM), and Random Forests are used for regression. The regression results show that RF had the best performance for survival times ≤6 and >24 months (RMSE 10.52 and 20.51, respectively), while GBM performed best for 7-24 months (RMSE 15.65). Comparison plots of the results further indicate that the regression models perform better for shorter survival times than the RMSE values are able to reflect.
|
19
|
Song Y, Gao S, Tan W, Qiu Z, Zhou H, Zhao Y. Multiple Machine Learnings Revealed Similar Predictive Accuracy for Prognosis of PNETs from the Surveillance, Epidemiology, and End Result Database. J Cancer 2018; 9:3971-3978. [PMID: 30410601 PMCID: PMC6218767 DOI: 10.7150/jca.26649] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 04/12/2018] [Accepted: 08/15/2018] [Indexed: 01/16/2023]
Abstract
Background: Prognosis prediction is indispensable in clinical practice and machine learning has proved to be helpful. We aimed to predict survival of pancreatic neuroendocrine tumors (PNETs) with machine learning, and compared it with the American Joint Committee on Cancer (AJCC) staging system. Methods: Data on PNET cases were extracted from The Surveillance, Epidemiology, and End Result (SEER) database. Statistical description, multivariate survival analysis and preprocessing were done before machine learning. Four different algorithms (logistic regression (LR), support vector machines (SVM), random forest (RF) and deep learning (DL)) were used to train the model. We used proper imputations to manage missing data in the database and sensitivity analysis was performed to evaluate the imputation. The model with the best predictive accuracy was compared with the AJCC staging system using the SEER cases. Results: The four models had similar predictive accuracy with no significant difference (p = 0.664). The DL model showed a slightly better predictive accuracy than the others (81.6% (± 1.9%)), thus it was used for further comparison with the AJCC staging system and revealed a better performance for PNET cases in the SEER database (Area under receiver operating characteristic curve: 0.87 vs 0.76). The validity of missing data imputation was supported by sensitivity analysis. Conclusions: The models developed with machine learning performed well in survival prediction of PNETs, and the DL model has a better accuracy and specificity than the AJCC staging system in SEER data. The DL model has potential for clinical application but external validation is needed.
Affiliation(s)
- Yiyan Song
- Department of General Surgery, Guangdong Second Provincial General Hospital, Guangzhou, China
- Department of Anesthesia, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Shaowei Gao
- Department of Anesthesia, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Wulin Tan
- Department of Anesthesia, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Zeting Qiu
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Huaqiang Zhou
- Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Yue Zhao
- Department of General Surgery, Guangdong Second Provincial General Hospital, Guangzhou, China
|
20
|
Gao S, Mutter S, Casey A, Mäkinen VP. Numero: a statistical framework to define multivariable subgroups in complex population-based datasets. Int J Epidemiol 2018; 48:369-374. [DOI: 10.1093/ije/dyy113] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/11/2018] [Accepted: 05/28/2018] [Indexed: 12/11/2022]
Affiliation(s)
- Song Gao
- Heart Health Theme, South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Stefan Mutter
- Heart Health Theme, South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Aaron Casey
- Heart Health Theme, South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Ville-Petteri Mäkinen
- Heart Health Theme, South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- School of Biological Sciences, University of Adelaide, Adelaide, SA, Australia
- Computational Medicine, University of Oulu and Biocenter Oulu, Oulu, Finland
|
21
|
Meyer P, Noblet V, Mazzara C, Lallement A. Survey on deep learning for radiotherapy. Comput Biol Med 2018; 98:126-146. [PMID: 29787940 DOI: 10.1016/j.compbiomed.2018.05.018] [Citation(s) in RCA: 170] [Impact Index Per Article: 24.3] [Received: 03/03/2018] [Revised: 05/15/2018] [Accepted: 05/15/2018] [Indexed: 12/17/2022]
Abstract
More than 50% of cancer patients are treated with radiotherapy, either exclusively or in combination with other methods. The planning and delivery of radiotherapy treatment is a complex process, but can now be greatly facilitated by artificial intelligence technology. Deep learning is the fastest-growing field in artificial intelligence and has been successfully used in recent years in many domains, including medicine. In this article, we first explain the concept of deep learning, addressing it in the broader context of machine learning. The most common network architectures are presented, with a more specific focus on convolutional neural networks. We then present a review of the published works on deep learning methods that can be applied to radiotherapy, which are classified into seven categories related to the patient workflow, and can provide some insights into potential future applications. We have attempted to make this paper accessible to both radiotherapy and deep learning communities, and hope that it will inspire new collaborations between these two communities to develop dedicated radiotherapy applications.
Affiliation(s)
- Philippe Meyer
- Department of Medical Physics, Paul Strauss Center, Strasbourg, France
|
22
|
Prediction of Protein-Protein Interactions from Amino Acid Sequences Based on Continuous and Discrete Wavelet Transform Features. Molecules 2018; 23:molecules23040823. [PMID: 29617272 PMCID: PMC6017726 DOI: 10.3390/molecules23040823] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 03/06/2018] [Revised: 03/25/2018] [Accepted: 03/29/2018] [Indexed: 12/12/2022]
Abstract
Protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of cells; thus, detecting PPIs is one of the most important issues in current molecular biology. Although much effort has been devoted to using high-throughput techniques to identify protein-protein interactions, the experimental methods are both time-consuming and costly. In addition, they yield high rates of false positive and false negative results. Moreover, most of the proposed computational methods are limited in information about protein homology or the interaction marks of the protein partners. In this paper, we report a computational method that uses only the information from protein sequences. The main improvements come from a novel protein sequence representation combining the continuous and discrete wavelet transforms and from adopting a weighted sparse representation-based classifier (WSRC). The proposed method was used to predict PPIs from three different datasets: yeast, human and H. pylori. In addition, we employed the prediction model trained on the PPIs dataset of yeast to predict the PPIs of six datasets of other species. To further evaluate the performance of the prediction model, we compared WSRC with the state-of-the-art support vector machine classifier. When predicting PPIs of the yeast, human and H. pylori datasets, we obtained high average prediction accuracies of 97.38%, 98.92% and 93.93% respectively. In the cross-species experiments, most of the prediction accuracies are over 94%. These promising results show that the proposed method is indeed capable of obtaining higher performance in PPIs detection.
|
23
|
Lynch CM, Abdollahi B, Fuqua JD, de Carlo AR, Bartholomai JA, Balgemann RN, van Berkel VH, Frieboes HB. Prediction of lung cancer patient survival via supervised machine learning classification techniques. Int J Med Inform 2017; 108:1-8. [PMID: 29132615 DOI: 10.1016/j.ijmedinf.2017.09.013] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Received: 04/07/2016] [Revised: 08/29/2017] [Accepted: 09/23/2017] [Indexed: 12/20/2022]
Abstract
Outcomes for cancer patients have been previously estimated by applying various machine learning techniques to large datasets such as the Surveillance, Epidemiology, and End Results (SEER) program database. In particular for lung cancer, it is not well understood which types of techniques would yield more predictive information, and which data attributes should be used in order to determine this information. In this study, a number of supervised learning techniques are applied to the SEER database to classify lung cancer patients in terms of survival, including linear regression, Decision Trees, Gradient Boosting Machines (GBM), Support Vector Machines (SVM), and a custom ensemble. Key data attributes in applying these methods include tumor grade, tumor size, gender, age, stage, and number of primaries, with the goal to enable comparison of predictive power between the various methods. The prediction is treated like a continuous target, rather than a classification into categories, as a first step towards improving survival prediction. The results show that the predicted values agree with actual values for low to moderate survival times, which constitute the majority of the data. The best performing technique was the custom ensemble with a Root Mean Square Error (RMSE) value of 15.05. The most influential model within the custom ensemble was GBM, while Decision Trees may be inapplicable as it had too few discrete outputs. The results further show that among the five individual models generated, the most accurate was GBM with an RMSE value of 15.32. Although SVM underperformed with an RMSE value of 15.82, statistical analysis singles out the SVM as the only model that generated a distinctive output. The results of the models are consistent with a classical Cox proportional hazards model used as a reference technique.
We conclude that application of these supervised learning techniques to lung cancer data in the SEER database may be of use to estimate patient survival time with the ultimate goal to inform patient care decisions, and that the performance of these techniques with this particular dataset may be on par with that of classical methods.
Affiliation(s)
- Chip M Lynch
- Department of Computer Engineering and Computer Science, University of Louisville, KY, USA
- Behnaz Abdollahi
- Department of Electrical and Computer Engineering, University of Louisville, KY, USA
- Joshua D Fuqua
- Department of Bioengineering, University of Louisville, KY, USA
- Victor H van Berkel
- Department of Cardiovascular and Thoracic Surgery, University of Louisville, KY, USA
- Hermann B Frieboes
- Department of Bioengineering, University of Louisville, KY, USA
- James Graham Brown Cancer Center, University of Louisville, KY, USA
|