1. Tripathi A, Waqas A, Venkatesan K, Yilmaz Y, Rasool G. Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets. Sensors (Basel, Switzerland) 2024; 24:1634. [PMID: 38475170 DOI: 10.3390/s24051634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Revised: 01/25/2024] [Accepted: 02/28/2024] [Indexed: 03/14/2024]
Abstract
The advancements in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need for integrating data from multiple sources is further pronounced in complex diseases such as cancer for enabling precision medicine and personalized treatments. This work proposes Multimodal Integration of Oncology Data System (MINDS), a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS consolidates over 41,000 cases from across repositories while achieving a high compression ratio relative to the 3.78 PB source data size. It offers sub-5-s query response times for interactive exploration. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.
Affiliation(s)
- Aakash Tripathi
  - Department of Machine Learning, Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
  - Department of Electrical Engineering, University of South Florida, Tampa, FL 33620, USA
- Asim Waqas
  - Department of Machine Learning, Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
  - Department of Electrical Engineering, University of South Florida, Tampa, FL 33620, USA
- Kavya Venkatesan
  - Department of Machine Learning, Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
- Yasin Yilmaz
  - Department of Electrical Engineering, University of South Florida, Tampa, FL 33620, USA
- Ghulam Rasool
  - Department of Machine Learning, Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
  - Department of Electrical Engineering, University of South Florida, Tampa, FL 33620, USA
  - Department of Neuro-Oncology, Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
  - Department of Oncologic Sciences, University of South Florida, Tampa, FL 33612, USA
2. Kotronoulas G. Think Big (Data) in Oncology Nursing. Semin Oncol Nurs 2023; 39:151438. [PMID: 37179176 DOI: 10.1016/j.soncn.2023.151438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 04/03/2023] [Indexed: 05/15/2023]
3. Adeoye J, Akinshipo A, Koohi-Moghadam M, Thomson P, Su YX. Construction of machine learning-based models for cancer outcomes in low and lower-middle income countries: A scoping review. Front Oncol 2022; 12:976168. [DOI: 10.3389/fonc.2022.976168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 11/14/2022] [Indexed: 12/05/2022]
Abstract
Background: The impact and utility of machine learning (ML)-based prediction tools for cancer outcomes, including assistive diagnosis, risk stratification, and adjunctive decision-making, have been largely described and realized in high-income and upper-middle-income countries. However, statistical projections have estimated higher cancer incidence and mortality risks in low and lower-middle-income countries (LLMICs). Therefore, this review aimed to evaluate the utilization, model construction methods, and degree of implementation of ML-based models for cancer outcomes in LLMICs.
Methods: PubMed/Medline, Scopus, and Web of Science databases were searched, and articles describing the use of ML-based models for cancer among local populations in LLMICs between 2002 and 2022 were included. A total of 140 articles from 22,516 citations that met the eligibility criteria were included in this study.
Results: ML-based models from LLMICs were more often based on traditional ML algorithms than on deep or deep hybrid learning. We found that the construction of ML-based models was skewed to particular LLMICs such as India, Iran, Pakistan, and Egypt, with a paucity of applications in sub-Saharan Africa. Moreover, models for breast, head and neck, and brain cancer outcomes were frequently explored. Many models were deemed suboptimal according to the Prediction model Risk of Bias Assessment tool (PROBAST) due to sample size constraints and technical flaws in ML modeling, even though their performance accuracy ranged from 0.65 to 1.00. While the development and internal validation were described for all models included (n=137), only 4.4% (6/137) have been validated in independent cohorts and 0.7% (1/137) have been assessed for clinical impact and efficacy.
Conclusion: Overall, the application of ML for modeling cancer outcomes in LLMICs is increasing. However, model development is largely unsatisfactory. We recommend model retraining using larger sample sizes, intensified external validation practices, and increased impact assessment studies using randomized controlled trial designs.
Systematic review registration: https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=308345, identifier CRD42022308345.
4. Yang X, Mu D, Peng H, Li H, Wang Y, Wang P, Wang Y, Han S. Research and Application of Artificial Intelligence (AI) based on Electronic Health Records from Patients with Cancer: a Systematic Review (Preprint). JMIR Med Inform 2021; 10:e33799. [PMID: 35442195 PMCID: PMC9069295 DOI: 10.2196/33799] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2021] [Revised: 01/24/2022] [Accepted: 03/14/2022] [Indexed: 01/12/2023]
Abstract
Background: With the accumulation of electronic health records and the development of artificial intelligence, patients with cancer urgently need new evidence of more personalized clinical and demographic characteristics and more sophisticated treatment and prevention strategies. However, no research has systematically analyzed the application and significance of artificial intelligence based on electronic health records in cancer care.
Objective: The aim of this study was to conduct a review to introduce the current state and limitations of artificial intelligence based on electronic health records of patients with cancer and to summarize the performance of artificial intelligence in mining electronic health records and its impact on cancer care.
Methods: Three databases were systematically searched to retrieve potentially relevant papers published from January 2009 to October 2020. Four principal reviewers assessed the quality of the papers and reviewed them for eligibility based on the inclusion criteria in the extracted data. The summary measures used in this analysis were the number and frequency of occurrence of the themes.
Results: Of the 1034 papers considered, 148 papers met the inclusion criteria. Cancer care, especially cancers of female organs and digestive organs, could benefit from artificial intelligence based on electronic health records through cancer emergencies and prognostic estimates, cancer diagnosis and prediction, tumor stage detection, cancer case detection, and treatment pattern recognition. The models can always achieve an area under the curve of 0.7. Ensemble methods and deep learning are on the rise. In addition, electronic medical records in the existing studies are mainly in English and from private institutional databases.
Conclusions: Artificial intelligence based on electronic health records performed well and could be useful for cancer care. Improving the performance of artificial intelligence can help patients receive more scientific-based and accurate treatments. There is a need for the development of new methods and electronic health record data sharing and for increased passion and support from cancer specialists.
Affiliation(s)
- Xinyu Yang
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Dongmei Mu
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Hao Peng
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Hua Li
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Ying Wang
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Ping Wang
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Yue Wang
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
- Siqi Han
  - Division of Clinical Research, The First Hospital of Jilin University, Changchun, China
  - Department of Medical Informatics, School of Public Health, Jilin University, Changchun, China
5. Shinozaki E, Makiyama A, Kagawa Y, Satake H, Tanizawa Y, Cai Z, Piao Y. Treatment sequences of patients with advanced colorectal cancer and use of second-line FOLFIRI with antiangiogenic drugs in Japan: A retrospective observational study using an administrative database. PLoS One 2021; 16:e0246160. [PMID: 33556095 PMCID: PMC7870079 DOI: 10.1371/journal.pone.0246160] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/21/2020] [Accepted: 01/15/2021] [Indexed: 01/11/2023]
Abstract
The objectives were to describe treatment sequences for advanced colorectal cancer (CRC), use of second-line FOLFIRI (leucovorin, 5-fluorouracil, irinotecan) plus antiangiogenic drug (bevacizumab, ramucirumab, aflibercept beta) therapy, and the factors associated with the duration of antitumor drug treatment from second-line antiangiogenic therapy in Japan. This retrospective observational study was conducted using a Japanese hospital-based administrative database. Patients were enrolled if they started adjuvant therapy (and presumably experienced early recurrence) or first-line treatment for advanced CRC between May 2016 and July 2019, and were analysed until September 2019. Factors associated with overall treatment duration from second-line treatment with FOLFIRI plus antiangiogenic drugs were explored with multivariate Cox regression analysis. The most common first-line treatments were FOLFOX (leucovorin, 5-fluorouracil, oxaliplatin) or CAPOX (capecitabine, oxaliplatin) with bevacizumab (presumed RAS-mutant CRC) and FOLFOX with panitumumab (presumed RAS-wild type CRC). The most common second-line treatments were FOLFIRI-based. Many patients did not transition to subsequent lines of therapy. For second-line treatment, antiangiogenic drugs were prescribed more often for patients with presumed RAS-mutant CRC, right-sided CRC, and independent activities of daily living (ADL). The median duration of second-line FOLFIRI plus antiangiogenic drug treatment was 4.5 months; 66.2% of patients transitioned to third-line therapy. Low body mass index and not fully independent ADL were significantly associated with shorter overall duration of antitumor drug treatment from second-line therapy. Left-sided CRC, presumed RAS-wild type CRC, previous use of oral fluoropyrimidines and use of proteinuria qualitative tests, antihypertensives, or anticholinergics during second-line therapy were significantly associated with longer treatment. 
Treatment of advanced CRC in Japan is consistent with both international and Japanese guidelines, but transition rates to subsequent therapies need improvement. In addition to antitumor drug treatment, better ADL, higher body mass index, management of hypertension, and proteinuria tests were associated with continuation of sequential therapy that included antiangiogenic drugs.
Affiliation(s)
- Eiji Shinozaki
  - Gastroenterology Center, Japanese Foundation for Cancer Research, Cancer Institute Hospital, Tokyo, Japan
- Yoshinori Kagawa
  - Department of Gastroenterological Surgery, Osaka General Medical Center, Osaka, Japan
- Hironaga Satake
  - Cancer Treatment Center, Kansai Medical University Hospital, Osaka, Japan
- Yoshinori Tanizawa
  - Medicines Development Unit-Japan, Eli Lilly Japan K.K., Kobe, Japan
- Zhihong Cai
  - Medicines Development Unit-Japan, Eli Lilly Japan K.K., Kobe, Japan
- Yongzhe Piao
  - Medicines Development Unit-Japan, Eli Lilly Japan K.K., Kobe, Japan
6. Mannheimer JD, Prasad A, Gustafson DL. Predicting chemosensitivity using drug perturbed gene dynamics. BMC Bioinformatics 2021; 22:15. [PMID: 33413081 PMCID: PMC7789515 DOI: 10.1186/s12859-020-03947-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2020] [Accepted: 12/22/2020] [Indexed: 11/20/2022]
Abstract
Background: One of the current directions of precision medicine is the use of computational methods to aid in the diagnosis, prognosis, and treatment of disease based on data-driven approaches. For instance, in oncology, there has been a particular focus on development of algorithms and biomarkers that can be used for pre-clinical and clinical applications. In particular, large-scale omics-based models to predict drug sensitivity in in vitro cancer cell line panels have been used to explore the utility and aid in the development of these models as clinical tools. Additionally, a number of web-based interfaces have been constructed for researchers to explore the potential of drug perturbed gene expression as biomarkers, including the NCI Transcriptional Pharmacodynamics Workbench. In this paper we explore the influence of drug perturbed gene dynamics of the NCI Transcriptional Pharmacodynamics Workbench in computational models to predict in vitro drug sensitivity for 15 drugs on the NCI60 cell line panel.
Results: This work presents three main findings. First, our models show that gene expression profiles that capture changes in gene expression after 24 h of exposure to a high concentration of drug generate the most accurate predictive models compared to the expression profiles under different dosing conditions. Second, signatures of 100 genes are developed for different gene expression profiles; furthermore, when the gene signatures are applied across gene expression profiles, model performance is substantially decreased when gene signatures developed using changes in gene expression are applied to non-drugged gene expression. Lastly, we show that the gene interaction networks developed on these signatures show different network topologies and can be used to inform selection of cancer relevant genes.
Conclusion: Our models suggest that perturbed gene signatures are predictive of drug response, but cannot be applied to predict drug response using unperturbed gene expression. Furthermore, additional drug perturbed gene expression measurements in in vitro cell lines could generate more predictive models and, more importantly, be used in conjunction with computational methods to discover important drug-disease relationships.
Affiliation(s)
- Joshua D Mannheimer
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO, USA
  - Flint Animal Cancer Center, Colorado State University, Fort Collins, CO, USA
- Ashok Prasad
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO, USA
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO, USA
- Daniel L Gustafson
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO, USA
  - Flint Animal Cancer Center, Colorado State University, Fort Collins, CO, USA
  - Department of Clinical Sciences, Colorado State University, Fort Collins, CO, USA
  - University of Colorado, Cancer Center Developmental Therapeutics Program, University of Colorado, Aurora, CO, USA
7. Agrawal R, Prabakaran S. Big data in digital healthcare: lessons learnt and recommendations for general practice. Heredity (Edinb) 2020; 124:525-534. [PMID: 32139886 PMCID: PMC7080757 DOI: 10.1038/s41437-020-0303-2] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 06/28/2019] [Revised: 02/25/2020] [Accepted: 02/25/2020] [Indexed: 12/31/2022]
Abstract
Big Data will be an integral part of the next generation of technological developments, allowing us to gain new insights from the vast quantities of data being produced by modern life. There is significant potential for the application of Big Data to healthcare, but there are still some impediments to overcome, such as fragmentation, high costs, and questions around data ownership. Envisioning a future role for Big Data within the digital healthcare context means balancing the benefits of improving patient outcomes with the potential pitfalls of increasing physician burnout due to poor implementation leading to added complexity. Oncology, the field where Big Data collection and utilization got a head start with programs like TCGA and the Cancer Moon Shot, provides an instructive example as we see different perspectives provided by the United States (US), the United Kingdom (UK) and other nations in the implementation of Big Data in patient care with regard to their centralization and regulatory approach to data. By drawing upon global approaches, we propose recommendations for guidelines and regulations of data use in healthcare centering on the creation of a unique global patient ID that can integrate data from a variety of healthcare providers. In addition, we expand upon the topic by discussing potential pitfalls to Big Data such as the lack of diversity in Big Data research, and the security and transparency risks posed by machine learning algorithms.
Affiliation(s)
- Raag Agrawal
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
  - Department of Biology, Columbia University, 116th and Broadway, New York, NY, 10027, USA
- Sudhakaran Prabakaran
  - Department of Genetics, University of Cambridge, Downing Site, Cambridge, CB2 3EH, UK
  - Department of Biology, Indian Institute of Science Education and Research, Pune, Maharashtra, 411008, India
  - St Edmund's College, University of Cambridge, Cambridge, CB3 0BN, UK
8. Mannheimer JD, Duval DL, Prasad A, Gustafson DL. A systematic analysis of genomics-based modeling approaches for prediction of drug response to cytotoxic chemotherapies. BMC Med Genomics 2019; 12:87. [PMID: 31208429 PMCID: PMC6580596 DOI: 10.1186/s12920-019-0519-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/10/2018] [Accepted: 04/29/2019] [Indexed: 12/22/2022]
Abstract
Background: The availability and generation of large amounts of genomic data have led to the development of a new paradigm in cancer treatment emphasizing a precision approach at the molecular and genomic level. Statistical modeling techniques aimed at leveraging broad-scale in vitro, in vivo, and clinical data for precision drug treatment have become an active area of research. As a rapidly developing discipline at the crossroads of medicine, computer science, and mathematics, techniques ranging from accepted to those on the cutting edge of artificial intelligence have been utilized. Given the diversity and complexity of these techniques, a systematic understanding of fundamental modeling principles is essential to contextualize influential factors to better understand results and develop new approaches.
Methods: Using data available from the Genomics of Drug Sensitivity in Cancer (GDSC) and the NCI60, we explore principal components regression, linear and non-linear support vector regression, and artificial neural networks in combination with different implementations of correlation-based feature selection (CBF) on the prediction of drug response for several cytotoxic chemotherapeutic agents.
Results: Our results indicate that the regression method and features used have marginal effects on Spearman correlation between the predicted and measured values as well as prediction error. Detailed analysis of these results reveals that the bulk relationship between tissue of origin and drug response is a major driving factor in model performance.
Conclusion: These results display one of the challenges in building predictive models for drug response in pan-cancer models: bulk genotypic traits, where the signal-to-noise ratio is high, are the dominant behavior captured in these models. This suggests that improved techniques of feature selection that can discriminate individual cell response from histotype response will yield more successful pan-cancer models.
Affiliation(s)
- Joshua D. Mannheimer
  - School of Biomedical Engineering, Colorado State University, Fort Collins, 80523 CO USA
  - Flint Animal Cancer Center, Colorado State University, Fort Collins, 80523 CO USA
- Dawn L. Duval
  - Flint Animal Cancer Center, Colorado State University, Fort Collins, 80523 CO USA
  - Department of Clinical Sciences, Colorado State University, Fort Collins, 80523 CO USA
- Ashok Prasad
  - School of Biomedical Engineering, Colorado State University, Fort Collins, 80523 CO USA
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, 80523 CO USA
- Daniel L. Gustafson
  - School of Biomedical Engineering, Colorado State University, Fort Collins, 80523 CO USA
  - Flint Animal Cancer Center, Colorado State University, Fort Collins, 80523 CO USA
  - Department of Clinical Sciences, Colorado State University, Fort Collins, 80523 CO USA
  - University of Colorado Cancer Center Developmental Therapeutics Program, University of Colorado, Aurora, 80045 CO USA