1
|
Xie CX, De Simoni A, Eldridge S, Pinnock H, Relton C. Development of a conceptual framework for defining trial efficiency. PLoS One 2024; 19:e0304187. [PMID: 38781167 PMCID: PMC11115328 DOI: 10.1371/journal.pone.0304187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open
Abstract
BACKGROUND Globally, there is a growing focus on efficient trials, yet numerous interpretations have emerged, suggesting significant heterogeneity in the understanding of "efficiency" within the trial context. Therefore, in this study, we aimed to dissect the multifaceted nature of trial efficiency by establishing a comprehensive conceptual framework for its definition. OBJECTIVES To collate diverse perspectives regarding trial efficiency and to achieve consensus on a conceptual framework for defining trial efficiency. METHODS From July 2022 to July 2023, we undertook a literature review to identify various terms that have been used to define trial efficiency. We then conducted a modified e-Delphi study, comprising an exploratory open round and a subsequent scoring round to refine and validate the identified items. We recruited a wide range of experts in the global trial community including trialists, funders, sponsors, journal editors and members of the public. Consensus was defined as items rated "without disagreement", measured by the inter-percentile range adjusted for symmetry through the UCLA/RAND approach. RESULTS Seventy-eight studies were identified from the literature review, from which we extracted nine terms related to trial efficiency. We then used the review findings as exemplars in the Delphi open round. Forty-nine international experts were recruited to the e-Delphi panel. Open round responses resulted in the refinement of the initial nine terms, which were consequently included in the scoring round. We obtained consensus on all nine items: 1) four constructs that collectively define trial efficiency: scientific efficiency, operational efficiency, statistical efficiency and economic efficiency; and 2) five essential building blocks for an efficient trial: trial design, trial process, infrastructure, superstructure, and stakeholders. CONCLUSIONS This is the first attempt to dissect the concept of trial efficiency into theoretical constructs.
Having an agreed definition will allow better trial implementation and facilitate effective communication and decision-making across stakeholders. We also identified essential building blocks that are the cornerstones of an efficient trial. In this pursuit of understanding, we are not only unravelling the complexities of trial efficiency but also laying the groundwork for evaluating the efficiency of an individual trial or a trial system in the future.
Affiliation(s)
- Charis Xuan Xie
- Wolfson Institute of Population Health, Queen Mary University of London, London, England, United Kingdom
| | - Anna De Simoni
- Wolfson Institute of Population Health, Queen Mary University of London, London, England, United Kingdom
| | - Sandra Eldridge
- Wolfson Institute of Population Health, Queen Mary University of London, London, England, United Kingdom
| | - Hilary Pinnock
- Usher Institute, Asthma UK Centre for Applied Research, The University of Edinburgh, Edinburgh, Scotland, United Kingdom
| | - Clare Relton
- Wolfson Institute of Population Health, Queen Mary University of London, London, England, United Kingdom
| |
|
2
|
Gédor M, Desandes E, Chesnel M, Merlin JL, Marchal F, Lambert A, Baudin A. [Development of an artificial intelligence system to improve cancer clinical trial eligibility screening]. Bull Cancer 2024; 111:473-482. [PMID: 38503584 DOI: 10.1016/j.bulcan.2024.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 01/03/2024] [Accepted: 01/12/2024] [Indexed: 03/21/2024]
Abstract
INTRODUCTION The recruitment step of any clinical trial is time-consuming, arduous and generates extra costs. Artificial intelligence tools could improve recruitment and thereby shorten the inclusion phase. The objective was to assess the performance of an artificial intelligence-driven tool (text mining, machine learning, classification…) for screening and detecting patients potentially eligible for recruitment into one of the clinical trials open at the "Institut de Cancérologie de Lorraine". METHODS Computerized clinical data from the first medical consultation of patients managed in an anticancer center over the 2019-2023 period were used to study the performance of an artificial intelligence tool (SAS® Viya). Recall, precision and F1-score were used to determine the effectiveness of the artificial intelligence algorithm. Time saved on screening was determined as the difference between the time taken using the artificial intelligence-assisted method and that taken using the standard method in clinical trial participant screening. RESULTS Among the 9876 patients included in the study, the artificial intelligence algorithm achieved a precision of 96%, a recall of 94% and an F1-score of 0.95 in detecting patients with breast cancer (n=2039) potentially eligible for inclusion in a clinical trial. Screening the files of 258 potentially eligible patients took 20 s per file vs. 5 min 6 s with the standard method. DISCUSSION This study suggests that artificial intelligence could yield sizable improvements over standard practices in several aspects of the patient screening process, as well as in approaches to feasibility, site selection, and trial selection.
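As an illustrative check of the figures reported in this abstract (not the authors' code), the sketch below recomputes the F1-score as the harmonic mean of precision and recall and estimates the total screening time saved. The function and variable names are assumptions introduced here, and the total assumes that every one of the 258 files takes exactly the reported per-file time.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract.
precision, recall = 0.96, 0.94
f1 = f1_score(precision, recall)

# Per-file screening times reported in the abstract, in seconds.
ai_seconds_per_file = 20
manual_seconds_per_file = 5 * 60 + 6
files_screened = 258

# Assumption: the per-file times apply uniformly to all screened files.
seconds_saved = (manual_seconds_per_file - ai_seconds_per_file) * files_screened
hours_saved = seconds_saved / 3600

print(f"F1-score: {f1:.2f}")
print(f"Estimated time saved: {hours_saved:.1f} hours")
```

The recomputed F1-score rounds to the reported 0.95, and under the uniform-time assumption the saving is roughly 20 hours over the 258 files.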
Affiliation(s)
- Maud Gédor
- Service en charge des données de santé, institut de cancérologie de Lorraine, 6, avenue de Bourgogne, 54519 Vandœuvre-lès-Nancy, France
| | - Emmanuel Desandes
- Service en charge des données de santé, institut de cancérologie de Lorraine, 6, avenue de Bourgogne, 54519 Vandœuvre-lès-Nancy, France; EA 4360 APEMAC, université de Lorraine, 9, avenue de la Forêt-de-Haye, 54505 Vandœuvre-lès-Nancy, France
| | - Mélanie Chesnel
- Direction de la santé numérique, institut de cancérologie de Lorraine, 6, avenue de Bourgogne, 54519 Vandœuvre-lès-Nancy, France
| | - Jean-Louis Merlin
- Service de biologie moléculaire des tumeurs, institut de cancérologie de Lorraine, CNRS UMR 7039 CRAN-université de Lorraine, 6, avenue de Bourgogne CS 30519, 54519 Vandœuvre-lès-Nancy, France
| | - Frédéric Marchal
- Département de chirurgie, institut de cancérologie de Lorraine, 6, avenue de Bourgogne, 54519 Vandœuvre-lès-Nancy, France; Centre de recherche en automatique de Nancy, Centre national de la recherche scientifique, UMR 7039, université de Lorraine, faculté des sciences et technologies-Campus Aiguillettes, 54506 Vandœuvre-lès-Nancy, France
| | - Aurélien Lambert
- EA 4360 APEMAC, université de Lorraine, 9, avenue de la Forêt-de-Haye, 54505 Vandœuvre-lès-Nancy, France; Département d'oncologie médicale, institut de cancérologie de Lorraine, 6 avenue de Bourgogne, 54519 Vandœuvre-lès-Nancy, France
| | - Arnaud Baudin
- Service en charge des données de santé, institut de cancérologie de Lorraine, 6, avenue de Bourgogne, 54519 Vandœuvre-lès-Nancy, France.
| |
|
3
|
Foucher J, Azizi L, Öijerstedt L, Kläppe U, Ingre C. The usage of population and disease registries as pre-screening tools for clinical trials, a systematic review. Syst Rev 2024; 13:111. [PMID: 38654383 PMCID: PMC11040983 DOI: 10.1186/s13643-024-02533-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 04/12/2024] [Indexed: 04/25/2024] Open
Abstract
OBJECTIVE This systematic review aims to outline the use of population and disease registries for clinical trial pre-screening. MATERIALS AND METHODS The search was conducted in the time period of January 2014 to December 2022 in three databases: MEDLINE, Embase, and Web of Science Core Collection. References were screened using the Rayyan software, firstly based on titles and abstracts only, and secondly through full text review. Quality of the included studies was assessed using the List of Included Studies and quality Assurance in Review tool, enabling inclusion of publications of only moderate to high quality. RESULTS The search originally identified 1430 citations, but only 24 studies were included, reporting the use of population and/or disease registries for trial pre-screening. Nine disease domains were represented, with 54% of studies using registries based in the USA, and 62.5% of the studies using national registries. Half of the studies reported usage for drug trials, and over 478,679 patients were identified through registries in this review. Main advantages of the pre-screening methodology were reduced financial burden and time reduction. DISCUSSION AND CONCLUSION The use of registries for trial pre-screening increases reproducibility of the pre-screening process across trials and sites, allowing for implementation and improvement of a quality assurance process. Pre-screening strategies seem under-reported, and we encourage more trials to use and describe their pre-screening processes, as there is a need for standardized methodological guidelines.
Affiliation(s)
- Juliette Foucher
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden.
- Department of Neurology, Karolinska University Hospital, Stockholm, Sweden.
| | - Louisa Azizi
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
| | - Linn Öijerstedt
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Department of Neurology, Karolinska University Hospital, Stockholm, Sweden
| | - Ulf Kläppe
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Department of Neurology, Karolinska University Hospital, Stockholm, Sweden
| | - Caroline Ingre
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Department of Neurology, Karolinska University Hospital, Stockholm, Sweden
| |
|
4
|
Blasini R, Strantz C, Gulden C, Helfer S, Lidke J, Prokosch HU, Sohrabi K, Schneider H. Evaluation of Eligibility Criteria Relevance for the Purpose of IT-Supported Trial Recruitment: Descriptive Quantitative Analysis. JMIR Form Res 2024; 8:e49347. [PMID: 38294862 PMCID: PMC10867759 DOI: 10.2196/49347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 09/28/2023] [Accepted: 11/22/2023] [Indexed: 02/01/2024] Open
Abstract
BACKGROUND Clinical trials (CTs) are crucial for medical research; however, they frequently fall short of the requisite number of participants who meet all eligibility criteria (EC). A clinical trial recruitment support system (CTRSS) is developed to help identify potential participants by performing a search on a specific data pool. The accuracy of the search results is directly related to the quality of the data used for comparison. Data accessibility can present challenges, making it crucial to identify the necessary data for a CTRSS to query. Prior research has examined the data elements frequently used in CT EC but has not evaluated which criteria are actually used to search for participants. Although all EC must be met to enroll a person in a CT, not all criteria have the same importance when searching for potential participants in an existing data pool, such as an electronic health record, because some of the criteria are only relevant at the time of enrollment. OBJECTIVE In this study, we investigated which groups of data elements are relevant in practice for finding suitable participants and whether there are typical elements that are not relevant and can therefore be omitted. METHODS We asked trial experts and CTRSS developers to first categorize the EC of their CTs according to data element groups and then to classify them into 1 of 3 categories: necessary, complementary, and irrelevant. In addition, the experts assessed whether a criterion was documented (on paper or digitally) or whether it was information known only to the treating physicians or patients. RESULTS We reviewed 82 CTs with 1132 unique EC. Of these 1132 EC, 350 (30.9%) were considered necessary, 224 (19.8%) complementary, and 341 (30.1%) total irrelevant. To identify the most relevant data elements, we introduced the data element relevance index (DERI). 
This describes the percentage of studies in which the corresponding data element occurs and is also classified as necessary or supplementary. We found that the query of "diagnosis" was relevant for finding participants in 79 (96.3%) of the CTs. This group was followed by "date of birth/age" with a DERI of 85.4% (n=70) and "procedure" with a DERI of 35.4% (n=29). CONCLUSIONS The distribution of data element groups in CTs has been heterogeneously described in previous works. Therefore, we recommend identifying the percentage of CTs in which data element groups can be found as a more reliable way to determine the relevance of EC. Only necessary and complementary criteria should be included in this DERI.
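The DERI described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input structure (one list of `(data_element_group, relevance)` pairs per trial), the function name `deri`, and the toy data are all assumptions made here. It counts a data element group once per trial if at least one of that trial's criteria in the group is rated necessary or complementary.

```python
from collections import Counter

# Relevance labels that count towards the index, per the abstract.
RELEVANT = {"necessary", "complementary"}


def deri(trials):
    """Return, per data element group, the percentage of trials in which
    the group occurs with at least one relevant criterion.

    `trials` is a hypothetical structure: a list with one entry per trial,
    each entry a list of (data_element_group, relevance) pairs.
    """
    counts = Counter()
    for criteria in trials:
        # A set ensures each group is counted at most once per trial.
        relevant_groups = {group for group, rel in criteria if rel in RELEVANT}
        counts.update(relevant_groups)
    total = len(trials)
    return {group: 100 * n / total for group, n in counts.items()}


# Toy data for illustration only (not taken from the study).
toy_trials = [
    [("diagnosis", "necessary"), ("age", "complementary"), ("lab value", "irrelevant")],
    [("diagnosis", "necessary"), ("diagnosis", "necessary")],
    [("age", "necessary"), ("procedure", "irrelevant")],
    [("diagnosis", "complementary"), ("procedure", "necessary")],
]
scores = deri(toy_trials)
print(scores)

# The percentages reported in the abstract follow from the counts out of 82 trials.
reported = {name: round(100 * n / 82, 1)
            for name, n in [("diagnosis", 79), ("date of birth/age", 70), ("procedure", 29)]}
print(reported)
```

Counting per trial rather than per criterion is what keeps a group that appears many times within a few trials from dominating the index.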
Affiliation(s)
- Romina Blasini
- Institute of Medical Informatics, Justus Liebig University, Giessen, Germany
| | - Cosima Strantz
- Department of Medical Informatics, Biometrics and Epidemiology, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Christian Gulden
- Department of Medical Informatics, Biometrics and Epidemiology, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Sven Helfer
- Department of Pediatrics, Medical Faculty and University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
| | - Jakub Lidke
- Data Integration Center, Medical Faculty, Philipps University of Marburg, Marburg, Germany
| | - Hans-Ulrich Prokosch
- Department of Medical Informatics, Biometrics and Epidemiology, Friedrich-Alexander Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Keywan Sohrabi
- Faculty of Health Sciences, Technische Hochschule Mittelhessen University of Applied Sciences, Giessen, Germany
| | - Henning Schneider
- Institute of Medical Informatics, Justus Liebig University, Giessen, Germany
- Faculty of Health Sciences, Technische Hochschule Mittelhessen University of Applied Sciences, Giessen, Germany
| |
|
5
|
Lombardo G, Couvert C, Kose M, Begum A, Spiertz C, Worrell C, Hasselbaink D, Didden EM, Sforzini L, Todorovic M, Lewi M, Brown M, Vaterkowski M, Gullet N, Amasi-Hartoonian N, Griffon N, Pais R, Rodriguez Navarro S, Kremer A, Maes C, Tan EH, Moinat M, Ferrer JG, Pariante CM, Kalra D, Ammour N, Kalko S. Electronic health records (EHRs) in clinical research and platform trials: Application of the innovative EHR-based methods developed by EU-PEARL. J Biomed Inform 2023; 148:104553. [PMID: 38000766 DOI: 10.1016/j.jbi.2023.104553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 11/13/2023] [Accepted: 11/20/2023] [Indexed: 11/26/2023]
Abstract
OBJECTIVE Electronic Health Record (EHR) systems are digital platforms used in clinical practice to collect patients' clinical information related to their health status, and they represent a useful store of real-world data. EHRs have a potential role in research studies, in particular in platform trials. Platform trials are innovative trial designs including multiple trial arms (conducted simultaneously and/or sequentially) on different treatments under a single master protocol. However, the use of EHRs in research comes with important challenges such as incompleteness of records and the need to translate trial eligibility criteria into interoperable queries. In this paper, we aim to review and to describe our proposed innovative methods to tackle some of the most important challenges identified. This work is part of the Innovative Medicines Initiative (IMI) EU Patient-cEntric clinicAl tRial pLatforms (EU-PEARL) project's work package 3 (WP3), whose objective is to deliver tools and guidance for EHR-based protocol feasibility assessment, clinical site selection, and patient pre-screening in platform trials, investing in the building of a data-driven clinical network framework that can execute these complex innovative designs for which feasibility assessments are critically important. METHODS ISO standards and relevant references informed a readiness survey, producing 354 criteria with corresponding questions selected and harmonised through a 7-round scoring process (0-1) in stakeholder meetings, with 85% consensus being the threshold of acceptance for a criterion/question. ATLAS cohort definition and Cohort Diagnostics were mainly used to create the trial feasibility eligibility (I/E) criteria as executable interoperable queries.
RESULTS The WP3/EU-PEARL group developed a readiness survey (eSurvey) for an efficient selection of clinical sites with suitable EHRs, consisting of yes-or-no questions, and a set-up of interoperable proxy queries using physicians' defined trial criteria. Both actions facilitate recruiting trial participants and alignment between study costs/timelines and data-driven recruitment potential. CONCLUSION The eSurvey will help create an archive of clinical sites with mature EHR systems suitable to participate in clinical trials/platform trials, and the interoperable proxy queries of trial eligibility criteria will help identify the number of potential participants. Ultimately, these tools will contribute to the production of EHR-based protocol design.
Affiliation(s)
- Giulia Lombardo
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK.
| | - Camille Couvert
- Sanofi R&D, Global Development, Clinical Science & Operations, Chilly-Mazarin, France
| | - Melisa Kose
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
| | - Amina Begum
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
| | - Cecile Spiertz
- The Janssen Pharmaceutical Companies of Johnson & Johnson, Leiden, The Netherlands
| | - Courtney Worrell
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
| | | | - Eva-Maria Didden
- Actelion, a Janssen company of Johnson & Johnson, Allschwil, Basel-Country, Switzerland
| | - Luca Sforzini
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
| | - Marija Todorovic
- Johnson & Johnson Clinical Operations (JJCO), Johnson & Johnson company, Belgrade, Serbia
| | - Martine Lewi
- Global Commercial Strategy Organization, the Janssen Pharmaceutical Companies of Johnson & Johnson, Raritan, New Jersey, USA
| | - Mollie Brown
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
| | - Morgan Vaterkowski
- Assistance Publique Hôpitaux de Paris, IT Department, Innovation and Data, Paris, France, and EPITA EPITA School of Engineering and Computer Science, Paris, France
| | - Nancy Gullet
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
| | - Nare Amasi-Hartoonian
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
| | - Nicolas Griffon
- Information Technology Department, AP-HP, Paris, France; LIMICS, Inserm U1142, Sorbonne Université, Paris, France
| | - Raluca Pais
- Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Hôpital Pitié-Salpêtrière, Institute of Cardiometabolism and Nutrition, INSERM UMRS_938, Paris, France
| | | | - Andreas Kremer
- Information Technology for Translational Medicine, ITTM S.A, House of BioHealth, Esch-sur-Alzette, Luxembourg
| | - Christophe Maes
- The European Institute for Innovation through health data, and Department Public Health and Primary Care, Unit of Medical Informatics and Statistics, Faculty of Medicine and Health Sciences, Ghent University, Gent, Belgium
| | - Eng Hooi Tan
- Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
| | - Maxim Moinat
- Erasmus University Medical Center, Rotterdam, the Netherlands
| | | | - Carmine M Pariante
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
| | - Dipak Kalra
- The European Institute for Innovation through Health Data and Visiting Professor, University of Ghent, Gent, Belgium
| | - Nadir Ammour
- Sanofi R&D, Global Development, Clinical Science & Operations, Chilly-Mazarin, France
| | - Susana Kalko
- Vall d'Hebron Research Institute (VHIR), Barcelona, Spain.
| |
|
6
|
Xu Q, Liu Y, Sun D, Huang X, Li F, Zhai J, Li Y, Zhou Q, Qian N, Niu B. OncoCTMiner: streamlining precision oncology trial matching via molecular profile analysis. Database (Oxford) 2023; 2023:baad077. [PMID: 37935585 PMCID: PMC10630409 DOI: 10.1093/database/baad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 09/08/2023] [Accepted: 10/21/2023] [Indexed: 11/09/2023]
Abstract
By establishing omics sequencing of patient tumors as a crucial element in cancer treatment, the extensive implementation of precision oncology necessitates effective and prompt execution of clinical studies for approving molecular-targeted therapies. However, the substantial volume of patient sequencing data, combined with strict clinical trial criteria, increasingly complicates the process of matching patients to precision oncology studies. To streamline enrollment in these studies, we developed OncoCTMiner, an automated pre-screening platform for molecular cancer clinical trials. Through manual tagging of eligibility criteria for 2227 oncology trials, we identified key bio-concepts such as cancer types, genes, alterations, drugs, biomarkers and therapies. Utilizing this manually annotated corpus along with open-source biomedical natural language processing tools, we trained multiple named entity recognition models specifically designed for precision oncology trials. These models analyzed 460 952 clinical trials, revealing 8.15 million precision medicine concepts, 9.32 million entity-criteria-trial triplets and a comprehensive precision oncology eligibility criteria database. Most significantly, we developed a patient-trial matching system based on cancer patients' clinical and genetic profiles, which can seamlessly integrate with the omics data analysis platform. This system expedites the pre-screening process for potentially suitable precision oncology trials, offering patients swifter access to promising treatment options. Database URL https://oncoctminer.chosenmedinfo.com.
Affiliation(s)
- Quan Xu
- Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
- Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
| | - Yueyue Liu
- Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
| | - Dawei Sun
- Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
- Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
| | - Xiaoqian Huang
- Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
| | - Feihong Li
- Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
| | - JinCheng Zhai
- Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
| | - Yang Li
- Beijing International Center for Mathematical Research, Peking University, No. 5 Yiheyuan Road Haidian District, Beijing 100871, China
- Chongqing Research Institute of Big Data, Peking University, Chongqing 401333, China
| | - Qiming Zhou
- Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
- Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
| | - Niansong Qian
- Department of Oncology, Senior Department of Respiratory and Critical Care Medicine, The Eighth Medical Center of Chinese PLA General Hospital, No.17 A Heishanhu Road, Haidian District, Beijing 100853, China
| | - Beifang Niu
- Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
- University of Chinese Academy of Sciences, Beijing 100190, China
| |
|
7
|
Su Q, Cheng G, Huang J. A review of research on eligibility criteria for clinical trials. Clin Exp Med 2023; 23:1867-1879. [PMID: 36602707 PMCID: PMC9815064 DOI: 10.1007/s10238-022-00975-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 12/06/2022] [Indexed: 01/06/2023]
Abstract
The purpose of this paper is to systematically sort out and analyze the cutting-edge research on the eligibility criteria of clinical trials. Eligibility criteria are important prerequisites for the success of clinical trials. They directly affect the final results of clinical trials. Inappropriate eligibility criteria will lead to insufficient recruitment, which is an important reason for the eventual failure of many clinical trials. We have investigated the research status of eligibility criteria for clinical trials on academic platforms such as arXiv and NIH. We have classified and sorted out all the papers we found, so that readers can understand the frontier research in this field. Eligibility criteria are the most important part of a clinical trial study. The ultimate goal of research in this field is to formulate more scientific and reasonable eligibility criteria and speed up the clinical trial process. The global research on the eligibility criteria of clinical trials is mainly divided into four aspects: natural language processing, patient pre-screening, standard evaluation, and clinical trial query. Compared with the past, people are now using new technologies to study eligibility criteria from a new perspective (big data). In the research process, handling complex disease concepts, choosing a suitable dataset, and proving the validity and scientific soundness of the research results are challenges faced by researchers (especially computer-related researchers). Future research will focus on the selection and improvement of artificial intelligence algorithms related to clinical trials and related practical applications such as databases, knowledge graphs, and dictionaries.
Affiliation(s)
- Qianmin Su
- Department of Computer Science, School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, No. 333 Longteng Road, Shanghai, 201620, China.
| | - Gaoyi Cheng
- Department of Computer Science, School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, No. 333 Longteng Road, Shanghai, 201620, China
| | - Jihan Huang
- Center for Drug Clinical Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China
| |
|
8
|
Kaskovich S, Wyatt KD, Oliwa T, Graglia L, Furner B, Lee J, Mayampurath A, Volchenboum SL. Automated Matching of Patients to Clinical Trials: A Patient-Centric Natural Language Processing Approach for Pediatric Leukemia. JCO Clin Cancer Inform 2023; 7:e2300009. [PMID: 37428994 PMCID: PMC10857751 DOI: 10.1200/cci.23.00009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 04/05/2023] [Accepted: 05/10/2023] [Indexed: 07/12/2023] Open
Abstract
PURPOSE Matching patients to clinical trials is cumbersome and costly. Attempts have been made to automate the matching process; however, most have used a trial-centric approach, which focuses on a single trial. In this study, we developed a patient-centric matching tool that matches patient-specific demographic and clinical information with free-text clinical trial inclusion and exclusion criteria extracted using natural language processing to return a list of relevant clinical trials ordered by the patient's likelihood of eligibility. MATERIALS AND METHODS Records from pediatric leukemia clinical trials were downloaded from ClinicalTrials.gov. Regular expressions were used to discretize and extract individual trial criteria. A multilabel support vector machine (SVM) was trained to classify sentence embeddings of criteria into relevant clinical categories. Labeled criteria were parsed using regular expressions to extract numbers, comparators, and relationships. In the validation phase, a patient-trial match score was generated for each trial and returned in the form of a ranked list for each patient. RESULTS In total, 5,251 discretized criteria were extracted from 216 protocols. The most frequent criterion was previous chemotherapy/biologics (17%). The multilabel SVM demonstrated a pooled accuracy of 75%. The text processing pipeline was able to automatically extract 68% of eligibility criteria rules, as compared with 80% in a manual version of the tool. Automated matching was accomplished in approximately 4 seconds, as compared with several hours using manual derivation. CONCLUSION To our knowledge, this project represents the first open-source attempt to generate a patient-centric clinical trial matching tool. The tool demonstrated acceptable performance when compared with a manual version, and it has potential to save time and money when matching patients to trials.
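The regular-expression step mentioned in this abstract (extracting numbers, comparators, and relationships from labeled criteria) might look roughly like the sketch below. The abstract does not give the authors' actual patterns, so the pattern, the function name `parse_criterion`, and the example sentences are hypothetical and cover only simple "variable comparator value unit" phrasing.

```python
import re

# Hypothetical pattern: a variable name, a comparator, a number, and an optional unit.
CRITERION_PATTERN = re.compile(
    r"(?P<variable>[A-Za-z][A-Za-z ]*?)\s*"
    r"(?P<comparator><=|>=|<|>|=)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>[A-Za-z/%]+)?"
)


def parse_criterion(text):
    """Extract a simple numeric rule from one criterion sentence.

    Returns a dict with the variable, comparator, numeric value and unit,
    or None when the sentence contains no numeric comparison.
    """
    match = CRITERION_PATTERN.search(text)
    if match is None:
        return None
    return {
        "variable": match["variable"].strip().lower(),
        "comparator": match["comparator"],
        "value": float(match["value"]),
        "unit": match["unit"],
    }


# Illustrative sentences, not taken from any specific protocol.
age_rule = parse_criterion("Age >= 1 year")
creatinine_rule = parse_criterion("Serum creatinine < 1.5 mg/dL")
free_text = parse_criterion("No prior chemotherapy")

print(age_rule)
print(creatinine_rule)
print(free_text)
```

Criteria without an explicit numeric comparison return `None`, which is one reason a purely pattern-based extractor would miss some rules; the abstract reports that the automated pipeline extracted 68% of eligibility criteria rules.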
Affiliation(s)
| | - Kirk D Wyatt
- Department of Pediatric Hematology/Oncology, Roger Maris Cancer Center, Sanford Health, Fargo, ND
| | - Tomasz Oliwa
- Center for Research Informatics, University of Chicago, Chicago, IL
| | - Luca Graglia
- Department of Pediatrics, University of Chicago, Chicago, IL
| | - Brian Furner
- Department of Pediatrics, University of Chicago, Chicago, IL
| | - Jooho Lee
- Department of Pediatrics, University of Chicago, Chicago, IL
| | - Anoop Mayampurath
- Department of Biostatistics and Medical Informatics, School of Medicine and Public Health, University of Wisconsin, Madison, WI
| | | |
Collapse
|
9
|
Ismail A, Al-Zoubi T, El Naqa I, Saeed H. The role of artificial intelligence in hastening time to recruitment in clinical trials. BJR Open 2023; 5:20220023. [PMID: 37953865 PMCID: PMC10636341 DOI: 10.1259/bjro.20220023] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/28/2022] [Revised: 03/20/2023] [Accepted: 04/11/2023] [Indexed: 09/01/2023] Open
Abstract
Novel and developing artificial intelligence (AI) systems can be integrated into healthcare settings in numerous ways. For example, in automated image classification and natural language processing, AI systems are beginning to demonstrate near-expert-level performance in detecting abnormalities such as seizure activity. This paper, however, focuses on AI integration into clinical trials. During the clinical trial recruitment process, considerable labor and time are spent sifting through electronic health records and interviewing patients. With the advancement of deep learning techniques such as natural language processing, intricate electronic health record data can be processed efficiently. This is useful for workflows such as clinical trial recruitment. Studies are starting to show promise in shortening the time to recruitment and reducing workload for those involved in clinical trial design. Additionally, numerous guidelines are being developed to encourage meaningful integration of AI into healthcare settings. The goal is to improve the clinical trial process by reducing bias in patient composition, improving retention of participants, and lowering costs and labor.
Affiliation(s)
- Abdalah Ismail
- Advocate Aurora Health Care Department of Diagnostic Radiology, Aurora, United States
- Hina Saeed
- Lynn Cancer Institute-Baptist Health City, Boca Raton, United States

10
Meystre SM, Heider PM, Cates A, Bastian G, Pittman T, Gentilin S, Kelechi TJ. Piloting an automated clinical trial eligibility surveillance and provider alert system based on artificial intelligence and standard data models. BMC Med Res Methodol 2023; 23:88. [PMID: 37041475 PMCID: PMC10088225 DOI: 10.1186/s12874-023-01916-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 04/04/2023] [Indexed: 04/13/2023] Open
Abstract
BACKGROUND To advance new therapies into clinical care, clinical trials must recruit enough participants. Yet, many trials fail to do so, leading to delays, early trial termination, and wasted resources. Under-enrolling trials make it impossible to draw conclusions about the efficacy of new therapies. An oft-cited reason for insufficient enrollment is lack of study team and provider awareness about patient eligibility. Automating clinical trial eligibility surveillance and study team and provider notification could offer a solution. METHODS To address this need for an automated solution, we conducted an observational pilot study of our TAES (TriAl Eligibility Surveillance) system. We tested the hypothesis that an automated system based on natural language processing and machine learning algorithms could detect patients eligible for specific clinical trials by linking the information extracted from trial descriptions to the corresponding clinical information in the electronic health record (EHR). To evaluate the TAES information extraction and matching prototype (i.e., TAES prototype), we selected five open cardiovascular and cancer trials at the Medical University of South Carolina and created a new reference standard of 21,974 clinical text notes from a random selection of 400 patients (including at least 100 enrolled in the selected trials), with a small subset of 20 notes annotated in detail. We also developed a simple web interface for a new database that stores all trial eligibility criteria, corresponding clinical information, and trial-patient match characteristics using the Observational Medical Outcomes Partnership (OMOP) common data model. Finally, we investigated options for integrating an automated clinical trial eligibility system into the EHR and for notifying health care providers promptly of potential patient eligibility without interrupting their clinical workflow. 
RESULTS Although the rapidly implemented TAES prototype achieved only moderate accuracy (recall up to 0.778; precision up to 1.000), it enabled us to assess options for integrating an automated system successfully into the clinical workflow at a healthcare system. CONCLUSIONS Once optimized, the TAES system could exponentially enhance identification of patients potentially eligible for clinical trials, while simultaneously decreasing the burden on research teams of manual EHR review. Through timely notifications, it could also raise physician awareness of patient eligibility for clinical trials.
Affiliation(s)
- Stéphane M Meystre
- OnePlanet Research Center and imec, Toernooiveld 300, Nijmegen, 6525 EC, The Netherlands
- Paul M Heider
- Medical University of South Carolina, Charleston, SC, USA
- Andrew Cates
- Medical University of South Carolina, Charleston, SC, USA
- Grace Bastian
- Medical University of South Carolina, Charleston, SC, USA
- Tara Pittman
- Medical University of South Carolina, Charleston, SC, USA

11
Chow R, Midroni J, Kaur J, Boldt G, Liu G, Eng L, Liu FF, Haibe-Kains B, Lock M, Raman S. Use of artificial intelligence for cancer clinical trial enrollment: a systematic review and meta-analysis. J Natl Cancer Inst 2023; 115:365-374. [PMID: 36688707 PMCID: PMC10086633 DOI: 10.1093/jnci/djad013] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/03/2022] [Revised: 12/13/2022] [Accepted: 01/11/2023] [Indexed: 01/24/2023] Open
Abstract
BACKGROUND The aim of this study is to provide a comprehensive understanding of the current landscape of artificial intelligence (AI) for cancer clinical trial enrollment and its predictive accuracy in identifying eligible patients for inclusion in such trials. METHODS Databases of PubMed, Embase, and Cochrane CENTRAL were searched until June 2022. Articles were included if they reported on AI actively being used in the clinical trial enrollment process. Narrative synthesis was conducted among all extracted data: accuracy, sensitivity, specificity, positive predictive value, and negative predictive value. For studies where the 2x2 contingency table could be calculated or supplied by authors, a meta-analysis to calculate summary statistics was conducted using the hierarchical summary receiver operating characteristics curve model. RESULTS Ten articles reporting on more than 50 000 patients in 19 datasets were included. Accuracy, sensitivity, and specificity exceeded 80% in all but 1 dataset. Positive predictive value exceeded 80% in 5 of 17 datasets. Negative predictive value exceeded 80% in all datasets. Summary sensitivity was 90.5% (95% confidence interval [CI] = 70.9% to 97.4%); summary specificity was 99.3% (95% CI = 81.8% to 99.9%). CONCLUSIONS AI demonstrated comparable, if not superior, performance to manual screening for patient enrollment into cancer clinical trials. As well, AI is highly efficient, requiring less time and human resources to screen patients. AI should be further investigated and implemented for patient recruitment into cancer clinical trials. Future research should validate the use of AI for clinical trials enrollment in less resource-rich regions and ensure broad inclusion for generalizability to all sexes, ages, and ethnicities.
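The five measures named in the narrative synthesis all derive from the same 2x2 contingency table. The Python sketch below shows the standard definitions; the `Confusion` class, the `screening_metrics` function, and the counts in the example are hypothetical and are not drawn from the review's datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 contingency table: AI screening result versus true eligibility."""
    tp: int  # flagged eligible, truly eligible
    fp: int  # flagged eligible, truly ineligible
    fn: int  # flagged ineligible, truly eligible
    tn: int  # flagged ineligible, truly ineligible


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a margin of the table is empty.
    return numerator / denominator if denominator else float("nan")


def screening_metrics(c):
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
    }


# Made-up counts: 100 eligible patients among 1,000 screened.
for name, value in screening_metrics(Confusion(tp=90, fp=30, fn=10, tn=870)).items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, sensitivity is 0.90 and specificity about 0.97, yet positive predictive value is only 0.75. This is a general property of screening when eligible patients are a small share of those screened, and it may be one reason positive predictive value can lag the other measures; the abstract itself does not state a cause.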
Affiliation(s)
- Ronald Chow
- Princess Margaret Cancer Centre, University Health Network, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- London Regional Cancer Program, London Health Sciences Centre, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Institute of Biomedical Engineering, Faculty of Applied Science and Engineering, University of Toronto, Toronto, ON, Canada
- Julie Midroni
- Princess Margaret Cancer Centre, University Health Network, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Jagdeep Kaur
- London Regional Cancer Program, London Health Sciences Centre, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Gabriel Boldt
- London Regional Cancer Program, London Health Sciences Centre, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Geoffrey Liu
- Princess Margaret Cancer Centre, University Health Network, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Lawson Eng
- Princess Margaret Cancer Centre, University Health Network, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Fei-Fei Liu
- Princess Margaret Cancer Centre, University Health Network, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Benjamin Haibe-Kains
- Princess Margaret Cancer Centre, University Health Network, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Michael Lock
- London Regional Cancer Program, London Health Sciences Centre, Schulich School of Medicine and Dentistry, University of Western Ontario, London, ON, Canada
- Srinivas Raman
- Princess Margaret Cancer Centre, University Health Network, Temerty Faculty of Medicine, University of Toronto, Toronto, ON, Canada

12
Maheshwari K, Cywinski JB, Papay F, Khanna AK, Mathur P. Artificial Intelligence for Perioperative Medicine: Perioperative Intelligence. Anesth Analg 2023; 136:637-645. [PMID: 35203086 DOI: 10.1213/ane.0000000000005952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
The anesthesiologist's role has expanded beyond the operating room, and anesthesiologist-led care teams can deliver coordinated care that spans the entire surgical experience, from preoperative optimization to long-term recovery of surgical patients. This expanded role can help reduce postoperative morbidity and mortality, which, unlike intraoperative mortality, are regrettably common. Postoperative mortality, if considered a disease category, would be the third leading cause of death, just after heart disease and cancer. Rapid advances in technologies like artificial intelligence provide an opportunity to build safe perioperative practices. Artificial intelligence helps by analyzing complex data across disparate systems and producing actionable information. Using artificial intelligence technologies, we can critically examine every aspect of perioperative medicine and devise innovative value-based solutions that can potentially improve patient safety and care delivery while optimizing cost of care. In this narrative review, we discuss specific applications of artificial intelligence that may help advance all aspects of perioperative medicine, including clinical care, education, quality improvement, and research. We also discuss potential limitations of the technology and provide our recommendations for successful adoption.
Affiliation(s)
- Ashish K Khanna
- Department of Anesthesiology, Section on Critical Care Medicine, Wake Forest University School of Medicine, Winston-Salem, North Carolina
- Outcomes Research Consortium, Cleveland, Ohio

13
Williams E, Kienast M, Medawar E, Reinelt J, Merola A, Klopfenstein SAI, Flint AR, Heeren P, Poncette AS, Balzer F, Beimes J, von Bünau P, Chromik J, Arnrich B, Scherf N, Niehaus S. A Standardized Clinical Data Harmonization Pipeline for Scalable AI Application Deployment (FHIR-DHP): Validation and Usability Study. JMIR Med Inform 2023; 11:e43847. [PMID: 36943344 PMCID: PMC10131740 DOI: 10.2196/43847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Increasing digitalization in the medical domain gives rise to large amounts of health care data, which have the potential to expand clinical knowledge and transform patient care if leveraged through artificial intelligence (AI). Yet, big data and AI often cannot unlock their full potential at scale, owing to nonstandardized data formats, lack of technical and semantic data interoperability, and limited cooperation between stakeholders in the health care system. Despite the existence of standardized data formats for the medical domain, such as Fast Healthcare Interoperability Resources (FHIR), their prevalence and usability for AI remain limited. OBJECTIVE In this paper, we developed a data harmonization pipeline (DHP) for clinical data sets relying on the common FHIR data standard. METHODS We validated the performance and usability of our FHIR-DHP with data from the Medical Information Mart for Intensive Care IV database. RESULTS We present the FHIR-DHP workflow for transforming "raw" hospital records into a harmonized, AI-friendly data representation. The pipeline consists of the following 5 key preprocessing steps: querying data from the hospital database, FHIR mapping, syntactic validation, transfer of harmonized data into the patient-model database, and export of data in an AI-friendly format for further medical applications. A detailed example of FHIR-DHP execution was presented for clinical diagnosis records. CONCLUSIONS Our approach enables the scalable and needs-driven data modeling of large and heterogeneous clinical data sets. The FHIR-DHP is a pivotal step toward increasing cooperation, interoperability, and quality of patient care in the clinical routine and for medical research.
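Three of the five preprocessing steps (FHIR mapping, syntactic validation, and export to a flat format) can be sketched for a single diagnosis record. The Python below is a simplified illustration under stated assumptions, not the FHIR-DHP code: the input column names, the two code-system URIs, the helper names, and the very small validation rule set are all assumed, and a real pipeline would validate against the full FHIR specification.

```python
# Assumed code-system URIs for ICD-9-CM and ICD-10-CM codings.
ICD_SYSTEMS = {
    9: "http://hl7.org/fhir/sid/icd-9-cm",
    10: "http://hl7.org/fhir/sid/icd-10-cm",
}


def to_fhir_condition(row):
    """Map one raw diagnosis row (hypothetical columns) to a Condition-like dict."""
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{row['subject_id']}"},
        "encounter": {"reference": f"Encounter/{row['hadm_id']}"},
        "code": {
            "coding": [
                {"system": ICD_SYSTEMS[row["icd_version"]], "code": row["icd_code"]}
            ]
        },
    }


def validate_condition(resource):
    """Return a list of problems; an empty list means the minimal checks passed."""
    errors = []
    if resource.get("resourceType") != "Condition":
        errors.append("resourceType must be 'Condition'")
    if not resource.get("subject", {}).get("reference"):
        errors.append("subject.reference is missing")
    codings = resource.get("code", {}).get("coding", [])
    if not codings or not all(c.get("system") and c.get("code") for c in codings):
        errors.append("code.coding needs a system and a code")
    return errors


def to_feature_row(resource):
    """Flatten a validated resource into a table-friendly record."""
    coding = resource["code"]["coding"][0]
    return {
        "patient_id": resource["subject"]["reference"].split("/", 1)[1],
        "code_system": coding["system"],
        "code": coding["code"],
    }


# Illustrative row only; identifiers are invented.
raw = {"subject_id": 1, "hadm_id": 100, "icd_code": "I10", "icd_version": 10}
condition = to_fhir_condition(raw)
if not validate_condition(condition):
    print(to_feature_row(condition))
```

Keeping validation separate from mapping means malformed records can be logged and skipped before they reach the harmonized store.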
Affiliation(s)
- Anne Rike Flint
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Patrick Heeren
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Felix Balzer
- Institute of Medical Informatics, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Jonas Chromik
- Digital Health - Connected Healthcare, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Bert Arnrich
- Digital Health - Connected Healthcare, Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
- Nico Scherf
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

14
Greve K, Ni Y, Bailes AF, Vargus-Adams J, Miley AE, Aronow B, McMahon MM, Kurowski BG, Mitelpunkt A. Gross motor function prediction using natural language processing in cerebral palsy. Dev Med Child Neurol 2023; 65:100-106. [PMID: 35665923 PMCID: PMC9720038 DOI: 10.1111/dmcn.15301] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/23/2021] [Revised: 05/03/2022] [Accepted: 05/05/2022] [Indexed: 01/12/2023]
Abstract
AIM To predict ambulatory status and Gross Motor Function Classification System (GMFCS) levels in patients with cerebral palsy (CP) by applying natural language processing (NLP) to electronic health record (EHR) clinical notes. METHOD Individuals aged 8 to 26 years with a diagnosis of CP in the EHR between January 2009 and November 2020 (~12 years of data) were included in a cross-sectional retrospective cohort of 2483 patients. The cohort was divided into train-test and validation groups. Positive predictive value, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) were calculated for prediction of ambulatory status and GMFCS levels. RESULTS The median age was 15 years (interquartile range 10-20 years) for the total cohort, with 56% being male and 75% White. In the validation group, prediction of ambulatory status achieved 70% sensitivity, 88% specificity, 81% positive predictive value, and 0.89 AUC. NLP applied to the EHR differentiated between GMFCS levels I-II and III (15% sensitivity, 96% specificity, 46% positive predictive value, and 0.71 AUC); and IV and V (81% sensitivity, 51% specificity, 70% positive predictive value, and 0.75 AUC). INTERPRETATION NLP applied to the EHR demonstrated excellent differentiation between ambulatory and non-ambulatory status, and good differentiation between GMFCS levels I-II and III, and IV and V. Clinical use of NLP may help to individualize functional characterization and management. WHAT THIS PAPER ADDS Natural language processing (NLP) applied to the electronic health record (EHR) can predict ambulatory status in children with cerebral palsy (CP). NLP provides good prediction of Gross Motor Function Classification System level in children with CP using the EHR. The NLP methods described could be integrated in an EHR system to provide real-time information.
Affiliation(s)
- Kelly Greve
- Division of Occupational Therapy and Physical Therapy, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Rehabilitation, Exercise and Nutrition Sciences, University of Cincinnati College of Allied Health Sciences, Cincinnati, OH, USA
- Yizhao Ni
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Amy F. Bailes
- Division of Occupational Therapy and Physical Therapy, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Rehabilitation, Exercise and Nutrition Sciences, University of Cincinnati College of Allied Health Sciences, Cincinnati, OH, USA
- Jilda Vargus-Adams
- Division of Pediatric Rehabilitation Medicine, Cincinnati Children’s Hospital Medical Center, OH, USA
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
- Department of Neurology and Rehabilitation Medicine, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
- Aimee E. Miley
- Division of Pediatric Rehabilitation Medicine, Cincinnati Children’s Hospital Medical Center, OH, USA
- Bruce Aronow
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
- Mary M. McMahon
- Division of Pediatric Rehabilitation Medicine, Cincinnati Children’s Hospital Medical Center, OH, USA
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
- Department of Neurology and Rehabilitation Medicine, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
- Brad G. Kurowski
- Division of Pediatric Rehabilitation Medicine, Cincinnati Children’s Hospital Medical Center, OH, USA
- Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
- Department of Neurology and Rehabilitation Medicine, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
- Alexis Mitelpunkt
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Pediatric Rehabilitation, Department of Rehabilitation, Dana-Dwek Children’s Hospital, Tel Aviv Medical Center, Tel Aviv, Israel
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

15
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022] Open
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements. METHODS The main literature databases were searched for cancer-related NLP articles written in English and published between January 2010 and September 2020. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were ultimately selected and included in our analysis. As expected, we found that cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility of NLP methods are insufficient, and NLP evaluation is inconsistent. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Andrew Wen
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Xiaoyang Ruan
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Huan He
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sijia Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sungrim Moon
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Michelle Mai
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Irbaz B. Riaz
- Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
- Nan Wang
- Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
- Ping Yang
- Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Jeremy L. Warner
- Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Hongfang Liu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN

16
Fang Y, Idnay B, Sun Y, Liu H, Chen Z, Marder K, Xu H, Schnall R, Weng C. Combining human and machine intelligence for clinical trial eligibility querying. J Am Med Inform Assoc 2022; 29:1161-1171. [PMID: 35426943 PMCID: PMC9196697 DOI: 10.1093/jamia/ocac051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Accepted: 03/29/2022] [Indexed: 11/13/2022] Open
Abstract
OBJECTIVE To combine machine efficiency and human intelligence for converting complex clinical trial eligibility criteria text into cohort queries. MATERIALS AND METHODS Criteria2Query (C2Q) 2.0 was developed to enable real-time user intervention for criteria selection and simplification, parsing error correction, and concept mapping. The accuracy, precision, recall, and F1 score of enhanced modules for negation scope detection, temporal and value normalization were evaluated using a previously curated gold standard, the annotated eligibility criteria of 1010 COVID-19 clinical trials. The usability and usefulness were evaluated by 10 research coordinators in a task-oriented usability evaluation using 5 Alzheimer's disease trials. Data were collected by user interaction logging, a demographic questionnaire, the Health Information Technology Usability Evaluation Scale (Health-ITUES), and a feature-specific questionnaire. RESULTS The accuracies of negation scope detection, temporal and value normalization were 0.924, 0.916, and 0.966, respectively. C2Q 2.0 achieved a moderate usability score (3.84 out of 5) and a high learnability score (4.54 out of 5). On average, 9.9 modifications were made for a clinical study. Experienced researchers made more modifications than novice researchers. The most frequent modification was deletion (5.35 per study). Furthermore, the evaluators favored cohort queries resulting from modifications (score 4.1 out of 5) and the user engagement features (score 4.3 out of 5). DISCUSSION AND CONCLUSION Features to engage domain experts and to overcome the limitations in automated machine output are shown to be useful and user-friendly. We concluded that human-computer collaboration is key to improving the adoption and user-friendliness of natural language processing.
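To make "temporal normalization" concrete, the Python sketch below converts a free-text time window in an eligibility criterion into a number of days. It is a toy illustration, not the Criteria2Query module: the regular expression, the `normalize_temporal` name, and the fixed conversion factors (30 days per month, 365 days per year) are assumptions.

```python
import re

# Assumed conversion factors; calendar-exact arithmetic is deliberately ignored.
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

# Matches expressions such as "6 months" or "2 weeks".
TEMPORAL_RE = re.compile(
    r"(?P<n>\d+)\s*(?P<unit>day|week|month|year)s?\b", re.IGNORECASE
)


def normalize_temporal(text):
    """Return the first time window found in `text` as a day count, else None."""
    m = TEMPORAL_RE.search(text)
    if m is None:
        return None
    return int(m.group("n")) * DAYS_PER_UNIT[m.group("unit").lower()]


for criterion in (
    "Myocardial infarction within the last 6 months",
    "Received antibiotics in the past 2 weeks",
    "History of stroke",
):
    print(criterion, "->", normalize_temporal(criterion))
```

A rule of this kind fails on phrasing it was not written for, which is the sort of machine error the abstract's real-time user correction is meant to catch.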
Affiliation(s)
- Yilu Fang
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Betina Idnay
- School of Nursing, Columbia University, New York, New York, USA
- Department of Neurology, Columbia University, New York, New York, USA
- Yingcheng Sun
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Hao Liu
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Zhehuan Chen
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Karen Marder
- Department of Neurology, Columbia University, New York, New York, USA
- Hua Xu
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Rebecca Schnall
- School of Nursing, Columbia University, New York, New York, USA
- Heilbrunn Department of Population and Family Health, Mailman School of Public Health, Columbia University, New York, New York, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, New York, USA

17
Rafee A, Riepenhausen S, Neuhaus P, Meidt A, Dugas M, Varghese J. ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials. BMC Med Res Methodol 2022; 22:141. [PMID: 35568796 PMCID: PMC9107639 DOI: 10.1186/s12874-022-01611-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 04/20/2022] [Indexed: 12/21/2022] Open
Abstract
Background Screening for eligible patients continues to pose a great challenge for many clinical trials. This has led to a rapidly growing interest in standardizing computable representations of eligibility criteria (EC) in order to develop tools that leverage data from electronic health record (EHR) systems. Although laboratory procedures (LP) represent a common entity of EC that is readily available and retrievable from EHR systems, there is a lack of interoperable data models for this entity of EC. A public, specialized data model that utilizes international, widely adopted terminology for LP, e.g. Logical Observation Identifiers Names and Codes (LOINC®), is much needed to support automated screening tools. Objective The aim of this study is to establish a core dataset for the LP most frequently requested to recruit patients for clinical trials using LOINC terminology. Employing such a core dataset could enhance the interface between study feasibility platforms and EHR systems and significantly improve automatic patient recruitment. Methods We used a semi-automated approach to analyze 10,516 screening forms from the Medical Data Models (MDM) portal’s data repository that are pre-annotated with the Unified Medical Language System (UMLS). An automated semantic analysis based on concept frequency was followed by an extensive manual expert review performed by physicians to analyze complex recruitment-relevant concepts not amenable to an automated approach. Results Based on analysis of 138,225 EC from 10,516 screening forms, 55 laboratory procedures represented 77.87% of all UMLS laboratory concept occurrences identified in the selected EC forms. We identified 26,413 unique UMLS concepts from 118 UMLS semantic types and covered the vast majority of Medical Subject Headings (MeSH) disease domains. Conclusions Only a small set of common LP covers the majority of laboratory concepts in screening EC forms, which supports the feasibility of establishing a focused core dataset for LP. We present ELaPro, a novel, LOINC-mapped core dataset for the 55 LP most frequently requested in screening for clinical trials. ELaPro is available in multiple machine-readable data formats such as CSV, ODM and HL7 FHIR. The extensive manual curation of this large number of free-text EC, as well as the combination of UMLS and LOINC terminologies, distinguishes this specialized dataset from previous relevant datasets in the literature. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-022-01611-y.
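The frequency argument behind a core dataset (a few concepts account for most occurrences) can be expressed as a cumulative-coverage calculation. The Python sketch below is illustrative only: the function names, the target threshold, and the toy list of laboratory mentions are assumptions and do not reproduce the study's UMLS-based analysis or its manual review.

```python
from collections import Counter


def coverage_of_top_k(concept_mentions, k):
    """Share of all mentions accounted for by the k most frequent concepts."""
    counts = Counter(concept_mentions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


def smallest_core_set(concept_mentions, target):
    """Most frequent concepts, added in order until `target` coverage is reached."""
    counts = Counter(concept_mentions)
    total = sum(counts.values())
    core, covered = [], 0
    for concept, n in counts.most_common():
        if total and covered / total >= target:
            break
        core.append(concept)
        covered += n
    return core


# Toy mentions; real input would be coded laboratory concepts from screening forms.
mentions = ["hemoglobin"] * 5 + ["creatinine"] * 3 + ["platelets"] + ["ferritin"]
print(coverage_of_top_k(mentions, 2))
print(smallest_core_set(mentions, 0.75))
```

In this toy input, two of the four concepts cover 80% of mentions, which mirrors in miniature the reported pattern of 55 procedures covering 77.87% of laboratory concept occurrences.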
Affiliation(s)
- Ahmed Rafee
- Institute of Medical Informatics, University of Münster, Münster, Germany
- Department of Internal Medicine (D), University Hospital of Münster, Münster, Germany
- Sarah Riepenhausen
- Institute of Medical Informatics, University of Münster, Münster, Germany
- Philipp Neuhaus
- Institute of Medical Informatics, University of Münster, Münster, Germany
- Alexandra Meidt
- Institute of Medical Informatics, University of Münster, Münster, Germany
- Martin Dugas
- Institute of Medical Informatics, Heidelberg University Hospital, Heidelberg, Germany
- Julian Varghese
- Institute of Medical Informatics, University of Münster, Münster, Germany

18
The Role of Artificial Intelligence in Early Cancer Diagnosis. Cancers (Basel) 2022; 14:1524. [PMID: 35326674 PMCID: PMC8946688 DOI: 10.3390/cancers14061524] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Received: 02/11/2022] [Revised: 03/08/2022] [Accepted: 03/10/2022] [Indexed: 02/01/2023] Open
Abstract
Improving the proportion of patients diagnosed with early-stage cancer is a key priority of the World Health Organisation. In many tumour groups, screening programmes have led to improvements in survival, but patient selection and risk stratification are key challenges. In addition, there are concerns about limited diagnostic workforces, particularly in light of the COVID-19 pandemic, placing a strain on pathology and radiology services. In this review, we discuss how artificial intelligence algorithms could assist clinicians in (1) screening asymptomatic patients at risk of cancer, (2) investigating and triaging symptomatic patients, and (3) more effectively diagnosing cancer recurrence. We provide an overview of the main artificial intelligence approaches, including historical models such as logistic regression, as well as deep learning and neural networks, and highlight their early diagnosis applications. Many data types are suitable for computational analysis, including electronic healthcare records, diagnostic images, pathology slides and peripheral blood, and we provide examples of how these data can be utilised to diagnose cancer. We also discuss the potential clinical implications for artificial intelligence algorithms, including an overview of models currently used in clinical practice. Finally, we discuss the potential limitations and pitfalls, including ethical concerns, resource demands, data security and reporting standards.
|
19
|
Kataria S, Ravindran V. Musculoskeletal care - at the confluence of data science, sensors, engineering, and computation. BMC Musculoskelet Disord 2022; 23:169. [PMID: 35193536 PMCID: PMC8863295 DOI: 10.1186/s12891-022-05126-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/23/2021] [Accepted: 02/17/2022] [Indexed: 12/27/2022] Open
Abstract
Data has always been integral to modern medicine in almost all aspects of patient care, and the recent proliferation of data has opened up innumerable opportunities for all stakeholders to improve the quality of care and health outcomes, including quality of life and rehabilitation. Greater usage and adoption of digital technologies have led to the convergence of health data in different forms: clinical, self-reported, electronic health records, social media, etc. The application and utilization of patient datasets continue to broaden each day with greater availability and access. These are empowering newer cutting-edge solutions such as connected care, artificial intelligence, 3D printing and real-life mimicking prosthetics. The availability of data at micro and macro levels has the potential to act as a catalyst for personalized care based on the behavioral, cultural, genetic, and psychological needs of patients with musculoskeletal disorders. Realistic algorithms coupled with biomarkers can identify relevant interventions and alert care providers to any deterioration. Although currently at a nascent stage, 3D printing, exoskeletons, and virtual rehabilitation hold tremendous potential for cost-effective, precise interventions for patients.
|
20
|
Idnay B, Dreisbach C, Weng C, Schnall R. A systematic review on natural language processing systems for eligibility prescreening in clinical research. J Am Med Inform Assoc 2021; 29:197-206. [PMID: 34725689 PMCID: PMC8714283 DOI: 10.1093/jamia/ocab228] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/16/2021] [Revised: 08/30/2021] [Accepted: 10/04/2021] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE We conducted a systematic review to assess the effect of natural language processing (NLP) systems in improving the accuracy and efficiency of eligibility prescreening during the clinical research recruitment process. MATERIALS AND METHODS Guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards of quality for reporting systematic reviews, a protocol for study eligibility was developed a priori and registered in the PROSPERO database. Using predetermined inclusion criteria, studies published from database inception through February 2021 were identified from 5 databases. The Joanna Briggs Institute Critical Appraisal Checklist for Quasi-experimental Studies was adapted to determine the study quality and the risk of bias of the included articles. RESULTS Eleven studies representing 8 unique NLP systems met the inclusion criteria. These studies demonstrated moderate study quality and exhibited heterogeneity in the study design, setting, and intervention type. All 11 studies evaluated the NLP system's performance for identifying eligible participants; 7 studies evaluated the system's impact on time efficiency; 4 studies evaluated the system's impact on workload; and 2 studies evaluated the system's impact on recruitment. DISCUSSION NLP systems in clinical research eligibility prescreening are an understudied but promising field that requires further research to assess its impact on real-world adoption. Future studies should be centered on continuing to develop and evaluate relevant NLP systems to improve enrollment into clinical studies. CONCLUSION Understanding the role of NLP systems in improving eligibility prescreening is critical to the advancement of clinical research recruitment.
Affiliation(s)
- Betina Idnay
- School of Nursing, Columbia University, New York, New York, USA
- Department of Neurology, Columbia University, New York, New York, USA
- Caitlin Dreisbach
- Data Science Institute, Columbia University, New York, New York, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Rebecca Schnall
- School of Nursing, Columbia University, New York, New York, USA
|
21
|
Shi W, Vasishta S, Dow L, Cavellini D, Palmer C, McKinstry B, Sullivan F. Early experience with an opt-in research register - Scottish Health Research Register (SHARE): a multi-method evaluation of participant recruitment performance. BMC Med Res Methodol 2021; 21:286. [PMID: 34930144 PMCID: PMC8686271 DOI: 10.1186/s12874-021-01479-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2021] [Accepted: 11/28/2021] [Indexed: 01/01/2023] Open
Abstract
Background Recruiting participants to a clinical study is a resource-intensive process with a high failure rate. The Scottish Health Research Register (SHARE) provides a recruitment support service that helps researchers recruit participants by searching patients’ Electronic Health Records (EHRs). The current study aims to evaluate the performance of SHARE in participant recruitment. Methods Recruitment projects eligible for evaluation were those that were conducted for clinical trials or observational studies and finished before 2020. For analysis of recruitment data, projects with incomplete data were excluded. For each project we calculated, from SHARE records, 1) the number of participants recruited through SHARE as a percentage of the number requested by researchers (percentage fulfilled), 2) the percentage of the potential candidates provided by SHARE to researchers that were actually recruited (percentage provided and recruited), and 3) the participants recruited through SHARE as a percentage of all the potentially eligible candidates identified by searching registrants’ EHRs (percentage identified and recruited). Research teams of the eligible projects were invited to participate in an anonymised online survey. Two metrics were derived from research teams’ responses: a) the number recruited as a percentage of the study target number of participants (percentage fulfilled), and b) the percentage of the participants recruited through SHARE among the candidates received from SHARE (percentage provided and recruited). Results Forty-four projects were eligible for inclusion. Recruitment data for 24 projects were available (20 excluded because of missingness or incompleteness). Survey invitations were sent to all eligible research teams, and 12 responses were received.
Analysis of recruitment data shows the overall percentage fulfilled was 34.2% (interquartile range 13.3–45.1%), the percentage provided and recruited was 29.3% (interquartile range 20.6–52.4%) and the percentage identified and recruited was 4.9% (interquartile range 2.6–10.2%). Based on the data reported by researchers, percentage fulfilled was 31.7% (interquartile range 5.8–59.6%) and percentage provided and recruited was 20.2% (interquartile range 8.2–31.0%). Conclusions SHARE may be a valuable resource for recruiting participants for some clinical studies. Potential improvements are to expand the registrant base and to incorporate more data generated during patients’ different health care encounters into the candidate-searching step. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01479-4.
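The three register-based metrics defined in the Methods can be written out as a minimal Python sketch. The `Project` fields and the example counts are assumptions made for illustration; they are not SHARE records, and the abstract does not describe how the authors implemented the calculation.

```python
from dataclasses import dataclass

@dataclass
class Project:
    requested: int   # participants requested by the research team
    identified: int  # potentially eligible candidates found by the EHR search
    provided: int    # candidates passed on to the research team
    recruited: int   # participants recruited through the register

def pct(numerator, denominator):
    """Express numerator/denominator as a percentage."""
    return 100.0 * numerator / denominator

def recruitment_metrics(p):
    """Compute the three per-project percentages named in the abstract."""
    return {
        "percentage_fulfilled": pct(p.recruited, p.requested),
        "percentage_provided_and_recruited": pct(p.recruited, p.provided),
        "percentage_identified_and_recruited": pct(p.recruited, p.identified),
    }

# Hypothetical project, invented for illustration only.
example = Project(requested=100, identified=400, provided=80, recruited=20)
metrics = recruitment_metrics(example)
print(metrics)
```

All three metrics share the same numerator (participants recruited) and differ only in the denominator, which is why the percentage identified and recruited is necessarily the smallest whenever more candidates are identified than provided or requested.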
Affiliation(s)
- Wen Shi
- Population and Behavioural Science Division, School of Medicine, University of St Andrews, North Haugh, Fife, St Andrews, KY16 9TF, UK
- Shobna Vasishta
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, UK
- Louise Dow
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, UK
- Daniella Cavellini
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, UK
- Colin Palmer
- Division of Population Health and Genomics, School of Medicine, University of Dundee, Dundee, UK
- Frank Sullivan
- Population and Behavioural Science Division, School of Medicine, University of St Andrews, North Haugh, Fife, St Andrews, KY16 9TF, UK
|
22
|
Wu J, Yakubov A, Abdul-Hay M, Love E, Kroening G, Cohen D, Spalink C, Joshi A, Balar A, Joseph KA, Ravenell J, Mehnert J. Prescreening to Increase Therapeutic Oncology Trial Enrollment at the Largest Public Hospital in the United States. JCO Oncol Pract 2021; 18:e620-e625. [PMID: 34748371 DOI: 10.1200/op.21.00629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022] Open
Abstract
PURPOSE The recruitment of underserved patients into therapeutic oncology trials is imperative. The National Institutes of Health mandates the inclusion of minorities in clinical research, although their participation remains under-represented. Institutions have used data mining to match patients to clinical trials. In a public health care system, such expensive tools are unavailable. METHODS The NYU Clinical Trials Office implemented a quality improvement program at Bellevue Hospital Cancer Center to increase therapeutic trial enrollment. Patients are screened through the electronic medical record, tumor board conferences, and the cancer registry. Our analysis evaluated two variables: number of patients identified and those enrolled into clinical trials. RESULTS Two years before the program, there were 31 patients enrolled. For a period of 24 months (July 2017 to July 2019), we identified 255 patients, of whom 143 (56.1%) were enrolled. Of those enrolled, 121 (84.6%) received treatment, and 22 (15%) were screen failures. Fifty-five (38.5%) were referred to NYU Perlmutter Cancer Center for therapy. Of the total enrollees, 64% were female, 56% were non-White, and overall median age was 55 years (range: 33-88 years). Our participants spoke 16 different languages, and 57% were non-English-speaking. We enrolled patients into eight different disease categories, with 38% recruited to breast cancer trials. Eighty-three percent of our patients reside in low-income areas, with 62% in both low-income and Health Professional Shortage Areas. CONCLUSION Prescreening at Bellevue has led to a 4.6-fold increase in patient enrollment to clinical trials. Future research into using prescreening programs at public institutions may improve access to clinical trials for underserved populations.
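The percentages and the fold increase in the Results follow directly from the reported counts. The short Python check below uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Counts as reported in the abstract.
enrolled_before = 31   # enrolled in the two years before the program
identified = 255       # identified during the 24-month program
enrolled = 143
treated = 121
screen_failures = 22
referred = 55

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

summary = {
    "enrolled_of_identified": pct(enrolled, identified),
    "treated_of_enrolled": pct(treated, enrolled),
    "screen_failures_of_enrolled": pct(screen_failures, enrolled),
    "referred_of_enrolled": pct(referred, enrolled),
    "fold_increase": round(enrolled / enrolled_before, 1),
}
print(summary)
```

Treated patients and screen failures add up to the 143 enrolled, and the screen-failure share works out to 15.4%, which the abstract reports rounded to 15%.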
Affiliation(s)
- Jennifer Wu
- NYU Grossman School of Medicine, New York, NY
- Erica Love
- NYU Grossman School of Medicine, New York, NY
- Arjun Balar
- NYU Grossman School of Medicine, New York, NY
|
23
|
Ni Y, Bachtel A, Nause K, Beal S. Automated detection of substance use information from electronic health records for a pediatric population. J Am Med Inform Assoc 2021; 28:2116-2127. [PMID: 34333636 PMCID: PMC8449626 DOI: 10.1093/jamia/ocab116] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/26/2021] [Revised: 05/06/2021] [Accepted: 05/26/2021] [Indexed: 11/12/2022] Open
Abstract
OBJECTIVE Substance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data. MATERIALS AND METHODS Pediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC). RESULTS The dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively). 
CONCLUSIONS It is feasible to detect substance use screening and results among pediatric patients using logic rules, NLP, and machine learning technologies.
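Four of the evaluation measures named in the Methods (sensitivity, specificity, positive predictive value, negative predictive value) are simple ratios of confusion-matrix counts. The Python sketch below shows the standard definitions; the counts are invented for illustration and are not the study's data. AUC is omitted because it requires ranked model scores rather than a single set of counts.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix ratios for a binary screening result."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts, not taken from the study.
m = screening_metrics(tp=80, fp=10, tn=90, fn=20)
print(m)
```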
Affiliation(s)
- Yizhao Ni
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA
- Corresponding Author: Yizhao Ni, PhD, Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Department of Pediatrics, University of Cincinnati, 3333 Burnet Avenue, Cincinnati, OH 45229-3039, USA
- Alycia Bachtel
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Katie Nause
- Division of Psychology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Sarah Beal
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, Ohio, USA
- Division of Psychology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
|
24
|
Ronquillo JG, Lester WT. Practical Aspects of Implementing and Applying Health Care Cloud Computing Services and Informatics to Cancer Clinical Trial Data. JCO Clin Cancer Inform 2021; 5:826-832. [PMID: 34383582 DOI: 10.1200/cci.21.00018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
PURPOSE Cloud computing has led to dramatic growth in the volume, variety, and velocity of cancer data. However, cloud platforms and services present new challenges for cancer research, particularly in understanding the practical tradeoffs between cloud performance, cost, and complexity. The goal of this study was to describe the practical challenges when using a cloud-based service to improve the cancer clinical trial matching process. METHODS We collected information for all interventional cancer clinical trials from ClinicalTrials.gov and used the Google Cloud Healthcare Natural Language Application Programming Interface (API) to analyze clinical trial Title and Eligibility Criteria text. An informatics pipeline leveraging interoperability standards summarized the distribution of cancer clinical trials, genes, laboratory tests, and medications extracted from cloud-based entity analysis. RESULTS There were a total of 38,851 cancer-related clinical trials found in this study, with the distribution of cancer categories extracted from Title text significantly different from that in ClinicalTrials.gov (P < .001). Cloud-based entity analysis of clinical trial criteria identified a total of 949 genes, 1,782 laboratory tests, 2,086 medications, and 4,902 National Cancer Institute Thesaurus terms, with estimated detection accuracies ranging from 12.8% to 89.9%. A total of 77,702 API calls processed an estimated 167,179 text records, which took a total of 1,979 processing-minutes (33.0 processing-hours), or approximately 1.5 seconds per API call. CONCLUSION Current general-purpose cloud health care tools, like the Google service in this study, should not be used for automated clinical trial matching unless they can perform effective extraction and classification of the clinical, genetic, and medication concepts central to precision oncology research.
A strong understanding of the practical aspects of cloud computing will help researchers effectively navigate the vast data ecosystems in cancer research.
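The throughput figures in the Results can be reproduced from the reported totals. The Python check below uses only numbers stated in the abstract; the variable names are illustrative.

```python
# Totals as reported in the abstract.
api_calls = 77_702
text_records = 167_179
processing_minutes = 1_979

# Convert total processing time into hours and into seconds per API call.
processing_hours = round(processing_minutes / 60, 1)
seconds_per_call = round(processing_minutes * 60 / api_calls, 1)
# Average number of text records handled per call (derived, not reported).
records_per_call = round(text_records / api_calls, 2)

print(processing_hours, seconds_per_call, records_per_call)
```

The derived average of roughly two text records per call is not stated in the abstract and is shown here only as a consequence of the reported totals.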
Affiliation(s)
- Jay G Ronquillo
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD; Office of Data Science Strategy, National Institutes of Health, Bethesda, MD
- William T Lester
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA; Harvard Medical School, Boston, MA
|
25
|
Cai T, Cai F, Dahal KP, Cremone G, Lam E, Golnik C, Seyok T, Hong C, Cai T, Liao KP. Improving the Efficiency of Clinical Trial Recruitment Using an Ensemble Machine Learning to Assist With Eligibility Screening. ACR Open Rheumatol 2021; 3:593-600. [PMID: 34296815 PMCID: PMC8449035 DOI: 10.1002/acr2.11289] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/30/2020] [Accepted: 05/18/2021] [Indexed: 11/22/2022] Open
Abstract
Objective Efficiently identifying eligible patients is a crucial first step for a successful clinical trial. The objective of this study was to test whether an approach using electronic health record (EHR) data and an ensemble machine learning algorithm incorporating billing codes and data from clinical notes processed by natural language processing (NLP) can improve the efficiency of eligibility screening. Methods We studied patients screened for a clinical trial of rheumatoid arthritis (RA) with one or more International Classification of Diseases (ICD) code for RA and age greater than 35 years, from a tertiary care center and a community hospital. The following three groups of EHR features were considered for the algorithm: 1) structured features, 2) the counts of NLP concepts from notes, 3) health care utilization. All features were linked to dates. We applied random forest and logistic regression with least absolute shrinkage and selection operator penalty against the following two standard approaches: 1) one or more RA ICD code and no ICD codes related to exclusion criteria (ScreenRAICD1+EX) and 2) two or more RA ICD codes (ScreenRAICD2). To test the portability, we trained the algorithm at one institution and tested it at the other. Results In total, 3359 patients at Brigham and Women’s Hospital (BWH) and 642 patients at Faulkner Hospital (FH) were studied, with 461 (13.7%) eligible patients at BWH and 84 (13.4%) at FH. The application of the algorithm reduced ineligible patients from chart review by 40.5% at the tertiary care center and by 57.0% at the community hospital. In contrast, ScreenRAICD2 reduced patients for chart review by 2.7% to 11.3%; ScreenRAICD1+EX reduced patients for chart review by 63% to 65% but excluded 22% to 27% of eligible patients. 
Conclusion The ensemble machine learning algorithm incorporating billing codes and NLP data increased the efficiency of eligibility screening by reducing the number of patients requiring chart review while not excluding eligible patients. Moreover, this approach can be trained at one institution and applied at another for multicenter clinical trials.
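The two rule-based comparators described in the Methods (ScreenRAICD1+EX and ScreenRAICD2) can be sketched in Python as simple predicates, together with the trade-off the Results describe: how many patients a rule removes from chart review versus how many eligible patients it wrongly drops. This is a simplified illustration, not the authors' code. The `Patient` fields, the `evaluate` helper, and the toy cohort are assumptions, and the reduction here is computed over all patients rather than with the study's exact definition.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    ra_code_count: int        # number of RA ICD codes on record
    has_exclusion_code: bool  # any ICD code tied to an exclusion criterion
    eligible: bool            # gold standard from manual chart review

def screen_ra_icd1_ex(p):
    """Keep patients with >= 1 RA ICD code and no exclusion-related code."""
    return p.ra_code_count >= 1 and not p.has_exclusion_code

def screen_ra_icd2(p):
    """Keep patients with >= 2 RA ICD codes."""
    return p.ra_code_count >= 2

def evaluate(patients, rule):
    """Share of patients removed from chart review, and share of truly
    eligible patients that the rule removes by mistake."""
    removed = [p for p in patients if not rule(p)]
    n_eligible = sum(p.eligible for p in patients)
    return {
        "chart_review_reduction": len(removed) / len(patients),
        "eligible_excluded": sum(p.eligible for p in removed) / n_eligible,
    }

# Toy cohort, invented for illustration only.
cohort = [
    Patient(1, False, True), Patient(3, False, True), Patient(2, True, True),
    Patient(1, True, False), Patient(1, False, False), Patient(4, True, False),
    Patient(2, False, False), Patient(1, False, False),
]
r1 = evaluate(cohort, screen_ra_icd1_ex)
r2 = evaluate(cohort, screen_ra_icd2)
print(r1, r2)
```

In the toy cohort both rules lose one of three eligible patients, which mirrors the kind of loss the abstract reports for ScreenRAICD1+EX and motivates the learned ensemble that the study evaluates instead.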
Affiliation(s)
- Tianrun Cai
- Brigham and Women's Hospital, Boston, Massachusetts, United States
- Fiona Cai
- Massachusetts Institute of Technology, Cambridge, Massachusetts, United States
- Kumar P Dahal
- Brigham and Women's Hospital, Boston, Massachusetts, United States
- Ethan Lam
- Brigham and Women's Hospital, Boston, Massachusetts, United States
- Charlotte Golnik
- Brigham and Women's Hospital, Boston, Massachusetts, United States
- Thany Seyok
- Brigham and Women's Hospital, Boston, Massachusetts, United States
- Chuan Hong
- Harvard University, Boston, Massachusetts, United States
- Tianxi Cai
- Harvard University, Boston, Massachusetts, United States
- Katherine P Liao
- Brigham and Women's Hospital, Harvard University, and Veterans Affairs Boston Healthcare System, Boston, Massachusetts, United States
|
26
|
O'Brien EC, Raman SR, Ellis A, Hammill BG, Berdan LG, Rorick T, Janmohamed S, Lampron Z, Hernandez AF, Curtis LH. The use of electronic health records for recruitment in clinical trials: a mixed methods analysis of the Harmony Outcomes Electronic Health Record Ancillary Study. Trials 2021; 22:465. [PMID: 34281607 PMCID: PMC8287813 DOI: 10.1186/s13063-021-05397-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/15/2020] [Accepted: 06/24/2021] [Indexed: 11/22/2022] Open
Abstract
Background The electronic health record (EHR) contains a wealth of clinical data that may be used to streamline the identification of potential clinical trial participants. However, there is little empirical information on site-level facilitators of and barriers to optimal use of EHR systems with respect to trial recruitment. Methods We conducted qualitative focus groups and quantitative surveys as part of the EHR Ancillary Study, which is being conducted alongside the multicenter, global, Harmony Outcomes Trial comparing albiglutide to standard care for the prevention of cardiovascular events in type 2 diabetes. Subject matter experts used findings from focus groups to draft a 20-question survey examining the use of the EHR for participant identification, common site recruitment strategies, and variation in perceived barriers to optimal use of the EHR. The final survey was fielded with 446 site investigators actively enrolling participants in the main trial. Results Nearly two-thirds of respondents were study coordinators (63.2%), 23.1% were principal investigators, and 13.7% held other research roles. Approximately half of the respondents reported using the EHR to find potential trial participants. Of these, 79.4% reported using EHR searches in conjunction with other recruitment methods, including reviewing of upcoming clinic schedules (75.3%) and contacting past trial participants (71.2%). Important barriers to optimal use of the EHR included the lack of availability of certain research-focused EHR modules and limitations on the ability to contact patients cared for by other providers. Of survey respondents who did not use the EHR to find potential participants, one-quarter reported that the EHR was not accessible in their country; this finding varied from 2.6% of respondents in North America to 50% of respondents in the Asia Pacific. 
Conclusions While EHR screening was commonly used for recruitment in a cardiovascular outcomes trial, important technical, governance, and regulatory barriers persist. Multifaceted, scalable, and customizable strategies are needed to support the optimal use of the EHR for trial participant identification. Trial registration ClinicalTrials.gov NCT02465515. Registered on 8 June 2015. Supplementary Information The online version contains supplementary material available at 10.1186/s13063-021-05397-0.
Affiliation(s)
- Emily C O'Brien
- Duke Clinical Research Institute, Durham, NC, USA; Department of Population Health Sciences, Duke University School of Medicine, 215 Morris Street, Suite 210, Durham, NC, 27701, USA
- Sudha R Raman
- Department of Population Health Sciences, Duke University School of Medicine, 215 Morris Street, Suite 210, Durham, NC, 27701, USA
- Alicia Ellis
- Duke Clinical Research Institute, Durham, NC, USA; UCB, Durham, NC, USA
- Bradley G Hammill
- Duke Clinical Research Institute, Durham, NC, USA; Department of Population Health Sciences, Duke University School of Medicine, 215 Morris Street, Suite 210, Durham, NC, 27701, USA
- Tyrus Rorick
- Duke Clinical Research Institute, Durham, NC, USA
- Adrian F Hernandez
- Duke Clinical Research Institute, Durham, NC, USA; Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- Lesley H Curtis
- Duke Clinical Research Institute, Durham, NC, USA; Department of Population Health Sciences, Duke University School of Medicine, 215 Morris Street, Suite 210, Durham, NC, 27701, USA
|
27
|
von Itzstein MS, Hullings M, Mayo H, Beg MS, Williams EL, Gerber DE. Application of Information Technology to Clinical Trial Evaluation and Enrollment: A Review. JAMA Oncol 2021; 7:1559-1566. [PMID: 34236403 DOI: 10.1001/jamaoncol.2021.1165] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/28/2022]
Abstract
Importance As cancer treatment has become more individualized, oncologic clinical trials have become more complex. Increasingly numerous and stringent eligibility criteria frequently include tumor molecular or genomic characteristics that may not be readily identified in medical records, rendering it difficult to best match clinical trials with clinical sites and to identify potentially eligible patients once a clinical trial has been selected and activated. Partly because of these factors, enrollment rates for cancer clinical trials remain low, creating delays and increased costs for drug development. Information technology (IT) platforms have been applied to the implementation and conduct of clinical trials to improve efficiencies in several medical fields, and these platforms have recently been introduced to oncologic studies. Observations This review summarizes cancer and noncancer studies that used IT platforms for assistance with clinical trial site selection, patient recruitment, and patient screening. The review does not address the use of IT in other aspects of clinical research, such as wearable physical activity monitors or telehealth visits. A large number of IT platforms (which may be patient facing, site or investigator facing, or sponsor facing) are now commercially available. These applications use artificial intelligence and/or natural language processing to identify and summarize protocol eligibility criteria, institutional patient populations, and individual electronic health records. Although there is an expanding body of literature examining the role of this technology, relatively few studies to date have been performed in oncologic settings. Conclusions and Relevance This review found that an increasing number and variety of IT platforms were available to assist in the planning and conduct of clinical trials. 
Because oncologic clinical care and clinical trial protocols are particularly complex, nuanced, and individualized, published experience with this technology in other fields may not be fully applicable to cancer settings. The extent to which these services will overcome ongoing and increasing challenges in cancer clinical research remains unclear.
Affiliation(s)
- Mitchell S von Itzstein
- Department of Internal Medicine, Division of Hematology-Oncology, The University of Texas Southwestern Medical Center, Dallas; Harold C. Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas
- Melanie Hullings
- Harold C. Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas
- Helen Mayo
- Southwestern Health Sciences Digital Library and Learning Center, The University of Texas, Dallas
- M Shaalan Beg
- Department of Internal Medicine, Division of Hematology-Oncology, The University of Texas Southwestern Medical Center, Dallas; Harold C. Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas
- Erin L Williams
- Harold C. Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas
- David E Gerber
- Department of Internal Medicine, Division of Hematology-Oncology, The University of Texas Southwestern Medical Center, Dallas; Harold C. Simmons Comprehensive Cancer Center, The University of Texas Southwestern Medical Center, Dallas; Department of Population and Data Sciences, The University of Texas, Southwestern Medical Center, Dallas
|
28
|
Rogers JR, Lee J, Zhou Z, Cheung YK, Hripcsak G, Weng C. Contemporary use of real-world data for clinical trial conduct in the United States: a scoping review. J Am Med Inform Assoc 2021; 28:144-154. [PMID: 33164065 DOI: 10.1093/jamia/ocaa224] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/05/2020] [Revised: 08/11/2020] [Accepted: 09/02/2020] [Indexed: 12/28/2022] Open
Abstract
OBJECTIVE Real-world data (RWD), defined as routinely collected healthcare data, can be a potential catalyst for addressing challenges faced in clinical trials. We performed a scoping review of database-specific RWD applications within clinical trial contexts, synthesizing prominent uses and themes. MATERIALS AND METHODS Querying 3 biomedical literature databases, we included research articles that used electronic health records, administrative claims databases, or clinical registries either within a clinical trial or in tandem with methodology related to clinical trials. Articles were required to use at least 1 US RWD source. All abstract screening, full-text screening, and data extraction were performed by 1 reviewer. Two reviewers independently verified all decisions. RESULTS Of 2020 screened articles, 89 qualified: 59 articles used electronic health records, 29 used administrative claims, and 26 used registries. Our synthesis was driven by the general life cycle of a clinical trial, culminating in 3 major themes: trial process tasks (51 articles); dissemination strategies (6); and generalizability assessments (34). Despite a diverse set of diseases studied, <10% of trials using RWD for trial process tasks evaluated medications or procedures (5/51). All articles highlighted data-related challenges, such as missing values. DISCUSSION Database-specific RWD have been occasionally leveraged for various clinical trial tasks. We observed underuse of RWD within conducted medication or procedure trials, though this observation may be confounded by implicit reporting of RWD use. CONCLUSION Enhanced incorporation of RWD should be further explored for medication or procedure trials, including better understanding of how to handle related data quality issues to facilitate RWD use.
Affiliation(s)
- James R Rogers: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Junghwan Lee: Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Ziheng Zhou: Institute of Human Nutrition, Columbia University, New York, New York, USA
- Ying Kuen Cheung: Department of Biostatistics, Columbia University, New York, New York, USA
- George Hripcsak: Department of Biomedical Informatics, Columbia University, New York, New York, USA; Medical Informatics Services, New York-Presbyterian Hospital, New York, New York, USA
- Chunhua Weng: Department of Biomedical Informatics, Columbia University, New York, New York, USA
|
29
|
Zong H, Yang J, Zhang Z, Li Z, Zhang X. Semantic categorization of Chinese eligibility criteria in clinical trials using machine learning methods. BMC Med Inform Decis Mak 2021; 21:128. [PMID: 33858409 PMCID: PMC8050926 DOI: 10.1186/s12911-021-01487-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/20/2020] [Accepted: 04/01/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Semantic categorization of clinical trial eligibility criteria using natural language processing is crucial for optimizing clinical trial design and building automated patient recruitment systems. However, most related research has focused on English eligibility criteria, and to the best of our knowledge, no research has studied Chinese eligibility criteria. In this study, we therefore aimed to explore the semantic categories of Chinese eligibility criteria. METHODS We downloaded the clinical trial registration files from the website of the Chinese Clinical Trial Registry (ChiCTR) and extracted both the Chinese eligibility criteria and the corresponding English eligibility criteria. We represented the criteria sentences using Unified Medical Language System semantic types and applied hierarchical clustering to induce semantic categories. Furthermore, to explore how well Chinese eligibility criteria can be classified into the developed semantic categories, we implemented multiple classification algorithms, including four baseline machine learning algorithms (LR, NB, kNN, SVM), three deep learning algorithms (CNN, RNN, FastText) and two pre-trained language models (BERT, ERNIE). RESULTS In total, we developed 44 semantic categories, summarized them into 8 topic groups, and investigated their average incidence and prevalence in 272 hepatocellular carcinoma-related Chinese clinical trials. Compared with previously proposed categories for English eligibility criteria, 13 novel categories were identified in Chinese eligibility criteria. The classification results show that most semantic categories performed quite well; the pre-trained language model ERNIE achieved the best performance, with a macro-average F1 score of 0.7980 and a micro-average F1 score of 0.8484. CONCLUSION As a pilot study of Chinese eligibility criteria analysis, we developed 44 semantic categories by hierarchical clustering for the first time and validated the classification capacity with multiple classification algorithms.
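The abstract above reports both a macro-average and a micro-average F1 score. As a minimal sketch of how the two differ (this is not the authors' evaluation code; the function name `f1_scores` and the category labels in the usage note are hypothetical), macro-averaging takes the unweighted mean of per-category F1, while micro-averaging pools true positives, false positives, and false negatives over all categories before computing a single F1:

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label, multi-class predictions.

    Illustrative helper only; assumes one gold label and one predicted
    label per criterion sentence.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted category gains a false positive
            fn[gold] += 1  # gold category gains a false negative

    def f1(tp_count, fp_count, fn_count):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp_count + fp_count + fn_count
        return 2 * tp_count / denom if denom else 0.0

    # Macro: every category counts equally, however rare it is.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so frequent categories dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro
```

For example, calling `f1_scores(["age", "age", "disease"], ["age", "disease", "disease"])` returns the pair of averages for a three-sentence toy set. A macro score below the micro score, as in the abstract, is what one would expect when some rare categories are classified less accurately than the frequent ones.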
Affiliation(s)
- Hui Zong: Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Jinxuan Yang: Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Zeyu Zhang: Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
- Zuofeng Li: Philips Research China, Shanghai, 200072, China
- Xiaoyan Zhang: Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, Shanghai, 200092, China
|
30
|
Naceanceno KS, House SL, Asaro PV. Shared-Task Worklists Improve Clinical Trial Recruitment Workflow in an Academic Emergency Department. Appl Clin Inform 2021; 12:293-300. [PMID: 33827142 DOI: 10.1055/s-0041-1727153] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
Abstract
BACKGROUND Clinical trials performed in our emergency department at Barnes-Jewish Hospital utilize a centralized infrastructure for alerting, screening, and enrollment with rule-based alerts sent to clinical research coordinators. Previously, all alerts were delivered as text messages via dedicated cellular phones. As the number of ongoing clinical trials increased, the volume of alerts grew to an unmanageable level. Therefore, we have changed our primary notification delivery method to study-specific, shared-task worklists integrated with our pre-existing web-based screening documentation system. OBJECTIVE To evaluate the effects on screening and recruitment workflow of replacing text-message delivery of clinical trial alerts with study-specific shared-task worklists in a high-volume academic emergency department supporting multiple concurrent clinical trials. METHODS We analyzed retrospective data on alerting, screening, and enrollment for 10 active clinical trials pre- and postimplementation of shared-task worklists. RESULTS Notifications signaling the presence of potentially eligible subjects for clinical trials were more likely to result in a screen (p < 0.001) with the implementation of shared-task worklists compared with notifications delivered as text messages for 8/10 clinical trials. The change in workflow did not alter the likelihood of a notification resulting in an enrollment (p = 0.473). The Director of Research reported a substantial reduction in the amount of time spent redirecting clinical research coordinator screening activities. CONCLUSION Shared-task worklists, with the functionalities we have described, offer a viable alternative to delivery of clinical trial alerts via text message directly to clinical research coordinators recruiting for multiple concurrent clinical trials in a high-volume academic emergency department.
Affiliation(s)
- Kevin S Naceanceno: Washington University School of Medicine, St. Louis, Missouri, United States
- Stacey L House: Department of Emergency Medicine, Washington University School of Medicine, St. Louis, Missouri, United States
- Phillip V Asaro: Department of Emergency Medicine, Washington University School of Medicine, St. Louis, Missouri, United States
|
31
|
Evaluating eligibility criteria of oncology trials using real-world data and AI. Nature 2021; 592:629-633. [PMID: 33828294 DOI: 10.1038/s41586-021-03430-5] [Citation(s) in RCA: 99] [Impact Index Per Article: 33.0] [Received: 08/24/2020] [Accepted: 03/08/2021] [Indexed: 01/04/2023]
Abstract
There is a growing focus on making clinical trials more inclusive but the design of trial eligibility criteria remains challenging [1-3]. Here we systematically evaluate the effect of different eligibility criteria on cancer trial populations and outcomes with real-world data using the computational framework of Trial Pathfinder. We apply Trial Pathfinder to emulate completed trials of advanced non-small-cell lung cancer using data from a nationwide database of electronic health records comprising 61,094 patients with advanced non-small-cell lung cancer. Our analyses reveal that many common criteria, including exclusions based on several laboratory values, had a minimal effect on the trial hazard ratios. When we used a data-driven approach to broaden restrictive criteria, the pool of eligible patients more than doubled on average and the hazard ratio of the overall survival decreased by an average of 0.05. This suggests that many patients who were not eligible under the original trial criteria could potentially benefit from the treatments. We further support our findings through analyses of other types of cancer and patient-safety data from diverse clinical trials. Our data-driven methodology for evaluating eligibility criteria can facilitate the design of more-inclusive trials while maintaining safeguards for patient safety.
|
32
|
Jain N, Mittendorf KF, Holt M, Lenoue-Newton M, Maurer I, Miller C, Stachowiak M, Botyrius M, Cole J, Micheel C, Levy M. The My Cancer Genome clinical trial data model and trial curation workflow. J Am Med Inform Assoc 2020; 27:1057-1066. [PMID: 32483629 PMCID: PMC7647323 DOI: 10.1093/jamia/ocaa066] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/13/2019] [Revised: 04/07/2020] [Accepted: 04/17/2020] [Indexed: 11/14/2022]
Abstract
OBJECTIVE As clinical trials evolve in complexity, clinical trial data models that can capture relevant trial data in meaningful, structured annotations and computable forms are needed to support accrual. MATERIAL AND METHODS We have developed a clinical trial information model, curation information system, and a standard operating procedure for consistent and accurate annotation of cancer clinical trials. Clinical trial documents are pulled into the curation system from publicly available sources. Using a web-based interface, a curator creates structured assertions related to disease-biomarker eligibility criteria, therapeutic context, and treatment cohorts by leveraging our data model features. These structured assertions are published on the My Cancer Genome (MCG) website. RESULTS To date, over 5000 oncology trials have been manually curated. All trial assertion data are available for public view on the MCG website. Querying our structured knowledge base, we performed a landscape analysis to assess the top diseases, biomarker alterations, and drugs featured across all cancer trials. DISCUSSION Beyond curating commonly captured elements, such as disease and biomarker eligibility criteria, we have expanded our model to support the curation of trial interventions and therapeutic context (ie, neoadjuvant, metastatic, etc.), and the respective biomarker-disease treatment cohorts. To the best of our knowledge, this is the first effort to capture these fields in a structured format. CONCLUSION This paper makes a significant contribution to the field of biomedical informatics and knowledge dissemination for precision oncology via the MCG website. KEY WORDS knowledge representation, My Cancer Genome, precision oncology, knowledge curation, cancer informatics, clinical trial data model.
Affiliation(s)
- Neha Jain: Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Kathleen F Mittendorf: Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Marilyn Holt: Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Michele Lenoue-Newton: Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Christine Micheel: Vanderbilt Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Medicine, Division of Hematology/Oncology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Mia Levy: Department of Internal Medicine, Division of Hematology/Oncology, Rush University Medical Center, Chicago, Illinois, USA; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
33
|
Abstract
Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has been already applied to easier settings with structured data such as tabular lab data. However, there is still a need for applying AutoML for interpreting medical text, which is being generated at a tremendous rate. For this to happen, a promising method is AutoML for clinical notes analysis, which is an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study towards AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the literature of AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which if realized can revolutionize patient outcomes.
|
34
|
Bitterman DS, Miller TA, Mak RH, Savova GK. Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. Int J Radiat Oncol Biol Phys 2021; 110:641-655. [PMID: 33545300 DOI: 10.1016/j.ijrobp.2021.01.044] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/25/2020] [Revised: 12/22/2020] [Accepted: 01/23/2021] [Indexed: 02/07/2023]
Abstract
Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free text data into structured data that can be extracted and analyzed at scale. In medicine, this unlocking of the rich, expressive data within clinical free text in electronic medical records will help untap the full potential of big data for research and clinical purposes. Recent major NLP algorithmic advances have significantly improved the performance of these algorithms, leading to a surge in academic and industry interest in developing tools to automate information extraction and phenotyping from clinical texts. Thus, these technologies are poised to transform medical research and alter clinical practices in the future. Radiation oncology stands to benefit from NLP algorithms if they are appropriately developed and deployed, as they may enable advances such as automated inclusion of radiation therapy details into cancer registries, discovery of novel insights about cancer care, and improved patient data curation and presentation at the point of care. However, challenges remain before the full value of NLP is realized, such as the plethora of jargon specific to radiation oncology, nonstandard nomenclature, a lack of publicly available labeled data for model development, and interoperability limitations between radiation oncology data silos. Successful development and implementation of high quality and high value NLP models for radiation oncology will require close collaboration between computer scientists and the radiation oncology community. 
Here, we present a primer on artificial intelligence algorithms in general and NLP algorithms in particular; provide guidance on how to assess the performance of such algorithms; review prior research on NLP algorithms for oncology; and describe future avenues for NLP in radiation oncology research and clinics.
Affiliation(s)
- Danielle S Bitterman: Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, Massachusetts; Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts; Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Boston, Massachusetts
- Timothy A Miller: Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
- Raymond H Mak: Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, Massachusetts; Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Boston, Massachusetts
- Guergana K Savova: Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
|
35
|
Stubbs A, Filannino M, Soysal E, Henry S, Uzuner Ö. Cohort selection for clinical trials: n2c2 2018 shared task track 1. J Am Med Inform Assoc 2019; 26:1163-1171. [PMID: 31562516 DOI: 10.1093/jamia/ocz163] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Received: 07/18/2019] [Revised: 08/07/2019] [Accepted: 09/18/2019] [Indexed: 01/02/2023]
Abstract
OBJECTIVE Track 1 of the 2018 National NLP Clinical Challenges shared tasks focused on identifying which patients in a corpus of longitudinal medical records meet and do not meet identified selection criteria. MATERIALS AND METHODS To address this challenge, we annotated American English clinical narratives for 288 patients according to whether they met these criteria. We chose criteria from existing clinical trials that represented a variety of natural language processing tasks, including concept extraction, temporal reasoning, and inference. RESULTS A total of 47 teams participated in this shared task, with 224 participants in total. The participants represented 18 countries, and the teams submitted 109 total system outputs. The best-performing system achieved a micro F1 score of 0.91 using a rule-based approach. The top 10 teams used rule-based and hybrid systems to approach the problems. DISCUSSION Clinical narratives are open to interpretation, particularly in cases where the selection criterion may be underspecified. This leaves room for annotators to use domain knowledge and intuition in selecting patients, which may lead to error in system outputs. However, teams who consulted medical professionals while building their systems were more likely to have high recall for patients, which is preferable for patient selection systems. CONCLUSIONS There is not yet a 1-size-fits-all solution for natural language processing systems approaching this task. Future research in this area can look to examining criteria requiring even more complex inferences, temporal reasoning, and domain knowledge.
Affiliation(s)
- Amber Stubbs: Department of Mathematics and Computer Science, Simmons University, Boston, Massachusetts, USA
- Michele Filannino: Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Ergin Soysal: School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas, USA
- Samuel Henry: Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA
- Özlem Uzuner: Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
|
36
|
Nehme F, Feldman K. Evolving Role and Future Directions of Natural Language Processing in Gastroenterology. Dig Dis Sci 2021; 66:29-40. [PMID: 32107677 DOI: 10.1007/s10620-020-06156-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/20/2019] [Accepted: 02/18/2020] [Indexed: 02/06/2023]
Abstract
In line with the current trajectory of healthcare reform, significant emphasis has been placed on improving the utilization of data collected during a clinical encounter. Although the structured fields of electronic health records have provided a convenient foundation on which to begin such efforts, it is well understood that a substantial portion of relevant information is confined to the free-text narratives documenting care. Unfortunately, extracting meaningful information from such narratives is a non-trivial task, traditionally requiring significant manual effort. Today, computational approaches from a field known as Natural Language Processing (NLP) are poised to make a transformational impact in the analysis and utilization of these documents across healthcare practice and research, particularly in procedure-heavy sub-disciplines such as gastroenterology (GI). As such, this manuscript provides a clinically focused review of NLP systems in GI practice. It begins with a detailed synopsis of the state of NLP techniques, presenting state-of-the-art methods and typical use cases both in clinical settings and across other domains. Next, it presents a literature review of current applications of NLP within four prominent areas of gastroenterology: endoscopy, inflammatory bowel disease, pancreaticobiliary disease, and liver disease. Finally, it concludes with a discussion of open problems and future opportunities for this technology in the field of gastroenterology and health care as a whole.
Affiliation(s)
- Fredy Nehme: Department of Gastroenterology and Hepatology, University of Missouri-Kansas City School of Medicine, 5000 Holmes Street, Kansas City, MO, 64110, USA
- Keith Feldman: Division of Health Services and Outcomes Research, Children's Mercy Kansas City, Kansas City, MO, USA; Department of Pediatrics, University of Missouri-Kansas City School of Medicine, Kansas City, MO, USA
|
37
|
Artificial intelligence in oncology. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00018-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
38
|
Chamberlin SR, Bedrick SD, Cohen AM, Wang Y, Wen A, Liu S, Liu H, Hersh WR. Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task. JAMIA Open 2020; 3:395-404. [PMID: 33215074 PMCID: PMC7660955 DOI: 10.1093/jamiaopen/ooaa026] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/13/2019] [Revised: 04/17/2020] [Accepted: 06/03/2020] [Indexed: 11/24/2022]
Abstract
OBJECTIVE Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval methods using electronic health records for different types of cohort definition retrieval. MATERIALS AND METHODS We developed a test collection consisting of about 100 000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated information retrieval tasks using word-based approaches were performed, varying 4 different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. RESULTS The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision but were still not able to recall all relevant patients found by the automated queries. CONCLUSION While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Future work will focus on using the test collection to develop and evaluate new approaches to query structure, weighting algorithms, and application of semantic methods.
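The retrieval results above are scored with B-Pref, a measure designed for test collections with incomplete relevance judgments. The following is a minimal sketch, assuming the commonly cited formulation in which each retrieved relevant record is penalized by the number of judged non-relevant records ranked above it, capped at R and divided by min(R, N), where R and N are the numbers of judged relevant and judged non-relevant records. The function name `bpref` and the patient identifiers in the usage note are hypothetical, and the study's exact implementation may differ.

```python
def bpref(ranking, relevant, nonrelevant):
    """B-Pref for a single topic.

    ranking:      list of record ids in ranked order (best first)
    relevant:     set of ids judged relevant
    nonrelevant:  set of ids judged non-relevant
    Ids in neither set are unjudged and are skipped entirely.
    """
    num_rel = len(relevant)
    num_nonrel = len(nonrelevant)
    if num_rel == 0:
        return 0.0

    denom = min(num_rel, num_nonrel)
    nonrel_seen = 0  # judged non-relevant records ranked so far
    total = 0.0
    for record_id in ranking:
        if record_id in relevant:
            if denom > 0:
                total += 1.0 - min(nonrel_seen, num_rel) / denom
            else:
                # No judged non-relevant records exist, so no penalty applies.
                total += 1.0
        elif record_id in nonrelevant:
            nonrel_seen += 1
    # Relevant records that were never retrieved contribute 0 to the sum.
    return total / num_rel
```

A call such as `bpref(["p1", "p2", "p3"], {"p1"}, {"p2"})` scores one topic; averaging the per-topic values over all topics gives a mean B-Pref of the kind reported in the abstract.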
Affiliation(s)
- Steven R Chamberlin: Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Steven D Bedrick: Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA; Center for Spoken Language Understanding, Oregon Health & Science University, Portland, Oregon, USA
- Aaron M Cohen: Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Yanshan Wang: Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Andrew Wen: Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Sijia Liu: Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hongfang Liu: Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- William R Hersh: Department of Medical Informatics & Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
|
39
|
Beck JT, Rammage M, Jackson GP, Preininger AM, Dankwa-Mullan I, Roebuck MC, Torres A, Holtzen H, Coverdill SE, Williamson MP, Chau Q, Rhee K, Vinegra M. Artificial Intelligence Tool for Optimizing Eligibility Screening for Clinical Trials in a Large Community Cancer Center. JCO Clin Cancer Inform 2020; 4:50-59. [DOI: 10.1200/cci.19.00079] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 11/20/2022]
Abstract
PURPOSE Less than 5% of patients with cancer enroll in clinical trials, and 1 in 5 trials are stopped for poor accrual. We evaluated an automated clinical trial matching system that uses natural language processing to extract patient and trial characteristics from unstructured sources and machine learning to match patients to clinical trials. PATIENTS AND METHODS Medical records from 997 patients with breast cancer were assessed for trial eligibility at Highlands Oncology Group between May and August 2016. System and manual attribute extraction and eligibility determinations were compared using the percentage of agreement for 239 patients and 4 trials. Sensitivity and specificity of system-generated eligibility determinations were measured, and the time required for manual review and system-assisted eligibility determinations were compared. RESULTS Agreement between system and manual attribute extraction ranged from 64.3% to 94.0%. Agreement between system and manual eligibility determinations was 81%-96%. System eligibility determinations demonstrated specificities between 76% and 99%, with sensitivities between 91% and 95% for 3 trials and 46.7% for the 4th. Manual eligibility screening of 90 patients for 3 trials took 110 minutes; system-assisted eligibility determinations of the same patients for the same trials required 24 minutes. CONCLUSION In this study, the clinical trial matching system displayed a promising performance in screening patients with breast cancer for trial eligibility. System-assisted trial eligibility determinations were substantially faster than manual review, and the system reliably excluded ineligible patients for all trials and identified eligible patients for most trials.
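The agreement, sensitivity, and specificity figures above all derive from comparing system eligibility determinations with manual ones. A minimal sketch of that comparison follows, assuming manual review is treated as the reference standard and each determination is a boolean (True meaning eligible); the function name `screening_metrics` is hypothetical and this is not the evaluated system's code.

```python
def screening_metrics(system, manual):
    """Compare system eligibility calls against manual review.

    system, manual: equal-length lists of booleans (True = eligible).
    Manual review is assumed to be the reference standard.
    """
    if len(system) != len(manual):
        raise ValueError("system and manual must have the same length")
    if not manual:
        raise ValueError("at least one patient is required")

    pairs = list(zip(system, manual))
    tp = sum(1 for s, m in pairs if s and m)          # both say eligible
    tn = sum(1 for s, m in pairs if not s and not m)  # both say ineligible
    fp = sum(1 for s, m in pairs if s and not m)      # system over-includes
    fn = sum(1 for s, m in pairs if not s and m)      # system misses a patient

    nan = float("nan")
    return {
        # Share of manually eligible patients the system also flags.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of manually ineligible patients the system also excludes.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Simple percentage agreement across all patients.
        "agreement": (tp + tn) / len(pairs),
    }
```

Computed per trial, these values show why a single trial can have high specificity but low sensitivity, as with the fourth trial in the abstract: the system rarely flags ineligible patients yet misses many of the eligible ones.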
Affiliation(s)
- Helen Holtzen: Research Department, Highlands Oncology Group, Fayetteville, AR
- M. Paul Williamson: US Oncology Medical, Novartis Pharmaceuticals Corporation, East Hanover, NJ
- Quincy Chau: US Oncology Medical, Novartis Pharmaceuticals Corporation, East Hanover, NJ
- Kyu Rhee: IBM Watson Health, IBM Corporation, Cambridge, MA
- Michael Vinegra: US Oncology Medical, Novartis Pharmaceuticals Corporation, East Hanover, NJ
|
40
|
Johnson EA, Carrington JM. Clinical Research Integration Within the Electronic Health Record: A Literature Review. Comput Inform Nurs 2020; 39:129-135. [PMID: 33657055 DOI: 10.1097/cin.0000000000000659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Clinical trials have become commonplace as a treatment option. As clinical trial participants are integrated into all healthcare delivery settings, organizations are tasked with sustaining specific care regimens with appropriate documentation and maintenance of participant protections within electronic health records. Our aim was to identify the common elements necessary for electronic health record integration of clinical research for optimal trial conduct and participant management. Review of literature was conducted utilizing PubMed and CINAHL to identify relevant publications that described use of the electronic health record to directly support trial conduct, with a total of 15 publications ultimately meeting inclusion criteria. Three thematic groupings emerged that categorized common aspects of clinical research integration: functional, structural, and procedural components. These components include technological requirements (platform/system), regulatory and legal compliance, and stakeholder involvement with clinical trial procedures (recruitment of participants). Without a centralized means of providing clinicians with current treatment and adverse event management information, participant injury or likelihood of withdrawal will increase. Further research is required to develop an optimal model of research-related integration within commercial electronic health records.
Affiliation(s)
- Elizabeth A Johnson
- Author Affiliations: The University of Arizona (Ms Johnson), Tucson; and University of Florida (Dr Carrington), Gainesville
|
41
|
Frampton GK, Shepherd J, Pickett K, Griffiths G, Wyatt JC. Digital tools for the recruitment and retention of participants in randomised controlled trials: a systematic map. Trials 2020; 21:478. [PMID: 32498690 PMCID: PMC7273688 DOI: 10.1186/s13063-020-04358-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 06/12/2019] [Accepted: 04/28/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Recruiting and retaining participants in randomised controlled trials (RCTs) is challenging. Digital tools, such as social media, data mining, email or text-messaging, could improve recruitment or retention, but an overview of this research area is lacking. We aimed to systematically map the characteristics of digital recruitment and retention tools for RCTs, and the features of the comparative studies that have evaluated the effectiveness of these tools during the past 10 years. METHODS We searched Medline, Embase, other databases, the Internet, and relevant web sites in July 2018 to identify comparative studies of digital tools for recruiting and/or retaining participants in health RCTs. Two reviewers independently screened references against protocol-specified eligibility criteria. Included studies were coded by one reviewer with 20% checked by a second reviewer, using pre-defined keywords to describe characteristics of the studies, populations and digital tools evaluated. RESULTS We identified 9163 potentially relevant references, of which 104 articles reporting 105 comparative studies were included in the systematic map. The number of published studies on digital tools has doubled in the past decade, but most studies evaluated digital tools for recruitment rather than retention. The key health areas investigated were health promotion, cancers, circulatory system diseases and mental health. Few studies focussed on minority or under-served populations, and most studies were observational. The most frequently-studied digital tools were social media, Internet sites, email and tv/radio for recruitment; and email and text-messaging for retention. One quarter of the studies measured efficiency (cost per recruited or retained participant) but few studies have evaluated people's attitudes towards the use of digital tools. 
CONCLUSIONS This systematic map highlights a number of evidence gaps and may help stakeholders to identify and prioritise further research needs. In particular, there is a need for rigorous research on the efficiency of the digital tools and their impact on RCT participants and investigators, perhaps as studies-within-a-trial (SWAT) research. There is also a need for research into how digital tools may improve participant retention in RCTs which is currently underrepresented relative to recruitment research. REGISTRATION Not registered; based on a pre-specified protocol, peer-reviewed by the project's Advisory Board.
Affiliation(s)
- Geoff K. Frampton
- Southampton Health Technology Assessments Centre (SHTAC), Wessex Institute, Faculty of Medicine, University of Southampton, Alpha House, Southampton Science Park, Southampton, SO16 7NS UK
- Wessex Institute, Faculty of Medicine, University of Southampton, Alpha House, Southampton Science Park, Southampton, SO16 7NS UK
- Jonathan Shepherd
- Southampton Health Technology Assessments Centre (SHTAC), Wessex Institute, Faculty of Medicine, University of Southampton, Alpha House, Southampton Science Park, Southampton, SO16 7NS UK
- Wessex Institute, Faculty of Medicine, University of Southampton, Alpha House, Southampton Science Park, Southampton, SO16 7NS UK
- Karen Pickett
- Southampton Health Technology Assessments Centre (SHTAC), Wessex Institute, Faculty of Medicine, University of Southampton, Alpha House, Southampton Science Park, Southampton, SO16 7NS UK
- Wessex Institute, Faculty of Medicine, University of Southampton, Alpha House, Southampton Science Park, Southampton, SO16 7NS UK
- Gareth Griffiths
- Southampton Clinical Trials Unit, University of Southampton and Southampton University Hospital NHS Foundation Trust, Southampton General Hospital, Southampton, SO16 6YD UK
- Jeremy C. Wyatt
- Wessex Institute, Faculty of Medicine, University of Southampton, Alpha House, Southampton Science Park, Southampton, SO16 7NS UK
|
42
|
Alexander M, Solomon B, Ball DL, Sheerin M, Dankwa-Mullan I, Preininger AM, Jackson GP, Herath DM. Evaluation of an artificial intelligence clinical trial matching system in Australian lung cancer patients. JAMIA Open 2020; 3:209-215. [PMID: 32734161 PMCID: PMC7382632 DOI: 10.1093/jamiaopen/ooaa002] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 12/13/2019] [Accepted: 01/31/2020] [Indexed: 11/21/2022] Open
Abstract
Objective The objective of this technical study was to evaluate the performance of an artificial intelligence (AI)-based system for clinical trial matching for a cohort of lung cancer patients in an Australian cancer hospital. Methods A lung cancer cohort was derived from clinical data from patients attending an Australian cancer hospital. Ten phase I–III clinical trials registered on clinicaltrials.gov and open to lung cancer patients at this institution were utilized for assessments. The trial matching system performance was compared to a gold standard established by clinician consensus for trial eligibility. Results The study included 102 lung cancer patients. The trial matching system evaluated 7252 patient attributes (per patient median 74, range 53–100) against 11 467 individual trial eligibility criteria (per trial median 597, range 243–4132). Median time for the system to run a query and return results was 15.5 s (range 7.2–37.8). In establishing the gold standard, clinician interrater agreement was high (Cohen’s kappa 0.70–1.00). On a per-patient basis, the performance of the trial matching system for eligibility was as follows: accuracy, 91.6%; recall (sensitivity), 83.3%; precision (positive predictive value), 76.5%; negative predictive value, 95.7%; and specificity, 93.8%. Discussion and Conclusion The AI-based clinical trial matching system allows efficient and reliable screening of cancer patients for clinical trials with 95.7% accuracy for exclusion and 91.6% accuracy for overall eligibility assessment; however, clinician input and oversight are still required. The automated system demonstrates promise as a clinical decision support tool to prescreen a large patient cohort to identify subjects suitable for further assessment.
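The five performance figures reported in this abstract (accuracy, recall, precision, negative predictive value, specificity) all derive from one confusion matrix of system decisions against the clinician gold standard. The following is a minimal sketch of those definitions only; the counts are invented for illustration and are not the study's data, and the names `ConfusionCounts` and `screening_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """System decision versus gold standard (illustrative counts only)."""
    tp: int  # system says eligible, gold standard agrees
    fp: int  # system says eligible, gold standard says ineligible
    tn: int  # system says ineligible, gold standard agrees
    fn: int  # system says ineligible, gold standard says eligible


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return the five metrics named in the abstract as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "recall": c.tp / (c.tp + c.fn),        # sensitivity
        "precision": c.tp / (c.tp + c.fp),     # positive predictive value
        "npv": c.tn / (c.tn + c.fn),           # negative predictive value
        "specificity": c.tn / (c.tn + c.fp),
    }


# Invented counts, NOT taken from the paper.
example = ConfusionCounts(tp=40, fp=10, tn=140, fn=10)
metrics = screening_metrics(example)
```

Under these definitions, the abstract's "accuracy for exclusion" corresponds to the negative predictive value: the share of system exclusions that the gold standard confirms.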
Affiliation(s)
- Marliese Alexander
- Department of Pharmacy, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia.,Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Victoria, Australia
- Benjamin Solomon
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Victoria, Australia.,Department of Medical Oncology, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- David L Ball
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Victoria, Australia.,Department of Radiation Oncology, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
- Mimi Sheerin
- IBM Watson Health, Cambridge, Massachusetts, USA
- Dishan M Herath
- Department of Medical Oncology, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
|
43
|
Hassanzadeh H, Karimi S, Nguyen A. Matching patients to clinical trials using semantically enriched document representation. J Biomed Inform 2020; 105:103406. [DOI: 10.1016/j.jbi.2020.103406] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/28/2019] [Revised: 01/28/2020] [Accepted: 03/02/2020] [Indexed: 12/16/2022]
|
44
|
Ni Y, Barzman D, Bachtel A, Griffey M, Osborn A, Sorter M. Finding warning markers: Leveraging natural language processing and machine learning technologies to detect risk of school violence. Int J Med Inform 2020; 139:104137. [PMID: 32361146 DOI: 10.1016/j.ijmedinf.2020.104137] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/01/2019] [Revised: 02/20/2020] [Accepted: 03/28/2020] [Indexed: 10/24/2022]
Abstract
INTRODUCTION School violence has a far-reaching effect, impacting the entire school population including staff, students and their families. Among youth attending the most violent schools, studies have reported higher dropout rates, poor school attendance, and poor scholastic achievement. It was noted that the largest crime-prevention results occurred when youth at elevated risk were given an individualized prevention program. However, much work is needed to establish an effective approach to identify at-risk subjects. OBJECTIVE In our earlier research, we developed a risk assessment program to interview subjects, identify risk and protective factors, and evaluate risk for school violence. This study focused on developing natural language processing (NLP) and machine learning technologies to automate the risk assessment process. MATERIAL AND METHODS We prospectively recruited 131 students with or without behavioral concerns from 89 schools between 05/01/2015 and 04/30/2018. The subjects were interviewed with two risk assessment scales and a questionnaire, and their risk of violence was determined by pediatric psychiatrists based on clinical judgment. Using NLP technologies, different types of linguistic features were extracted from the interview content. Machine learning classifiers were then applied to predict risk of school violence for individual subjects. A two-stage feature selection was implemented to identify violence-related predictors. The performance was validated on the psychiatrist-generated reference standard of risk levels, where positive predictive value (PPV), sensitivity (SEN), negative predictive value (NPV), specificity (SPEC) and area under the ROC curve (AUC) were assessed. RESULTS Compared to subjects' sociodemographic information, use of linguistic features significantly improved classifiers' predictive performance (P < 0.01). 
The best-performing classifier with n-gram features achieved 86.5 %/86.5 %/85.7 %/85.7 %/94.0 % (PPV/SEN/NPV/SPEC/AUC) on the cross-validation set and 83.3 %/93.8 %/91.7 %/78.6 %/94.6 % (PPV/SEN/NPV/SPEC/AUC) on the test data. The feature selection process identified a set of predictors covering the discussion of subjects' thoughts, perspectives, behaviors, individual characteristics, peers and family dynamics, and protective factors. CONCLUSIONS By analyzing the content from subject interviews, the NLP and machine learning algorithms showed good capacity for detecting risk of school violence. The feature selection uncovered multiple warning markers that could deliver useful clinical insights to assist personalizing intervention. Consequently, the developed approach offered the promise of an accurate and scalable computerized screening service for preventing school violence.
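The abstract names n-gram features as the input to its best-performing classifier but does not describe how they were built. As a rough sketch of what word n-gram extraction generally looks like, the function below counts unigrams and bigrams in a transcript. The tokenisation (lowercasing and whitespace splitting), the function name `word_ngrams`, and the sample sentence are all assumptions made for illustration, not details from the study.

```python
from collections import Counter


def word_ngrams(text: str, n_max: int = 2) -> Counter:
    """Count all word n-grams of length 1..n_max in a piece of text."""
    tokens = text.lower().split()  # naive tokenisation, an assumption
    counts = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of n tokens across the token list.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Invented sentence, not taken from any study interview.
features = word_ngrams("I get angry when I get teased")
```

Counts of this kind would typically be assembled into a feature vector per subject before feature selection and classification.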
Affiliation(s)
- Yizhao Ni
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States; Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States.
- Drew Barzman
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States; Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Alycia Bachtel
- Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Marcus Griffey
- Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Alexander Osborn
- Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Michael Sorter
- Department of Pediatrics, College of Medicine, University of Cincinnati, Cincinnati, OH, United States; Division of Child and Adolescent Psychiatry, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
|
45
|
Tissot HC, Shah AD, Brealey D, Harris S, Agbakoba R, Folarin A, Romao L, Roguski L, Dobson R, Asselbergs FW. Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-Automated Simulation Based on the LeoPARDS Trial. IEEE J Biomed Health Inform 2020; 24:2950-2959. [PMID: 32149659 DOI: 10.1109/jbhi.2020.2977925] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/07/2022]
Abstract
Clinical trials often fail to recruit an adequate number of appropriate patients. Identifying eligible trial participants is resource-intensive when relying on manual review of clinical notes, particularly in critical care settings where the time window is short. Automated review of electronic health records (EHR) may help, but much of the information is in free text rather than a computable form. We applied natural language processing (NLP) to free text EHR data using the CogStack platform to simulate recruitment into the LeoPARDS study, a clinical trial aiming to reduce organ dysfunction in septic shock. We applied an algorithm to identify eligible patients using a moving 1-hour time window, and compared patients identified by our approach with those actually screened and recruited for the trial, for the time period that data were available. We manually reviewed records of a random sample of patients identified by the algorithm but not screened in the original trial. Our method identified 376 patients, including 34 patients with EHR data available who were actually recruited to LeoPARDS in our centre. The sensitivity of CogStack for identifying patients screened was 90% (95% CI 85%, 93%). Of the 203 patients identified by both manual screening and CogStack, the index date matched in 95 (47%) and CogStack was earlier in 94 (47%). In conclusion, analysis of EHR data using NLP could effectively replicate recruitment in a critical care trial, and identify some eligible patients at an earlier stage, potentially improving trial recruitment if implemented in real time.
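The abstract mentions a moving 1-hour time window but gives no algorithmic detail. One plausible reading, sketched below, is that a patient becomes a candidate at the first moment when every required criterion has been observed within the preceding hour. This is an assumption rather than the authors' algorithm: the criterion labels, the function name `first_eligible_time`, and the timestamps are hypothetical, and nothing here uses or represents the CogStack API.

```python
from datetime import datetime, timedelta


def first_eligible_time(events, required, window=timedelta(hours=1)):
    """Return the earliest timestamp at which every required criterion
    has been observed within the trailing window, or None if never."""
    last_seen = {}  # criterion -> most recent observation time
    for ts, criterion in sorted(events):
        if criterion not in required:
            continue
        last_seen[criterion] = ts
        all_seen = len(last_seen) == len(required)
        if all_seen and all(ts - seen <= window for seen in last_seen.values()):
            return ts
    return None


# Hypothetical observations for one patient.
day = datetime(2020, 1, 1)
events = [
    (day.replace(hour=9, minute=0), "vasopressor_started"),
    (day.replace(hour=10, minute=30), "sepsis_in_note"),
    (day.replace(hour=10, minute=50), "vasopressor_started"),
]
index_time = first_eligible_time(
    events, {"vasopressor_started", "sepsis_in_note"}
)
```

In this example the 09:00 observation is too old by the time the second criterion appears at 10:30, so the index time is 10:50, when both criteria fall inside the same hour. An index time defined this way is the kind of quantity the abstract compares against the manual screening date.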
|
46
|
Abstract
OBJECTIVE Challenges with efficient patient recruitment, including sociotechnical barriers for clinical trials, are major barriers to the timely and efficacious conduct of translational studies. We conducted a time-and-motion study to investigate the workflow of clinical trial enrollment in a pediatric emergency department. METHODS We observed clinical research coordinators during 3 clinically staffed shifts. One clinical research coordinator was shadowed at a time. Tasks were marked in 30-second intervals and annotated to include patient screening, patient contact, performing procedures, and physician contact. Statistical analysis was conducted on the patient enrollment activities. RESULTS We conducted fifteen 120-minute observations from December 12, 2013, to January 3, 2014, and shadowed 8 clinical research coordinators. Patient screening took 31.62% of their time, patient contact took 18.67%, performing procedures took 17.6%, physician contact was 1%, and other activities took 31.0%. CONCLUSIONS Screening patients for eligibility accounted for the most time. Automated screening methods could help reduce this time. The findings suggest improvement areas in recruitment planning to increase the efficiency of clinical trial enrollment.
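The percentages in this abstract come from tasks marked in 30-second intervals. A simple way to turn such interval labels into shares of observed time is sketched below. The labels and counts are invented for illustration, and `time_shares` is a hypothetical helper, not the study's analysis code.

```python
from collections import Counter

INTERVAL_SECONDS = 30  # interval length stated in the abstract


def time_shares(labels):
    """Percentage of observed intervals spent on each task."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {task: 100 * n / total for task, n in counts.items()}


# Invented five-minute observation: one label per 30-second interval.
observed = (
    ["screening"] * 4
    + ["patient contact"] * 2
    + ["procedures"] * 2
    + ["other"] * 2
)
shares = time_shares(observed)
observed_minutes = len(observed) * INTERVAL_SECONDS / 60
```

Because every interval carries exactly one label, the shares sum to 100% up to rounding.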
|
47
|
Spasic I, Krzeminski D, Corcoran P, Balinsky A. Cohort Selection for Clinical Trials From Longitudinal Patient Records: Text Mining Approach. JMIR Med Inform 2019; 7:e15980. [PMID: 31674914 PMCID: PMC6913747 DOI: 10.2196/15980] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/28/2019] [Revised: 09/29/2019] [Accepted: 10/02/2019] [Indexed: 12/17/2022] Open
Abstract
Background Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not to some other factors. Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time consuming, thus negatively affecting the timeliness of the recruitment process. Objective Track 1 of the 2018 National Natural Language Processing Clinical Challenge focused on the task of cohort selection for clinical trials, aiming to answer the following question: Can natural language processing be applied to narrative medical records to identify patients who meet eligibility criteria for clinical trials? The task required the participating systems to analyze longitudinal patient records to determine if the corresponding patients met the given eligibility criteria. We aimed to describe a system developed to address this task. Methods Our system consisted of 13 classifiers, one for each eligibility criterion. All classifiers used a bag-of-words document representation model. To prevent the loss of relevant contextual information associated with such representation, a pattern-matching approach was used to extract context-sensitive features. They were embedded back into the text as lexically distinguishable tokens, which were consequently featured in the bag-of-words representation. Supervised machine learning was chosen wherever a sufficient number of both positive and negative instances was available to learn from. 
A rule-based approach focusing on a small set of relevant features was chosen for the remaining criteria. Results The system was evaluated using microaveraged F measure. Overall, 4 machine learning algorithms, including support vector machine, logistic regression, naïve Bayesian classifier, and gradient tree boosting (GTB), were evaluated on the training data using 10-fold cross-validation. Overall, GTB demonstrated the most consistent performance. Its performance peaked when oversampling was used to balance the training data. The final evaluation was performed on previously unseen test data. On average, the F measure of 89.04% was comparable to 3 of the top ranked performances in the shared task (91.11%, 90.28%, and 90.21%). With an F measure of 88.14%, we significantly outperformed these systems (81.03%, 78.50%, and 70.81%) in identifying patients with advanced coronary artery disease. Conclusions The holdout evaluation provides evidence that our system was able to identify eligible patients for the given clinical trial with high accuracy. Our approach demonstrates how rule-based knowledge infusion can improve the performance of machine learning algorithms even when trained on a relatively small dataset.
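The abstract reports a microaveraged F measure across per-criterion classifiers. Microaveraging pools true positives, false positives, and false negatives over all criteria before computing precision and recall, so frequent criteria weigh more than rare ones. The sketch below shows that general calculation with invented counts. The shared task's exact scoring protocol may differ, and `micro_f_measure` and the criterion names are hypothetical.

```python
def micro_f_measure(per_criterion):
    """Pool counts over all criteria, then compute the F measure once."""
    tp = sum(c["tp"] for c in per_criterion.values())
    fp = sum(c["fp"] for c in per_criterion.values())
    fn = sum(c["fn"] for c in per_criterion.values())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented counts for two hypothetical criteria, NOT results from the paper.
counts = {
    "criterion_a": {"tp": 8, "fp": 2, "fn": 2},
    "criterion_b": {"tp": 1, "fp": 1, "fn": 3},
}
score = micro_f_measure(counts)
```

Here the pooled counts are 9 true positives, 3 false positives, and 5 false negatives, so the well-populated criterion dominates the result.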
Affiliation(s)
- Irena Spasic
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
- Padraig Corcoran
- School of Computer Science & Informatics, Cardiff University, Cardiff, United Kingdom
|
48
|
Narayan VM, Dahm P. The future of clinical trials in urological oncology. Nat Rev Urol 2019; 16:722-733. [PMID: 31605037 DOI: 10.1038/s41585-019-0243-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 09/19/2019] [Indexed: 12/11/2022]
Abstract
Well-designed clinical trials in urological oncology help to guide treatment decisions and aid in counselling patients, ultimately serving to improve outcomes. Since the term evidence-based medicine was first used by Gordon Guyatt in 1991, a renewed emphasis on methodology, transparent trial design and study reporting has helped to improve clinical research and in turn, the landscape of medical literature. Novel clinical trial designs (including multi-arm, multistage trials, basket and umbrella studies and research from big data sources, such as electronic health records, administrative claims databases and quality monitoring registries) are well suited to advance innovation in urological oncology. Existing urological clinical trials are often limited by small numbers, are statistically underpowered and many face difficulties with accrual. Thus, efforts to improve trial design are of considerable importance. The development and use of standard outcome sets and adherence to reporting guidelines offer researchers the opportunity to guide value-oriented care, minimize research waste and efficiently identify solutions to the unanswered questions in urology cancer care.
Affiliation(s)
- Vikram M Narayan
- Minneapolis VA Medical Center and University of Minnesota Department of Urology, Minneapolis, MN, 55417, USA.,University of Texas MD Anderson Cancer Center, Department of Urology, Houston, TX, 77030, USA
- Philipp Dahm
- Minneapolis VA Medical Center and University of Minnesota Department of Urology, Minneapolis, MN, 55417, USA.
|
49
|
|
50
|
Kersloot MG, Lau F, Abu-Hanna A, Arts DL, Cornet R. Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES. J Biomed Semantics 2019; 10:14. [PMID: 31533810 PMCID: PMC6749652 DOI: 10.1186/s13326-019-0207-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/18/2019] [Accepted: 08/13/2019] [Indexed: 12/05/2022] Open
Abstract
Background Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them. Methods An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F1-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test. Results DIRECT detected lung cancer and non-small cell lung cancer concepts with F1-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with a significantly higher F1-score of 0.921, compared to the other implementations, and the relationship between recurrence and lung cancer with an F1-score of 0.857. The precisions for the detection of lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, compared to precisions of 0.943, 0.967, and 0.000 in the original implementation, respectively. 
Conclusion DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with significant increase in F1-score, compared to the original implementation of cTAKES, due to the usage of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers.
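The abstract states that implementations were compared with a two-tailed permutation test but does not give the test statistic or the resampling scheme. As one illustrative variant only, the sketch below runs an exact paired sign-flip permutation test on per-chart correctness (1 for correct, 0 for incorrect) for two systems. The data are invented, `paired_permutation_p` is a hypothetical helper, and the authors may well have permuted a different statistic such as the F1-score.

```python
from itertools import product


def paired_permutation_p(scores_a, scores_b):
    """Exact two-tailed p-value from a paired sign-flip permutation test
    on the summed per-item difference between two systems."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    # Enumerate every way of swapping the two systems' labels per item.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed:
            extreme += 1
    return extreme / total


# Invented per-chart correctness for two hypothetical systems.
system_a = [1, 1, 1, 0, 1, 1]
system_b = [0, 0, 1, 0, 0, 1]
p_value = paired_permutation_p(system_a, system_b)
```

Full enumeration is only practical for small samples, because the number of sign patterns doubles with each chart. For a set the size of the 98 charts in this study, a random sample of sign flips would be the usual substitute.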
Affiliation(s)
- Martijn G Kersloot
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ, Amsterdam, The Netherlands.
- Francis Lau
- School of Health Information Science, University of Victoria, Victoria, Canada
- Ameen Abu-Hanna
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ, Amsterdam, The Netherlands
- Derk L Arts
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ, Amsterdam, The Netherlands
- Ronald Cornet
- Department of Medical Informatics, Amsterdam Public Health Research Institute, Amsterdam UMC, University of Amsterdam, Meibergdreef 9, 1105AZ, Amsterdam, The Netherlands
|