Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ni Y, Wright J, Perentesis J, Lingren T, Deleger L, Kaiser M, Kohane I, Solti I. Increasing the efficiency of trial-patient matching: automated clinical trial eligibility pre-screening for pediatric oncology patients. BMC Med Inform Decis Mak 2015;15:28. [PMID: 25881112 PMCID: PMC4407835 DOI: 10.1186/s12911-015-0149-3] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 08/13/2014] [Accepted: 03/24/2015] [Indexed: 11/22/2022] Open

Cited by Other Article(s)

Xie CX, De Simoni A, Eldridge S, Pinnock H, Relton C. Development of a conceptual framework for defining trial efficiency. PLoS One 2024;19:e0304187. [PMID: 38781167 PMCID: PMC11115328 DOI: 10.1371/journal.pone.0304187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 05/07/2024] [Indexed: 05/25/2024] Open

Abstract

BACKGROUND

Globally, there is a growing focus on efficient trials, yet numerous interpretations have emerged, suggesting significant heterogeneity in understanding "efficiency" within the trial context. Therefore, in this study, we aimed to dissect the multifaceted nature of trial efficiency by establishing a comprehensive conceptual framework for its definition.

OBJECTIVES

To collate diverse perspectives regarding trial efficiency and to achieve consensus on a conceptual framework for defining trial efficiency.

METHODS

From July 2022 to July 2023, we undertook a literature review to identify various terms that have been used to define trial efficiency. We then conducted a modified e-Delphi study, comprising an exploratory open round and a subsequent scoring round to refine and validate the identified items. We recruited a wide range of experts in the global trial community including trialists, funders, sponsors, journal editors and members of the public. Consensus was defined as items rated "without disagreement", measured by the inter-percentile range adjusted for symmetry through the UCLA/RAND approach.

RESULTS

Seventy-eight studies were identified from a literature review, from which we extracted nine terms related to trial efficiency. We then used review findings as exemplars in the Delphi open round. Forty-nine international experts were recruited to the e-Delphi panel. Open round responses resulted in the refinement of the initial nine terms, which were consequently included in the scoring round. We obtained consensus on all nine items: 1) four constructs that collectively define trial efficiency, comprising scientific efficiency, operational efficiency, statistical efficiency and economic efficiency; and 2) five essential building blocks for an efficient trial, comprising trial design, trial process, infrastructure, superstructure, and stakeholders.

CONCLUSIONS

This is the first attempt to dissect the concept of trial efficiency into theoretical constructs. Having an agreed definition will allow better trial implementation and facilitate effective communication and decision-making across stakeholders. We also identified essential building blocks that are the cornerstones of an efficient trial. In this pursuit of understanding, we are not only unravelling the complexities of trial efficiency but also laying the groundwork for evaluating the efficiency of an individual trial or a trial system in the future.
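
The consensus rule named in the METHODS above (items rated "without disagreement" using the inter-percentile range adjusted for symmetry) can be illustrated with a minimal sketch. The abstract does not give the parameters used, so the 30th to 70th percentile range, the 1 to 9 rating scale, and the constants 2.35 and 1.5 below are assumptions taken from the general RAND/UCLA Appropriateness Method description, and the function names are hypothetical.

```python
def percentile(values, p):
    """Percentile with linear interpolation between ranks (p in 0..100)."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def rated_without_disagreement(ratings, ipr_r=2.35, cfa=1.5, midpoint=5):
    """Return True when the panel ratings show no disagreement.

    Assumed rule: disagreement exists when IPR > IPRAS, where
    IPRAS = ipr_r + cfa * asymmetry_index.
    """
    lower = percentile(ratings, 30)
    upper = percentile(ratings, 70)
    ipr = upper - lower
    # Asymmetry index: distance of the IPR's centre from the scale midpoint.
    asymmetry_index = abs(midpoint - (lower + upper) / 2)
    ipras = ipr_r + cfa * asymmetry_index
    return ipr <= ipras


# Hypothetical panels: one clustered at the top of the scale, one polarised.
clustered = [7, 8, 8, 9, 9, 8, 7, 9, 8]
polarised = [1, 1, 1, 1, 9, 9, 9, 9, 9]
print(rated_without_disagreement(clustered))
print(rated_without_disagreement(polarised))
```

The adjustment widens the tolerated range when ratings pile up near one end of the scale, so a tightly clustered but skewed panel is not flagged as disagreeing.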

Gédor M, Desandes E, Chesnel M, Merlin JL, Marchal F, Lambert A, Baudin A. [Development of an artificial intelligence system to improve cancer clinical trial eligibility screening]. Bull Cancer 2024;111:473-482. [PMID: 38503584 DOI: 10.1016/j.bulcan.2024.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 01/03/2024] [Accepted: 01/12/2024] [Indexed: 03/21/2024]

Foucher J, Azizi L, Öijerstedt L, Kläppe U, Ingre C. The usage of population and disease registries as pre-screening tools for clinical trials, a systematic review. Syst Rev 2024;13:111. [PMID: 38654383 PMCID: PMC11040983 DOI: 10.1186/s13643-024-02533-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 04/12/2024] [Indexed: 04/25/2024] Open

Blasini R, Strantz C, Gulden C, Helfer S, Lidke J, Prokosch HU, Sohrabi K, Schneider H. Evaluation of Eligibility Criteria Relevance for the Purpose of IT-Supported Trial Recruitment: Descriptive Quantitative Analysis. JMIR Form Res 2024;8:e49347. [PMID: 38294862 PMCID: PMC10867759 DOI: 10.2196/49347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 09/28/2023] [Accepted: 11/22/2023] [Indexed: 02/01/2024] Open

Abstract

BACKGROUND

Clinical trials (CTs) are crucial for medical research; however, they frequently fall short of the requisite number of participants who meet all eligibility criteria (EC). A clinical trial recruitment support system (CTRSS) is developed to help identify potential participants by performing a search on a specific data pool. The accuracy of the search results is directly related to the quality of the data used for comparison. Data accessibility can present challenges, making it crucial to identify the necessary data for a CTRSS to query. Prior research has examined the data elements frequently used in CT EC but has not evaluated which criteria are actually used to search for participants. Although all EC must be met to enroll a person in a CT, not all criteria have the same importance when searching for potential participants in an existing data pool, such as an electronic health record, because some of the criteria are only relevant at the time of enrollment.

OBJECTIVE

In this study, we investigated which groups of data elements are relevant in practice for finding suitable participants and whether there are typical elements that are not relevant and can therefore be omitted.

METHODS

We asked trial experts and CTRSS developers to first categorize the EC of their CTs according to data element groups and then to classify them into 1 of 3 categories: necessary, complementary, and irrelevant. In addition, the experts assessed whether a criterion was documented (on paper or digitally) or whether it was information known only to the treating physicians or patients.

RESULTS

We reviewed 82 CTs with 1132 unique EC. Of these 1132 EC, 350 (30.9%) were considered necessary, 224 (19.8%) complementary, and 341 (30.1%) total irrelevant. To identify the most relevant data elements, we introduced the data element relevance index (DERI). This describes the percentage of studies in which the corresponding data element occurs and is also classified as necessary or supplementary. We found that the query of "diagnosis" was relevant for finding participants in 79 (96.3%) of the CTs. This group was followed by "date of birth/age" with a DERI of 85.4% (n=70) and "procedure" with a DERI of 35.4% (n=29).

CONCLUSIONS

The distribution of data element groups in CTs has been heterogeneously described in previous works. Therefore, we recommend identifying the percentage of CTs in which data element groups can be found as a more reliable way to determine the relevance of EC. Only necessary and complementary criteria should be included in this DERI.
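
The data element relevance index (DERI) described in the RESULTS above can be sketched as follows. This is an illustration only, not the authors' code: the record layout (trial id, data element group, relevance label) and the function name are assumptions, and the toy criteria are invented. Only the three reported counts out of 82 trials come from the abstract.

```python
from collections import defaultdict

# Labels that count toward the index, per the CONCLUSIONS above.
RELEVANT = {"necessary", "complementary"}


def deri(criteria, n_trials):
    """Percentage of trials in which each data element group occurs
    with at least one necessary or complementary criterion.

    criteria: iterable of (trial_id, data_element_group, relevance).
    """
    trials_per_group = defaultdict(set)
    for trial_id, group, relevance in criteria:
        if relevance in RELEVANT:
            trials_per_group[group].add(trial_id)
    return {g: 100 * len(t) / n_trials for g, t in trials_per_group.items()}


# Invented toy data: four trials, six criteria.
toy = [
    ("T1", "diagnosis", "necessary"),
    ("T1", "diagnosis", "complementary"),
    ("T2", "diagnosis", "necessary"),
    ("T2", "procedure", "irrelevant"),
    ("T3", "date of birth/age", "complementary"),
    ("T4", "procedure", "necessary"),
]
print(deri(toy, n_trials=4))

# The reported percentages follow from the reported counts over 82 trials.
for group, n in [("diagnosis", 79), ("date of birth/age", 70), ("procedure", 29)]:
    print(group, round(100 * n / 82, 1))
```

Counting trials rather than individual criteria means a group that appears many times in one trial contributes only once, which is the property the CONCLUSIONS recommend.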

Lombardo G, Couvert C, Kose M, Begum A, Spiertz C, Worrell C, Hasselbaink D, Didden EM, Sforzini L, Todorovic M, Lewi M, Brown M, Vaterkowski M, Gullet N, Amasi-Hartoonian N, Griffon N, Pais R, Rodriguez Navarro S, Kremer A, Maes C, Tan EH, Moinat M, Ferrer JG, Pariante CM, Kalra D, Ammour N, Kalko S. Electronic health records (EHRs) in clinical research and platform trials: Application of the innovative EHR-based methods developed by EU-PEARL. J Biomed Inform 2023;148:104553. [PMID: 38000766 DOI: 10.1016/j.jbi.2023.104553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 11/13/2023] [Accepted: 11/20/2023] [Indexed: 11/26/2023]

Abstract

OBJECTIVE

Electronic Health Record (EHR) systems are digital platforms used in clinical practice to collect patients' clinical information related to their health status, and they represent a useful store of real-world data. EHRs have a potential role in research studies, in particular, in platform trials. Platform trials are innovative trial designs including multiple trial arms (conducted simultaneously and/or sequentially) on different treatments under a single master protocol. However, the use of EHRs in research comes with important challenges such as incompleteness of records and the need to translate trial eligibility criteria into interoperable queries. In this paper, we aim to review and to describe our proposed innovative methods to tackle some of the most important challenges identified. This work is part of the Innovative Medicines Initiative (IMI) EU Patient-cEntric clinicAl tRial pLatforms (EU-PEARL) project's work package 3 (WP3), whose objective is to deliver tools and guidance for EHR-based protocol feasibility assessment, clinical site selection, and patient pre-screening in platform trials, investing in the building of a data-driven clinical network framework that can execute these complex innovative designs for which feasibility assessments are critically important.

METHODS

ISO standards and relevant references informed a readiness survey, producing 354 criteria with corresponding questions selected and harmonised through a 7-round scoring process (0-1) in stakeholder meetings, with 85% consensus being the threshold of acceptance for a criterion/question. ATLAS cohort definition and Cohort Diagnostics were mainly used to create the trial feasibility eligibility (I/E) criteria as executable interoperable queries.

RESULTS

The WP3/EU-PEARL group developed a readiness survey (eSurvey) for an efficient selection of clinical sites with suitable EHRs, consisting of yes-or-no questions, and a set-up of interoperable proxy queries using physicians' defined trial criteria. Both actions facilitate recruiting trial participants and alignment between study costs/timelines and data-driven recruitment potential.

CONCLUSION

The eSurvey will help create an archive of clinical sites with mature EHR systems suitable to participate in clinical trials/platform trials, and the interoperable proxy queries of trial eligibility criteria will help identify the number of potential participants. Ultimately, these tools will contribute to the production of EHR-based protocol design.

Affiliation(s)

Giulia Lombardo King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK.
Camille Couvert Sanofi R&D, Global Development, Clinical Science & Operations, Chilly-Mazarin, France
Melisa Kose King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
Amina Begum King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
Cecile Spiertz The Janssen Pharmaceutical Companies of Johnson & Johnson, Leiden, The Netherlands
Courtney Worrell King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
Danny Hasselbaink Janssen Biologics B.V., Leiden, the Netherlands
Eva-Maria Didden Actelion, a Janssen company of Johnson & Johnson, Allschwil, Basel-Country, Switzerland
Luca Sforzini King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
Marija Todorovic Johnson & Johnson Clinical Operations (JJCO), Johnson & Johnson company, Belgrade, Serbia
Martine Lewi Global Commercial Strategy Organization, the Janssen Pharmaceutical Companies of Johnson & Johnson, Raritan, New Jersey, USA
Mollie Brown King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
Morgan Vaterkowski Assistance Publique Hôpitaux de Paris, IT Department, Innovation and Data, Paris, France, and EPITA EPITA School of Engineering and Computer Science, Paris, France
Nancy Gullet King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
Nare Amasi-Hartoonian King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
Nicolas Griffon Information Technology Department, AP-HP, Paris, France; LIMICS, Inserm U1142, Sorbonne Université, Paris, France
Raluca Pais Sorbonne Université, Assistance Publique-Hôpitaux de Paris, Hôpital Pitié-Salpêtrière, Institute of Cardiometabolism and Nutrition, INSERM UMRS_938, Paris, France
Sarai Rodriguez Navarro Vall d'Hebron Research Institute (VHIR), Barcelona, Spain
Andreas Kremer Information Technology for Translational Medicine, ITTM S.A, House of BioHealth, Esch-sur-Alzette, Luxembourg
Christophe Maes The European Institute for Innovation through health data, and Department Public Health and Primary Care, Unit of Medical Informatics and Statistics, Faculty of Medicine and Health Sciences, Ghent University, Gent, Belgium
Eng Hooi Tan Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
Maxim Moinat Erasmus University Medical Center, Rotterdam, the Netherlands
Joan Genescà Ferrer Vall d'Hebron Research Institute (VHIR), Barcelona, Spain
Carmine M Pariante King's College London, Institute of Psychiatry, Psychology and Neuroscience, Department of Psychological Medicine, London, UK
Dipak Kalra The European Institute for Innovation through Health Data and Visiting Professor, University of Ghent, Gent, Belgium
Nadir Ammour Sanofi R&D, Global Development, Clinical Science & Operations, Chilly-Mazarin, France
Susana Kalko Vall d'Hebron Research Institute (VHIR), Barcelona, Spain.

Xu Q, Liu Y, Sun D, Huang X, Li F, Zhai J, Li Y, Zhou Q, Qian N, Niu B. OncoCTMiner: streamlining precision oncology trial matching via molecular profile analysis. Database (Oxford) 2023;2023:baad077. [PMID: 37935585 PMCID: PMC10630409 DOI: 10.1093/database/baad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 09/08/2023] [Accepted: 10/21/2023] [Indexed: 11/09/2023]

Affiliation(s)

Quan Xu Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Yueyue Liu Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Dawei Sun Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Xiaoqian Huang Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Feihong Li Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
JinCheng Zhai Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China
Yang Li Beijing International Center for Mathematical Research, Peking University, No. 5 Yiheyuan Road Haidian District, Beijing 100871, China Chongqing Research Institute of Big Data, Peking University, Chongqing 401333, China
Qiming Zhou Department of Bioinformatics, Beijing ChosenMed Clinical Laboratory Co. Ltd., Jinghai Industrial Park, 156 Jinghai 4th Road, Economic and Technological Development Area, Beijing 100176, China Research and Development Center, ChosenMed Technology (Zhejiang) Co. Ltd., Room 101, Building 8, Jincheng International Science and Technology City, No. 26 Zhenxing East Road, Linping District, Hangzhou, 311103, China
Niansong Qian Department of Oncology, Senior Department of Respiratory and Critical Care Medicine, The Eighth Medical Center of Chinese PLA General Hospital, No.17 A Heishanhu Road, Haidian District, Beijing 100853, China
Beifang Niu Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China University of Chinese Academy of Sciences, Beijing 100190, China

Su Q, Cheng G, Huang J. A review of research on eligibility criteria for clinical trials. Clin Exp Med 2023;23:1867-1879. [PMID: 36602707 PMCID: PMC9815064 DOI: 10.1007/s10238-022-00975-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 12/06/2022] [Indexed: 01/06/2023]

Kaskovich S, Wyatt KD, Oliwa T, Graglia L, Furner B, Lee J, Mayampurath A, Volchenboum SL. Automated Matching of Patients to Clinical Trials: A Patient-Centric Natural Language Processing Approach for Pediatric Leukemia. JCO Clin Cancer Inform 2023;7:e2300009. [PMID: 37428994 PMCID: PMC10857751 DOI: 10.1200/cci.23.00009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 04/05/2023] [Accepted: 05/10/2023] [Indexed: 07/12/2023] Open

Abstract

PURPOSE

Matching patients to clinical trials is cumbersome and costly. Attempts have been made to automate the matching process; however, most have used a trial-centric approach, which focuses on a single trial. In this study, we developed a patient-centric matching tool that matches patient-specific demographic and clinical information with free-text clinical trial inclusion and exclusion criteria extracted using natural language processing to return a list of relevant clinical trials ordered by the patient's likelihood of eligibility.

MATERIALS AND METHODS

Records from pediatric leukemia clinical trials were downloaded from ClinicalTrials.gov. Regular expressions were used to discretize and extract individual trial criteria. A multilabel support vector machine (SVM) was trained to classify sentence embeddings of criteria into relevant clinical categories. Labeled criteria were parsed using regular expressions to extract numbers, comparators, and relationships. In the validation phase, a patient-trial match score was generated for each trial and returned in the form of a ranked list for each patient.

RESULTS

In total, 5,251 discretized criteria were extracted from 216 protocols. The most frequent criterion was previous chemotherapy/biologics (17%). The multilabel SVM demonstrated a pooled accuracy of 75%. The text processing pipeline was able to automatically extract 68% of eligibility criteria rules, as compared with 80% in a manual version of the tool. Automated matching was accomplished in approximately 4 seconds, as compared with several hours using manual derivation.

CONCLUSION

To our knowledge, this project represents the first open-source attempt to generate a patient-centric clinical trial matching tool. The tool demonstrated acceptable performance when compared with a manual version, and it has potential to save time and money when matching patients to trials.
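
The step in MATERIALS AND METHODS above, where labeled criteria are parsed with regular expressions to extract numbers and comparators, might look roughly like the sketch below. It is not taken from the authors' open-source tool: the comparator vocabulary, the pattern, and the function name `parse_criterion` are hypothetical, and the example sentences are invented.

```python
import re

# Hypothetical comparator vocabulary, normalised to symbolic operators.
COMPARATORS = {
    ">=": ">=", "≥": ">=", "at least": ">=",
    "<=": "<=", "≤": "<=", "at most": "<=",
    ">": ">", "greater than": ">",
    "<": "<", "less than": "<",
}

# Longer alternatives come first so ">=" is not read as ">".
_ALTERNATIVES = "|".join(
    re.escape(c) for c in sorted(COMPARATORS, key=len, reverse=True)
)
_PATTERN = re.compile(
    r"(?P<comparator>" + _ALTERNATIVES + r")"
    r"\s*(?P<value>\d+(?:\.\d+)?)"
    r"\s*(?P<unit>[A-Za-z/%]+)?",
    re.IGNORECASE,
)


def parse_criterion(text):
    """Return every (comparator, value, unit) rule found in a criterion."""
    return [
        {
            "comparator": COMPARATORS[m.group("comparator").lower()],
            "value": float(m.group("value")),
            "unit": m.group("unit"),
        }
        for m in _PATTERN.finditer(text)
    ]


print(parse_criterion("Age >= 1 year and < 22 years"))
print(parse_criterion("Serum creatinine less than 1.5 mg/dL"))
print(parse_criterion("No prior chemotherapy"))
```

A pattern this simple treats whatever word follows the number as the unit, which is one reason a fully automated pipeline can be expected to recover fewer rules than manual derivation.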

Ismail A, Al-Zoubi T, El Naqa I, Saeed H. The role of artificial intelligence in hastening time to recruitment in clinical trials. BJR Open 2023;5:20220023. [PMID: 37953865 PMCID: PMC10636341 DOI: 10.1259/bjro.20220023] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/28/2022] [Revised: 03/20/2023] [Accepted: 04/11/2023] [Indexed: 09/01/2023] Open

Meystre SM, Heider PM, Cates A, Bastian G, Pittman T, Gentilin S, Kelechi TJ. Piloting an automated clinical trial eligibility surveillance and provider alert system based on artificial intelligence and standard data models. BMC Med Res Methodol 2023;23:88. [PMID: 37041475 PMCID: PMC10088225 DOI: 10.1186/s12874-023-01916-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2022] [Accepted: 04/04/2023] [Indexed: 04/13/2023] Open

Abstract

BACKGROUND

To advance new therapies into clinical care, clinical trials must recruit enough participants. Yet, many trials fail to do so, leading to delays, early trial termination, and wasted resources. Under-enrolling trials make it impossible to draw conclusions about the efficacy of new therapies. An oft-cited reason for insufficient enrollment is lack of study team and provider awareness about patient eligibility. Automating clinical trial eligibility surveillance and study team and provider notification could offer a solution.

METHODS

To address this need for an automated solution, we conducted an observational pilot study of our TAES (TriAl Eligibility Surveillance) system. We tested the hypothesis that an automated system based on natural language processing and machine learning algorithms could detect patients eligible for specific clinical trials by linking the information extracted from trial descriptions to the corresponding clinical information in the electronic health record (EHR). To evaluate the TAES information extraction and matching prototype (i.e., TAES prototype), we selected five open cardiovascular and cancer trials at the Medical University of South Carolina and created a new reference standard of 21,974 clinical text notes from a random selection of 400 patients (including at least 100 enrolled in the selected trials), with a small subset of 20 notes annotated in detail. We also developed a simple web interface for a new database that stores all trial eligibility criteria, corresponding clinical information, and trial-patient match characteristics using the Observational Medical Outcomes Partnership (OMOP) common data model. Finally, we investigated options for integrating an automated clinical trial eligibility system into the EHR and for notifying health care providers promptly of potential patient eligibility without interrupting their clinical workflow.

RESULTS

Although the rapidly implemented TAES prototype achieved only moderate accuracy (recall up to 0.778; precision up to 1.000), it enabled us to assess options for integrating an automated system successfully into the clinical workflow at a healthcare system.

CONCLUSIONS

Once optimized, the TAES system could exponentially enhance identification of patients potentially eligible for clinical trials, while simultaneously decreasing the burden on research teams of manual EHR review. Through timely notifications, it could also raise physician awareness of patient eligibility for clinical trials.

Chow R, Midroni J, Kaur J, Boldt G, Liu G, Eng L, Liu FF, Haibe-Kains B, Lock M, Raman S. Use of artificial intelligence for cancer clinical trial enrollment: a systematic review and meta-analysis. J Natl Cancer Inst 2023;115:365-374. [PMID: 36688707 PMCID: PMC10086633 DOI: 10.1093/jnci/djad013] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/03/2022] [Revised: 12/13/2022] [Accepted: 01/11/2023] [Indexed: 01/24/2023] Open

Maheshwari K, Cywinski JB, Papay F, Khanna AK, Mathur P. Artificial Intelligence for Perioperative Medicine: Perioperative Intelligence. Anesth Analg 2023;136:637-645. [PMID: 35203086 DOI: 10.1213/ane.0000000000005952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]

Williams E, Kienast M, Medawar E, Reinelt J, Merola A, Klopfenstein SAI, Flint AR, Heeren P, Poncette AS, Balzer F, Beimes J, von Bünau P, Chromik J, Arnrich B, Scherf N, Niehaus S. A Standardized Clinical Data Harmonization Pipeline for Scalable AI Application Deployment (FHIR-DHP): Validation and Usability Study. JMIR Med Inform 2023;11:e43847. [PMID: 36943344 PMCID: PMC10131740 DOI: 10.2196/43847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Revised: 01/24/2023] [Accepted: 01/25/2023] [Indexed: 01/26/2023] Open

Greve K, Ni Y, Bailes AF, Vargus-Adams J, Miley AE, Aronow B, McMahon MM, Kurowski BG, Mitelpunkt A. Gross motor function prediction using natural language processing in cerebral palsy. Dev Med Child Neurol 2023;65:100-106. [PMID: 35665923 PMCID: PMC9720038 DOI: 10.1111/dmcn.15301] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/23/2021] [Revised: 05/03/2022] [Accepted: 05/05/2022] [Indexed: 01/12/2023]

Abstract

AIM

To predict ambulatory status and Gross Motor Function Classification System (GMFCS) levels in patients with cerebral palsy (CP) by applying natural language processing (NLP) to electronic health record (EHR) clinical notes.

METHOD

Individuals aged 8 to 26 years with a diagnosis of CP in the EHR between January 2009 and November 2020 (~12 years of data) were included in a cross-sectional retrospective cohort of 2483 patients. The cohort was divided into train-test and validation groups. Positive predictive value, sensitivity, specificity, and area under the receiver operating curve (AUC) were calculated for prediction of ambulatory status and GMFCS levels.

RESULTS

The median age was 15 years (interquartile range 10-20 years) for the total cohort, with 56% being male and 75% White. The validation group resulted in 70% sensitivity, 88% specificity, 81% positive predictive value, and 0.89 AUC for predicting ambulatory status. NLP applied to the EHR differentiated between GMFCS levels I-II and III (15% sensitivity, 96% specificity, 46% positive predictive value, and 0.71 AUC); and IV and V (81% sensitivity, 51% specificity, 70% positive predictive value, and 0.75 AUC).

INTERPRETATION

NLP applied to the EHR demonstrated excellent differentiation between ambulatory and non-ambulatory status, and good differentiation between GMFCS levels I-II and III, and IV and V. Clinical use of NLP may help to individualize functional characterization and management.

WHAT THIS PAPER ADDS

Natural language processing (NLP) applied to the electronic health record (EHR) can predict ambulatory status in children with cerebral palsy (CP). NLP provides good prediction of Gross Motor Function Classification System level in children with CP using the EHR. NLP methods described could be integrated in an EHR system to provide real-time information.
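
The sensitivity, specificity, and positive predictive value reported in the RESULTS above are all derived from a confusion matrix. The sketch below shows the standard definitions; the function name is hypothetical and the counts are invented for illustration, since the abstract does not report the underlying confusion matrices.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard definitions from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly rejected
        "ppv": tp / (tp + fp),          # share of positive calls that are right
    }


# Invented counts, not the study's data.
metrics = diagnostic_metrics(tp=70, fp=12, tn=88, fn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, positive predictive value depends on how common each class is in the cohort, so it does not transfer directly to populations with a different case mix.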

Affiliation(s)

Kelly Greve Division of Occupational Therapy and Physical Therapy, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA Department of Rehabilitation, Exercise and Nutrition Sciences, University of Cincinnati College of Allied Health Sciences, Cincinnati, OH, USA
Yizhao Ni Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
Amy F. Bailes Division of Occupational Therapy and Physical Therapy, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA Department of Rehabilitation, Exercise and Nutrition Sciences, University of Cincinnati College of Allied Health Sciences, Cincinnati, OH, USA
Jilda Vargus-Adams Division of Pediatric Rehabilitation Medicine, Cincinnati Children’s Hospital Medical Center, OH, USA Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA Department of Neurology and Rehabilitation Medicine, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
Aimee E. Miley Division of Pediatric Rehabilitation Medicine, Cincinnati Children’s Hospital Medical Center, OH, USA
Bruce Aronow Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
Mary M. McMahon Division of Pediatric Rehabilitation Medicine, Cincinnati Children’s Hospital Medical Center, OH, USA Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA Department of Neurology and Rehabilitation Medicine, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
Brad G. Kurowski Division of Pediatric Rehabilitation Medicine, Cincinnati Children’s Hospital Medical Center, OH, USA Department of Pediatrics, University of Cincinnati, College of Medicine, Cincinnati, OH, USA Department of Neurology and Rehabilitation Medicine, University of Cincinnati, College of Medicine, Cincinnati, OH, USA
Alexis Mitelpunkt Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA Pediatric Rehabilitation, Department of Rehabilitation, Dana-Dwek Children’s Hospital, Tel Aviv Medical Center, Tel Aviv, Israel Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022;6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022] Open

Fang Y, Idnay B, Sun Y, Liu H, Chen Z, Marder K, Xu H, Schnall R, Weng C. Combining human and machine intelligence for clinical trial eligibility querying. J Am Med Inform Assoc 2022;29:1161-1171. [PMID: 35426943 PMCID: PMC9196697 DOI: 10.1093/jamia/ocac051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Accepted: 03/29/2022] [Indexed: 11/13/2022] Open

Rafee A, Riepenhausen S, Neuhaus P, Meidt A, Dugas M, Varghese J. ELaPro, a LOINC-mapped core dataset for top laboratory procedures of eligibility screening for clinical trials. BMC Med Res Methodol 2022;22:141. [PMID: 35568796 PMCID: PMC9107639 DOI: 10.1186/s12874-022-01611-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Accepted: 04/20/2022] [Indexed: 12/21/2022] Open

Abstract

Background

Screening for eligible patients continues to pose a great challenge for many clinical trials. This has led to a rapidly growing interest in standardizing computable representations of eligibility criteria (EC) in order to develop tools that leverage data from electronic health record (EHR) systems. Although laboratory procedures (LP) represent a common entity of EC that is readily available and retrievable from EHR systems, there is a lack of interoperable data models for this entity of EC. A public, specialized data model that utilizes international, widely-adopted terminology for LP, e.g. Logical Observation Identifiers Names and Codes (LOINC®), is much needed to support automated screening tools.

Objective

The aim of this study is to establish a core dataset for LP most frequently requested to recruit patients for clinical trials using LOINC terminology. Employing such a core dataset could enhance the interface between study feasibility platforms and EHR systems and significantly improve automatic patient recruitment.

Methods

We used a semi-automated approach to analyze 10,516 screening forms from the Medical Data Models (MDM) portal’s data repository that are pre-annotated with Unified Medical Language System (UMLS) concepts. An automated semantic analysis based on concept frequency was followed by an extensive manual expert review performed by physicians to analyze complex recruitment-relevant concepts not amenable to an automatic approach.

Results

Based on analysis of 138,225 EC from 10,516 screening forms, 55 laboratory procedures represented 77.87% of all UMLS laboratory concept occurrences identified in the selected EC forms. We identified 26,413 unique UMLS concepts from 118 UMLS semantic types and covered the vast majority of Medical Subject Headings (MeSH) disease domains.

Conclusions

Only a small set of common LP covers the majority of laboratory concepts in screening EC forms, which supports the feasibility of establishing a focused core dataset for LP. We present ELaPro, a novel, LOINC-mapped, core dataset for the 55 LP most frequently requested in screening for clinical trials. ELaPro is available in multiple machine-readable data formats such as CSV, ODM and HL7 FHIR. The extensive manual curation of this large number of free-text EC, as well as the combination of UMLS and LOINC terminologies, distinguishes this specialized dataset from previous relevant datasets in the literature.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-022-01611-y.
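The coverage figure reported in the Results above (a small set of laboratory procedures accounting for most concept occurrences) is a frequency-ranking calculation. The following is a minimal sketch of that kind of calculation, not the authors' pipeline; the function name `top_k_coverage` and the toy concept list are hypothetical and are not ELaPro data.

```python
from collections import Counter


def top_k_coverage(concept_occurrences, k):
    """Return the k most frequent concepts and the share of all
    occurrences that those k concepts account for."""
    counts = Counter(concept_occurrences)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    return [concept for concept, _ in top], covered / total


# Toy, made-up occurrences (one entry per annotated mention).
occurrences = (
    ["hemoglobin"] * 5 + ["creatinine"] * 3 + ["platelets"] + ["bilirubin"]
)
top_concepts, share = top_k_coverage(occurrences, k=2)
print(top_concepts, f"{share:.0%}")
```

With real annotations, plotting the share against increasing `k` would show how quickly coverage saturates, which is the property the abstract relies on when arguing for a focused core dataset.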

The Role of Artificial Intelligence in Early Cancer Diagnosis. Cancers (Basel) 2022;14:cancers14061524. [PMID: 35326674 PMCID: PMC8946688 DOI: 10.3390/cancers14061524] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Received: 02/11/2022] [Revised: 03/08/2022] [Accepted: 03/10/2022] [Indexed: 02/01/2023] Open

Kataria S, Ravindran V. Musculoskeletal care - at the confluence of data science, sensors, engineering, and computation. BMC Musculoskelet Disord 2022;23:169. [PMID: 35193536 PMCID: PMC8863295 DOI: 10.1186/s12891-022-05126-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/23/2021] [Accepted: 02/17/2022] [Indexed: 12/27/2022] Open

Idnay B, Dreisbach C, Weng C, Schnall R. A systematic review on natural language processing systems for eligibility prescreening in clinical research. J Am Med Inform Assoc 2021;29:197-206. [PMID: 34725689 PMCID: PMC8714283 DOI: 10.1093/jamia/ocab228] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/16/2021] [Revised: 08/30/2021] [Accepted: 10/04/2021] [Indexed: 11/14/2022] Open

Shi W, Vasishta S, Dow L, Cavellini D, Palmer C, McKinstry B, Sullivan F. Early experience with an opt-in research register - Scottish Health Research Register (SHARE): a multi-method evaluation of participant recruitment performance. BMC Med Res Methodol 2021;21:286. [PMID: 34930144 PMCID: PMC8686271 DOI: 10.1186/s12874-021-01479-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2021] [Accepted: 11/28/2021] [Indexed: 01/01/2023] Open

Abstract

Background

Recruiting participants to a clinical study is a resource-intensive process with a high failure rate. The Scottish Health Research Register (SHARE) provides a recruitment support service that helps researchers recruit participants by searching patients’ Electronic Health Records (EHRs). The current study aims to evaluate the performance of SHARE in participant recruitment.

Methods

Recruitment projects eligible for evaluation were those that were conducted for clinical trials or observational studies and finished before 2020. For analysis of recruitment data, projects with incomplete data were excluded. For each project we calculated, from SHARE records, 1) the number of participants recruited through SHARE as a percentage of the number requested by researchers (percentage fulfilled), 2) the percentage of the potential candidates provided by SHARE to researchers that were actually recruited (percentage provided and recruited), and 3) the participants recruited through SHARE as a percentage of all the potentially eligible candidates identified by searching registrants’ EHRs (percentage identified and recruited). Research teams of the eligible projects were invited to participate in an anonymised online survey. Two metrics were derived from research teams’ responses: a) the number recruited as a percentage of the study target number of participants (percentage fulfilled), and b) the percentage of the participants recruited through SHARE among the candidates received from SHARE (percentage provided and recruited).

Results

Forty-four projects were eligible for inclusion. Recruitment data were available for 24 projects (20 were excluded because of missing or incomplete data). Survey invitations were sent to all eligible research teams, and 12 responses were received. Analysis of recruitment data shows the overall percentage fulfilled was 34.2% (interquartile range 13.3–45.1%), the percentage provided and recruited was 29.3% (interquartile range 20.6–52.4%), and the percentage identified and recruited was 4.9% (interquartile range 2.6–10.2%). Based on the data reported by researchers, percentage fulfilled was 31.7% (interquartile range 5.8–59.6%) and percentage provided and recruited was 20.2% (interquartile range 8.2–31.0%).

Conclusions

SHARE may be a valuable resource for recruiting participants for some clinical studies. Potential improvements are to expand the registrant base and to incorporate more data generated during patients’ different health care encounters into the candidate-searching step.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12874-021-01479-4.
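The three record-based metrics defined in the Methods above are simple ratios over per-project counts. A minimal sketch follows, assuming each project is summarized by four counts; the class `RecruitmentProject`, its field names, and the example numbers are hypothetical and do not come from SHARE.

```python
from dataclasses import dataclass


@dataclass
class RecruitmentProject:
    requested: int   # participants requested by the research team
    identified: int  # potentially eligible candidates found by the EHR search
    provided: int    # candidates passed on to the research team
    recruited: int   # participants actually recruited through the register


def pct(numerator, denominator):
    """Percentage, or NaN when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else float("nan")


def recruitment_metrics(p):
    return {
        "percentage_fulfilled": pct(p.recruited, p.requested),
        "percentage_provided_and_recruited": pct(p.recruited, p.provided),
        "percentage_identified_and_recruited": pct(p.recruited, p.identified),
    }


# Made-up project used only to exercise the definitions.
example = RecruitmentProject(requested=100, identified=800, provided=120, recruited=30)
metrics = recruitment_metrics(example)
print(metrics)
```

The abstract reports these values as medians with interquartile ranges across projects, so in practice the function would be applied per project and the results summarized afterwards.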

Wu J, Yakubov A, Abdul-Hay M, Love E, Kroening G, Cohen D, Spalink C, Joshi A, Balar A, Joseph KA, Ravenell J, Mehnert J. Prescreening to Increase Therapeutic Oncology Trial Enrollment at the Largest Public Hospital in the United States. JCO Oncol Pract 2021;18:e620-e625. [PMID: 34748371 DOI: 10.1200/op.21.00629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022] Open

Ni Y, Bachtel A, Nause K, Beal S. Automated detection of substance use information from electronic health records for a pediatric population. J Am Med Inform Assoc 2021;28:2116-2127. [PMID: 34333636 PMCID: PMC8449626 DOI: 10.1093/jamia/ocab116] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/26/2021] [Revised: 05/06/2021] [Accepted: 05/26/2021] [Indexed: 11/12/2022] Open

Abstract

OBJECTIVE

Substance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data.

MATERIALS AND METHODS

Pediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC).

RESULTS

The dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively).

CONCLUSIONS

It is feasible to detect substance use screening and results among pediatric patients using logic rules, NLP, and machine learning technologies.
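Four of the evaluation measures named in the Materials and Methods above (sensitivity, specificity, positive predictive value, negative predictive value) follow directly from a 2x2 confusion matrix. The sketch below shows the standard definitions; the function name `screening_metrics` and the counts are hypothetical and are not the study's data. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix measures for a binary screening result."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true positives detected
        "specificity": ratio(tn, tn + fp),  # share of true negatives detected
        "ppv": ratio(tp, tp + fp),          # precision of positive calls
        "npv": ratio(tn, tn + fn),          # precision of negative calls
    }


# Made-up counts for illustration only.
m = screening_metrics(tp=80, fp=30, tn=170, fn=20)
print({name: round(value, 3) for name, value in m.items()})
```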

Ronquillo JG, Lester WT. Practical Aspects of Implementing and Applying Health Care Cloud Computing Services and Informatics to Cancer Clinical Trial Data. JCO Clin Cancer Inform 2021;5:826-832. [PMID: 34383582 DOI: 10.1200/cci.21.00018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

Abstract

PURPOSE

Cloud computing has led to dramatic growth in the volume, variety, and velocity of cancer data. However, cloud platforms and services present new challenges for cancer research, particularly in understanding the practical tradeoffs between cloud performance, cost, and complexity. The goal of this study was to describe the practical challenges when using a cloud-based service to improve the cancer clinical trial matching process.

METHODS

We collected information for all interventional cancer clinical trials from ClinicalTrials.gov and used the Google Cloud Healthcare Natural Language Application Programming Interface (API) to analyze clinical trial Title and Eligibility Criteria text. An informatics pipeline leveraging interoperability standards summarized the distribution of cancer clinical trials, genes, laboratory tests, and medications extracted from cloud-based entity analysis.

RESULTS

There were a total of 38,851 cancer-related clinical trials found in this study, with the distribution of cancer categories extracted from Title text significantly different than in ClinicalTrials.gov (P < .001). Cloud-based entity analysis of clinical trial criteria identified a total of 949 genes, 1,782 laboratory tests, 2,086 medications, and 4,902 National Cancer Institute Thesaurus terms, with estimated detection accuracies ranging from 12.8% to 89.9%. A total of 77,702 API calls processed an estimated 167,179 text records, which took a total of 1,979 processing-minutes (33.0 processing-hours), or approximately 1.5 seconds per API call.

CONCLUSION

Current general-purpose cloud health care tools, like the Google service in this study, should not be used for automated clinical trial matching unless they can perform effective extraction and classification of the clinical, genetic, and medication concepts central to precision oncology research. A strong understanding of the practical aspects of cloud computing will help researchers effectively navigate the vast data ecosystems in cancer research.
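The throughput figures in the Results above can be checked with a short calculation. This sketch uses only the numbers quoted in the abstract (77,702 API calls and 1,979 processing-minutes); the variable names are illustrative.

```python
api_calls = 77_702
processing_minutes = 1_979

# Convert the total processing time to hours and to seconds per call.
processing_hours = processing_minutes / 60
seconds_per_call = processing_minutes * 60 / api_calls

print(f"{processing_hours:.1f} processing-hours")
print(f"{seconds_per_call:.1f} seconds per API call")
```

Both values round to the figures stated in the abstract (33.0 hours and roughly 1.5 seconds per call).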

Cai T, Cai F, Dahal KP, Cremone G, Lam E, Golnik C, Seyok T, Hong C, Cai T, Liao KP. Improving the Efficiency of Clinical Trial Recruitment Using an Ensemble Machine Learning to Assist With Eligibility Screening. ACR Open Rheumatol 2021;3:593-600. [PMID: 34296815 PMCID: PMC8449035 DOI: 10.1002/acr2.11289] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/30/2020] [Accepted: 05/18/2021] [Indexed: 11/22/2022] Open

Abstract

Objective

Efficiently identifying eligible patients is a crucial first step for a successful clinical trial. The objective of this study was to test whether an approach using electronic health record (EHR) data and an ensemble machine learning algorithm incorporating billing codes and data from clinical notes processed by natural language processing (NLP) can improve the efficiency of eligibility screening.

Methods

We studied patients screened for a clinical trial of rheumatoid arthritis (RA) with one or more International Classification of Diseases (ICD) codes for RA and age greater than 35 years, from a tertiary care center and a community hospital. The following three groups of EHR features were considered for the algorithm: 1) structured features, 2) the counts of NLP concepts from notes, and 3) health care utilization. All features were linked to dates. We evaluated random forest and logistic regression with a least absolute shrinkage and selection operator penalty against the following two standard approaches: 1) one or more RA ICD codes and no ICD codes related to exclusion criteria (Screen_RAICD1+EX) and 2) two or more RA ICD codes (Screen_RAICD2). To test portability, we trained the algorithm at one institution and tested it at the other.

Results

In total, 3359 patients at Brigham and Women’s Hospital (BWH) and 642 patients at Faulkner Hospital (FH) were studied, with 461 (13.7%) eligible patients at BWH and 84 (13.4%) at FH. The application of the algorithm reduced ineligible patients from chart review by 40.5% at the tertiary care center and by 57.0% at the community hospital. In contrast, Screen_RAICD2 reduced patients for chart review by 2.7% to 11.3%; Screen_RAICD1+EX reduced patients for chart review by 63% to 65% but excluded 22% to 27% of eligible patients.

Conclusion

The ensemble machine learning algorithm incorporating billing codes and NLP data increased the efficiency of eligibility screening by reducing the number of patients requiring chart review while not excluding eligible patients. Moreover, this approach can be trained at one institution and applied at another for multicenter clinical trials.
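The two rule-based baselines named in the Methods above, and the two quantities used to compare screens in the Results (reduction in chart review and share of eligible patients excluded), can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the record fields `ra_codes`, `exclusion`, and `eligible`, the helper `evaluate_screen`, and the five toy patients are all hypothetical.

```python
def screen_raicd1_ex(patient):
    """Keep patients with >= 1 RA ICD code and no exclusion-related ICD code."""
    return patient["ra_codes"] >= 1 and not patient["exclusion"]


def screen_raicd2(patient):
    """Keep patients with >= 2 RA ICD codes."""
    return patient["ra_codes"] >= 2


def evaluate_screen(patients, keep):
    """Share of patients removed from chart review, and share of truly
    eligible patients that the screen wrongly removes."""
    kept = [p for p in patients if keep(p)]
    eligible = [p for p in patients if p["eligible"]]
    dropped_eligible = [p for p in eligible if not keep(p)]
    return {
        "review_reduction": 1 - len(kept) / len(patients) if patients else 0.0,
        "eligible_excluded": len(dropped_eligible) / len(eligible) if eligible else 0.0,
    }


# Made-up cohort; eligibility labels would come from manual chart review.
cohort = [
    {"ra_codes": 1, "exclusion": False, "eligible": True},
    {"ra_codes": 3, "exclusion": True, "eligible": True},
    {"ra_codes": 2, "exclusion": False, "eligible": False},
    {"ra_codes": 1, "exclusion": True, "eligible": False},
    {"ra_codes": 1, "exclusion": False, "eligible": False},
]
result_1ex = evaluate_screen(cohort, screen_raicd1_ex)
result_2 = evaluate_screen(cohort, screen_raicd2)
print(result_1ex, result_2)
```

The trade-off the abstract describes is visible in these two numbers: a screen can cut chart review sharply while also discarding eligible patients, so both need to be reported together.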

O'Brien EC, Raman SR, Ellis A, Hammill BG, Berdan LG, Rorick T, Janmohamed S, Lampron Z, Hernandez AF, Curtis LH. The use of electronic health records for recruitment in clinical trials: a mixed methods analysis of the Harmony Outcomes Electronic Health Record Ancillary Study. Trials 2021;22:465. [PMID: 34281607 PMCID: PMC8287813 DOI: 10.1186/s13063-021-05397-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/15/2020] [Accepted: 06/24/2021] [Indexed: 11/22/2022] Open

Abstract

Background

The electronic health record (EHR) contains a wealth of clinical data that may be used to streamline the identification of potential clinical trial participants. However, there is little empirical information on site-level facilitators of and barriers to optimal use of EHR systems with respect to trial recruitment.

Methods

We conducted qualitative focus groups and quantitative surveys as part of the EHR Ancillary Study, which is being conducted alongside the multicenter, global, Harmony Outcomes Trial comparing albiglutide to standard care for the prevention of cardiovascular events in type 2 diabetes. Subject matter experts used findings from focus groups to draft a 20-question survey examining the use of the EHR for participant identification, common site recruitment strategies, and variation in perceived barriers to optimal use of the EHR. The final survey was fielded with 446 site investigators actively enrolling participants in the main trial.

Results

Nearly two-thirds of respondents were study coordinators (63.2%), 23.1% were principal investigators, and 13.7% held other research roles. Approximately half of the respondents reported using the EHR to find potential trial participants. Of these, 79.4% reported using EHR searches in conjunction with other recruitment methods, including reviewing of upcoming clinic schedules (75.3%) and contacting past trial participants (71.2%). Important barriers to optimal use of the EHR included the lack of availability of certain research-focused EHR modules and limitations on the ability to contact patients cared for by other providers. Of survey respondents who did not use the EHR to find potential participants, one-quarter reported that the EHR was not accessible in their country; this finding varied from 2.6% of respondents in North America to 50% of respondents in the Asia Pacific.

Conclusions

While EHR screening was commonly used for recruitment in a cardiovascular outcomes trial, important technical, governance, and regulatory barriers persist. Multifaceted, scalable, and customizable strategies are needed to support the optimal use of the EHR for trial participant identification.

Trial registration

ClinicalTrials.gov NCT02465515. Registered on 8 June 2015

Supplementary Information

The online version contains supplementary material available at 10.1186/s13063-021-05397-0.

von Itzstein MS, Hullings M, Mayo H, Beg MS, Williams EL, Gerber DE. Application of Information Technology to Clinical Trial Evaluation and Enrollment: A Review. JAMA Oncol 2021;7:1559-1566. [PMID: 34236403 DOI: 10.1001/jamaoncol.2021.1165] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 12/28/2022]

Abstract

Importance

As cancer treatment has become more individualized, oncologic clinical trials have become more complex. Increasingly numerous and stringent eligibility criteria frequently include tumor molecular or genomic characteristics that may not be readily identified in medical records, rendering it difficult to best match clinical trials with clinical sites and to identify potentially eligible patients once a clinical trial has been selected and activated. Partly because of these factors, enrollment rates for cancer clinical trials remain low, creating delays and increased costs for drug development. Information technology (IT) platforms have been applied to the implementation and conduct of clinical trials to improve efficiencies in several medical fields, and these platforms have recently been introduced to oncologic studies.

Observations

This review summarizes cancer and noncancer studies that used IT platforms for assistance with clinical trial site selection, patient recruitment, and patient screening. The review does not address the use of IT in other aspects of clinical research, such as wearable physical activity monitors or telehealth visits. A large number of IT platforms (which may be patient facing, site or investigator facing, or sponsor facing) are now commercially available. These applications use artificial intelligence and/or natural language processing to identify and summarize protocol eligibility criteria, institutional patient populations, and individual electronic health records. Although there is an expanding body of literature examining the role of this technology, relatively few studies to date have been performed in oncologic settings.

Conclusions and Relevance

This review found that an increasing number and variety of IT platforms were available to assist in the planning and conduct of clinical trials. Because oncologic clinical care and clinical trial protocols are particularly complex, nuanced, and individualized, published experience with this technology in other fields may not be fully applicable to cancer settings. The extent to which these services will overcome ongoing and increasing challenges in cancer clinical research remains unclear.

Rogers JR, Lee J, Zhou Z, Cheung YK, Hripcsak G, Weng C. Contemporary use of real-world data for clinical trial conduct in the United States: a scoping review. J Am Med Inform Assoc 2021;28:144-154. [PMID: 33164065 DOI: 10.1093/jamia/ocaa224] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 06/05/2020] [Revised: 08/11/2020] [Accepted: 09/02/2020] [Indexed: 12/28/2022] Open

Zong H, Yang J, Zhang Z, Li Z, Zhang X. Semantic categorization of Chinese eligibility criteria in clinical trials using machine learning methods. BMC Med Inform Decis Mak 2021;21:128. [PMID: 33858409 PMCID: PMC8050926 DOI: 10.1186/s12911-021-01487-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/20/2020] [Accepted: 04/01/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Semantic categorization of clinical trial eligibility criteria based on natural language processing technology is crucial for optimizing clinical trial design and building automated patient recruitment systems. However, most related research has focused on English eligibility criteria, and to the best of our knowledge, no research has studied Chinese eligibility criteria. Thus, in this study, we aimed to explore the semantic categories of Chinese eligibility criteria.

METHODS

We downloaded the clinical trial registration files from the website of the Chinese Clinical Trial Registry (ChiCTR) and extracted both the Chinese eligibility criteria and the corresponding English eligibility criteria. We represented the criteria sentences based on the Unified Medical Language System semantic types and applied a hierarchical clustering algorithm to induce semantic categories. Furthermore, in order to explore the classification performance of Chinese eligibility criteria with our developed semantic categories, we implemented multiple classification algorithms, including four baseline machine learning algorithms (LR, NB, kNN, SVM), three deep learning algorithms (CNN, RNN, FastText) and two pre-trained language models (BERT, ERNIE).

RESULTS

In total, we developed 44 semantic categories, summarized them into 8 topic groups, and investigated their average incidence and prevalence in 272 hepatocellular carcinoma-related Chinese clinical trials. Compared with the categories previously proposed for English eligibility criteria, 13 novel categories were identified in Chinese eligibility criteria. The classification results show that most semantic categories performed quite well; the pre-trained language model ERNIE achieved the best performance, with a macro-average F1 score of 0.7980 and a micro-average F1 score of 0.8484.

CONCLUSION

As a pilot study of Chinese eligibility criteria analysis, we developed 44 semantic categories by hierarchical clustering for the first time and validated their classification capacity with multiple classification algorithms.
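The Results above report both a macro-average and a micro-average F1 score. The sketch below shows how the two averages differ, using the standard definitions; the category names and per-category counts are made up for illustration and are not taken from the study.

```python
def f1(tp, fp, fn):
    """F1 from counts; defined as 0.0 when there are no positives at all."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_micro_f1(per_class):
    """per_class maps a category name to (tp, fp, fn) counts."""
    # Macro: average the per-category F1 scores, every category weighted equally.
    macro = sum(f1(*counts) for counts in per_class.values()) / len(per_class)
    # Micro: pool the counts across categories first, then compute one F1.
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return macro, f1(tp, fp, fn)


# One frequent, well-classified category and one rare, poorly classified one.
counts = {"Age": (90, 10, 10), "Pregnancy": (5, 5, 5)}
macro, micro = macro_micro_f1(counts)
print(f"macro F1 = {macro:.4f}, micro F1 = {micro:.4f}")
```

In this toy case the micro average exceeds the macro average because the rare category contributes few pooled counts. A gap in the same direction, as in the abstract, is consistent with weaker performance on less frequent categories, although the abstract itself does not state the cause.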

Naceanceno KS, House SL, Asaro PV. Shared-Task Worklists Improve Clinical Trial Recruitment Workflow in an Academic Emergency Department. Appl Clin Inform 2021;12:293-300. [PMID: 33827142 DOI: 10.1055/s-0041-1727153] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022] Open

Abstract

BACKGROUND

Clinical trials performed in our emergency department at Barnes-Jewish Hospital utilize a centralized infrastructure for alerting, screening, and enrollment with rule-based alerts sent to clinical research coordinators. Previously, all alerts were delivered as text messages via dedicated cellular phones. As the number of ongoing clinical trials increased, the volume of alerts grew to an unmanageable level. Therefore, we have changed our primary notification delivery method to study-specific, shared-task worklists integrated with our pre-existing web-based screening documentation system.

OBJECTIVE

To evaluate the effects on screening and recruitment workflow of replacing text-message delivery of clinical trial alerts with study-specific shared-task worklists in a high-volume academic emergency department supporting multiple concurrent clinical trials.

METHODS

We analyzed retrospective data on alerting, screening, and enrollment for 10 active clinical trials pre- and postimplementation of shared-task worklists.

RESULTS

Notifications signaling the presence of potentially eligible subjects for clinical trials were more likely to result in a screen (p < 0.001) with the implementation of shared-task worklists compared with notifications delivered as text messages for 8/10 clinical trials. The change in workflow did not alter the likelihood of a notification resulting in an enrollment (p = 0.473). The Director of Research reported a substantial reduction in the amount of time spent redirecting clinical research coordinator screening activities.

CONCLUSION

Shared-task worklists, with the functionalities we have described, offer a viable alternative to delivery of clinical trial alerts via text message directly to clinical research coordinators recruiting for multiple concurrent clinical trials in a high-volume academic emergency department.

Evaluating eligibility criteria of oncology trials using real-world data and AI. Nature 2021;592:629-633. [PMID: 33828294 DOI: 10.1038/s41586-021-03430-5] [Citation(s) in RCA: 99] [Impact Index Per Article: 33.0] [Received: 08/24/2020] [Accepted: 03/08/2021] [Indexed: 01/04/2023]

Jain N, Mittendorf KF, Holt M, Lenoue-Newton M, Maurer I, Miller C, Stachowiak M, Botyrius M, Cole J, Micheel C, Levy M. The My Cancer Genome clinical trial data model and trial curation workflow. J Am Med Inform Assoc 2021;27:1057-1066. [PMID: 32483629 PMCID: PMC7647323 DOI: 10.1093/jamia/ocaa066] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/13/2019] [Revised: 04/07/2020] [Accepted: 04/17/2020] [Indexed: 11/14/2022] Open

Automated Machine Learning for Healthcare and Clinical Notes Analysis. COMPUTERS 2021. [DOI: 10.3390/computers10020024] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/16/2022]

Abstract

Machine learning (ML) has been slowly entering every aspect of our lives and its positive impact has been astonishing. To accelerate embedding ML in more applications and incorporating it in real-world scenarios, automated machine learning (AutoML) is emerging. The main purpose of AutoML is to provide seamless integration of ML in various industries, which will facilitate better outcomes in everyday tasks. In healthcare, AutoML has already been applied to easier settings with structured data such as tabular lab data. However, there is still a need for applying AutoML for interpreting medical text, which is being generated at a tremendous rate. For this to happen, a promising method is AutoML for clinical notes analysis, which is an unexplored research area representing a gap in ML research. The main objective of this paper is to fill this gap and provide a comprehensive survey and analytical study towards AutoML for clinical notes. To that end, we first introduce the AutoML technology and review its various tools and techniques. We then survey the literature of AutoML in the healthcare industry and discuss the developments specific to clinical settings, as well as those using general AutoML tools for healthcare applications. With this background, we then discuss challenges of working with clinical notes and highlight the benefits of developing AutoML for medical notes processing. Next, we survey relevant ML research for clinical notes and analyze the literature and the field of AutoML in the healthcare industry. Furthermore, we propose future research directions and shed light on the challenges and opportunities this emerging field holds. With this, we aim to assist the community with the implementation of an AutoML platform for medical notes, which, if realized, can revolutionize patient outcomes.

Bitterman DS, Miller TA, Mak RH, Savova GK. Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. Int J Radiat Oncol Biol Phys 2021;110:641-655. [PMID: 33545300 DOI: 10.1016/j.ijrobp.2021.01.044] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/25/2020] [Revised: 12/22/2020] [Accepted: 01/23/2021] [Indexed: 02/07/2023]

Abstract

Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free text data into structured data that can be extracted and analyzed at scale. In medicine, this unlocking of the rich, expressive data within clinical free text in electronic medical records will help untap the full potential of big data for research and clinical purposes. Recent major NLP algorithmic advances have significantly improved the performance of these algorithms, leading to a surge in academic and industry interest in developing tools to automate information extraction and phenotyping from clinical texts. Thus, these technologies are poised to transform medical research and alter clinical practices in the future. Radiation oncology stands to benefit from NLP algorithms if they are appropriately developed and deployed, as they may enable advances such as automated inclusion of radiation therapy details into cancer registries, discovery of novel insights about cancer care, and improved patient data curation and presentation at the point of care. However, challenges remain before the full value of NLP is realized, such as the plethora of jargon specific to radiation oncology, nonstandard nomenclature, a lack of publicly available labeled data for model development, and interoperability limitations between radiation oncology data silos. Successful development and implementation of high quality and high value NLP models for radiation oncology will require close collaboration between computer scientists and the radiation oncology community. 
Here, we present a primer on artificial intelligence algorithms in general and NLP algorithms in particular; provide guidance on how to assess the performance of such algorithms; review prior research on NLP algorithms for oncology; and describe future avenues for NLP in radiation oncology research and clinics.

Stubbs A, Filannino M, Soysal E, Henry S, Uzuner Ö. Cohort selection for clinical trials: n2c2 2018 shared task track 1. J Am Med Inform Assoc 2021;26:1163-1171. [PMID: 31562516 DOI: 10.1093/jamia/ocz163] [Citation(s) in RCA: 31] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2019] [Revised: 08/07/2019] [Accepted: 09/18/2019] [Indexed: 01/02/2023] Open

Nehme F, Feldman K. Evolving Role and Future Directions of Natural Language Processing in Gastroenterology. Dig Dis Sci 2021;66:29-40. [PMID: 32107677 DOI: 10.1007/s10620-020-06156-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/20/2019] [Accepted: 02/18/2020] [Indexed: 02/06/2023]

Artificial intelligence in oncology. Artif Intell Med 2021. [DOI: 10.1016/b978-0-12-821259-2.00018-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Chamberlin SR, Bedrick SD, Cohen AM, Wang Y, Wen A, Liu S, Liu H, Hersh WR. Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task. JAMIA Open 2020;3:395-404. [PMID: 33215074 PMCID: PMC7660955 DOI: 10.1093/jamiaopen/ooaa026] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2019] [Revised: 04/17/2020] [Accepted: 06/03/2020] [Indexed: 11/24/2022] Open

Abstract

OBJECTIVE

Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval methods using electronic health records for different types of cohort definition retrieval.

MATERIALS AND METHODS

We developed a test collection consisting of about 100 000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated information retrieval tasks using word-based approaches were performed, varying 4 different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics.
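
The B-Pref measure mentioned above can be sketched as follows. This is a minimal illustration of one commonly cited formulation (each retrieved relevant record is penalised by the number of judged non-relevant records ranked above it, capped at R, the number of judged relevant records). The abstract does not say which variant the authors used, and the function name, record IDs, and judgments below are all hypothetical.

```python
def bpref(ranking, relevant, nonrelevant):
    """B-Pref for one topic; unjudged records in the ranking are ignored."""
    r_total = len(relevant)
    if r_total == 0:
        return 0.0
    nonrel_seen = 0   # judged non-relevant records seen so far in the ranking
    score = 0.0
    for record_id in ranking:
        if record_id in nonrelevant:
            nonrel_seen += 1
        elif record_id in relevant:
            # Penalty grows with the number of non-relevant records ranked higher.
            score += 1.0 - min(nonrel_seen, r_total) / r_total
    # Relevant records that were never retrieved contribute zero.
    return score / r_total


# Hypothetical topic: "p7" is unjudged, "p4" is relevant but not retrieved.
example_ranking = ["p1", "p9", "p2", "p7", "p3"]
example_score = bpref(example_ranking, {"p1", "p2", "p4"}, {"p9", "p3"})
print(round(example_score, 3))
```

Because unjudged records are skipped rather than counted as non-relevant, a measure of this kind suits test collections with incomplete relevance judgments.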

RESULTS

The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision but were still not able to recall all relevant patients found by the automated queries.

CONCLUSION

While word-based automated methods of cohort retrieval offer an attractive solution to the labor-intensive nature of this task currently used at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Future work will focus on using the test collection to develop and evaluate new approaches to query structure, weighting algorithms, and application of semantic methods.

Beck JT, Rammage M, Jackson GP, Preininger AM, Dankwa-Mullan I, Roebuck MC, Torres A, Holtzen H, Coverdill SE, Williamson MP, Chau Q, Rhee K, Vinegra M. Artificial Intelligence Tool for Optimizing Eligibility Screening for Clinical Trials in a Large Community Cancer Center. JCO Clin Cancer Inform 2020;4:50-59. [DOI: 10.1200/cci.19.00079] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Johnson EA, Carrington JM. Clinical Research Integration Within the Electronic Health Record: A Literature Review. Comput Inform Nurs 2020;39:129-135. [PMID: 33657055 DOI: 10.1097/cin.0000000000000659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Frampton GK, Shepherd J, Pickett K, Griffiths G, Wyatt JC. Digital tools for the recruitment and retention of participants in randomised controlled trials: a systematic map. Trials 2020;21:478. [PMID: 32498690 PMCID: PMC7273688 DOI: 10.1186/s13063-020-04358-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2019] [Accepted: 04/28/2020] [Indexed: 12/14/2022] Open

Abstract

BACKGROUND

Recruiting and retaining participants in randomised controlled trials (RCTs) is challenging. Digital tools, such as social media, data mining, email or text-messaging, could improve recruitment or retention, but an overview of this research area is lacking. We aimed to systematically map the characteristics of digital recruitment and retention tools for RCTs, and the features of the comparative studies that have evaluated the effectiveness of these tools during the past 10 years.

METHODS

We searched Medline, Embase, other databases, the Internet, and relevant web sites in July 2018 to identify comparative studies of digital tools for recruiting and/or retaining participants in health RCTs. Two reviewers independently screened references against protocol-specified eligibility criteria. Included studies were coded by one reviewer with 20% checked by a second reviewer, using pre-defined keywords to describe characteristics of the studies, populations and digital tools evaluated.

RESULTS

We identified 9163 potentially relevant references, of which 104 articles reporting 105 comparative studies were included in the systematic map. The number of published studies on digital tools has doubled in the past decade, but most studies evaluated digital tools for recruitment rather than retention. The key health areas investigated were health promotion, cancers, circulatory system diseases and mental health. Few studies focussed on minority or under-served populations, and most studies were observational. The most frequently-studied digital tools were social media, Internet sites, email and tv/radio for recruitment; and email and text-messaging for retention. One quarter of the studies measured efficiency (cost per recruited or retained participant) but few studies have evaluated people's attitudes towards the use of digital tools.

CONCLUSIONS

This systematic map highlights a number of evidence gaps and may help stakeholders to identify and prioritise further research needs. In particular, there is a need for rigorous research on the efficiency of the digital tools and their impact on RCT participants and investigators, perhaps as studies-within-a-trial (SWAT) research. There is also a need for research into how digital tools may improve participant retention in RCTs, which is currently underrepresented relative to recruitment research.

REGISTRATION

Not registered; based on a pre-specified protocol, peer-reviewed by the project's Advisory Board.

Alexander M, Solomon B, Ball DL, Sheerin M, Dankwa-Mullan I, Preininger AM, Jackson GP, Herath DM. Evaluation of an artificial intelligence clinical trial matching system in Australian lung cancer patients. JAMIA Open 2020;3:209-215. [PMID: 32734161 PMCID: PMC7382632 DOI: 10.1093/jamiaopen/ooaa002] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Accepted: 01/31/2020] [Indexed: 11/21/2022] Open

Hassanzadeh H, Karimi S, Nguyen A. Matching patients to clinical trials using semantically enriched document representation. J Biomed Inform 2020;105:103406. [DOI: 10.1016/j.jbi.2020.103406] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Revised: 01/28/2020] [Accepted: 03/02/2020] [Indexed: 12/16/2022]

Ni Y, Barzman D, Bachtel A, Griffey M, Osborn A, Sorter M. Finding warning markers: Leveraging natural language processing and machine learning technologies to detect risk of school violence. Int J Med Inform 2020;139:104137. [PMID: 32361146 DOI: 10.1016/j.ijmedinf.2020.104137] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2019] [Revised: 02/20/2020] [Accepted: 03/28/2020] [Indexed: 10/24/2022]

Abstract

INTRODUCTION

School violence has a far-reaching effect, impacting the entire school population including staff, students and their families. Among youth attending the most violent schools, studies have reported higher dropout rates, poor school attendance, and poor scholastic achievement. It was noted that the largest crime-prevention results occurred when youth at elevated risk were given an individualized prevention program. However, much work is needed to establish an effective approach to identify at-risk subjects.

OBJECTIVE

In our earlier research, we developed a risk assessment program to interview subjects, identify risk and protective factors, and evaluate risk for school violence. This study focused on developing natural language processing (NLP) and machine learning technologies to automate the risk assessment process.

MATERIAL AND METHODS

We prospectively recruited 131 students with or without behavioral concerns from 89 schools between 05/01/2015 and 04/30/2018. The subjects were interviewed with two risk assessment scales and a questionnaire, and their risk of violence was determined by pediatric psychiatrists based on clinical judgment. Using NLP technologies, different types of linguistic features were extracted from the interview content. Machine learning classifiers were then applied to predict risk of school violence for individual subjects. A two-stage feature selection was implemented to identify violence-related predictors. The performance was validated on the psychiatrist-generated reference standard of risk levels, where positive predictive value (PPV), sensitivity (SEN), negative predictive value (NPV), specificity (SPEC) and area under the ROC curve (AUC) were assessed.
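
As a rough illustration of how four of the measures named above relate to a confusion matrix, the sketch below computes PPV, SEN, NPV and SPEC from counts of true and false positives and negatives. The counts are invented for the example and are not taken from the study. AUC is left out because it requires ranked classifier scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # high-risk subjects correctly flagged
    fp: int  # low-risk subjects incorrectly flagged
    tn: int  # low-risk subjects correctly not flagged
    fn: int  # high-risk subjects missed


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return PPV, sensitivity, NPV and specificity as fractions in [0, 1]."""
    return {
        "PPV": c.tp / (c.tp + c.fp),
        "SEN": c.tp / (c.tp + c.fn),
        "NPV": c.tn / (c.tn + c.fn),
        "SPEC": c.tn / (c.tn + c.fp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = screening_metrics(ConfusionCounts(tp=40, fp=10, tn=45, fn=5))
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```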

RESULTS

Compared to subjects' sociodemographic information, use of linguistic features significantly improved classifiers' predictive performance (P < 0.01). The best-performing classifier with n-gram features achieved 86.5 %/86.5 %/85.7 %/85.7 %/94.0 % (PPV/SEN/NPV/SPEC/AUC) on the cross-validation set and 83.3 %/93.8 %/91.7 %/78.6 %/94.6 % (PPV/SEN/NPV/SPEC/AUC) on the test data. The feature selection process identified a set of predictors covering the discussion of subjects' thoughts, perspectives, behaviors, individual characteristics, peers and family dynamics, and protective factors.

CONCLUSIONS

By analyzing the content from subject interviews, the NLP and machine learning algorithms showed good capacity for detecting risk of school violence. The feature selection uncovered multiple warning markers that could deliver useful clinical insights to assist personalizing intervention. Consequently, the developed approach offered the promise of an accurate and scalable computerized screening service for preventing school violence.

Tissot HC, Shah AD, Brealey D, Harris S, Agbakoba R, Folarin A, Romao L, Roguski L, Dobson R, Asselbergs FW. Natural Language Processing for Mimicking Clinical Trial Recruitment in Critical Care: A Semi-Automated Simulation Based on the LeoPARDS Trial. IEEE J Biomed Health Inform 2020;24:2950-2959. [PMID: 32149659 DOI: 10.1109/jbhi.2020.2977925] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

A Time-and-Motion Study of Clinical Trial Eligibility Screening in a Pediatric Emergency Department. Pediatr Emerg Care 2019;35:868-873. [PMID: 30281551 PMCID: PMC6445787 DOI: 10.1097/pec.0000000000001592] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]

Spasic I, Krzeminski D, Corcoran P, Balinsky A. Cohort Selection for Clinical Trials From Longitudinal Patient Records: Text Mining Approach. JMIR Med Inform 2019;7:e15980. [PMID: 31674914 PMCID: PMC6913747 DOI: 10.2196/15980] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Revised: 09/29/2019] [Accepted: 10/02/2019] [Indexed: 12/17/2022] Open

Abstract

Background

Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not to some other factors. Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time consuming, thus negatively affecting the timeliness of the recruitment process.

Objective

Track 1 of the 2018 National Natural Language Processing Clinical Challenge focused on the task of cohort selection for clinical trials, aiming to answer the following question: Can natural language processing be applied to narrative medical records to identify patients who meet eligibility criteria for clinical trials? The task required the participating systems to analyze longitudinal patient records to determine if the corresponding patients met the given eligibility criteria. We aimed to describe a system developed to address this task.

Methods

Our system consisted of 13 classifiers, one for each eligibility criterion. All classifiers used a bag-of-words document representation model. To prevent the loss of relevant contextual information associated with such representation, a pattern-matching approach was used to extract context-sensitive features. They were embedded back into the text as lexically distinguishable tokens, which were consequently featured in the bag-of-words representation. Supervised machine learning was chosen wherever a sufficient number of both positive and negative instances was available to learn from. A rule-based approach focusing on a small set of relevant features was chosen for the remaining criteria.
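
The idea of embedding pattern-matched context back into the text before building a bag-of-words representation might look roughly like the sketch below. The two regular expressions and the `CTX_` token names are hypothetical stand-ins, since the abstract does not list the actual patterns or token format used by the authors.

```python
import re
from collections import Counter

# Hypothetical context patterns; each match gets a distinguishable token appended.
CONTEXT_PATTERNS = [
    (re.compile(r"\bno (?:history|evidence) of \w+", re.IGNORECASE), "CTX_NEGATED_FINDING"),
    (re.compile(r"\bhba1c\s*(?:of|:)?\s*\d+(?:\.\d+)?\s*%?", re.IGNORECASE), "CTX_HBA1C_VALUE"),
]


def embed_context_tokens(text: str) -> str:
    """Append a marker token after every span matched by a context pattern."""
    for pattern, token in CONTEXT_PATTERNS:
        text = pattern.sub(lambda m, t=token: f"{m.group(0)} {t}", text)
    return text


def bag_of_words(text: str) -> Counter:
    """Count tokens; ordinary words are lowercased, marker tokens are kept as-is."""
    tokens = re.findall(r"\w+", embed_context_tokens(text))
    return Counter(t if t.startswith("CTX_") else t.lower() for t in tokens)


example_bag = bag_of_words("No history of diabetes. HbA1c of 7.2%")
print(example_bag)
```

The marker tokens then appear as ordinary features alongside the words, so a downstream classifier can use the context without any change to the bag-of-words model itself.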

Results

The system was evaluated using microaveraged F measure. Overall, 4 machine learning algorithms, including support vector machine, logistic regression, naïve Bayesian classifier, and gradient tree boosting (GTB), were evaluated on the training data using 10-fold cross-validation. Overall, GTB demonstrated the most consistent performance. Its performance peaked when oversampling was used to balance the training data. The final evaluation was performed on previously unseen test data. On average, the F measure of 89.04% was comparable to 3 of the top ranked performances in the shared task (91.11%, 90.28%, and 90.21%). With an F measure of 88.14%, we significantly outperformed these systems (81.03%, 78.50%, and 70.81%) in identifying patients with advanced coronary artery disease.
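
For readers unfamiliar with microaveraging, the sketch below shows one common way to compute a microaveraged F measure: true positives, false positives and false negatives are pooled across all criteria before precision and recall are calculated. The criterion names and counts are made up, and the shared task's official scorer may differ in detail.

```python
def micro_f1(counts_by_criterion: dict) -> float:
    """Pool TP/FP/FN over all criteria, then compute a single F1 score."""
    tp = sum(c["tp"] for c in counts_by_criterion.values())
    fp = sum(c["fp"] for c in counts_by_criterion.values())
    fn = sum(c["fn"] for c in counts_by_criterion.values())
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for two eligibility criteria.
example_counts = {
    "criterion_a": {"tp": 8, "fp": 2, "fn": 0},
    "criterion_b": {"tp": 2, "fp": 0, "fn": 4},
}
example_micro_f1 = micro_f1(example_counts)
print(f"{example_micro_f1:.4f}")
```

Because the counts are pooled, criteria with many instances dominate the score, whereas a macroaverage would weight every criterion equally.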

Conclusions

The holdout evaluation provides evidence that our system was able to identify eligible patients for the given clinical trial with high accuracy. Our approach demonstrates how rule-based knowledge infusion can improve the performance of machine learning algorithms even when trained on a relatively small dataset.

Narayan VM, Dahm P. The future of clinical trials in urological oncology. Nat Rev Urol 2019;16:722-733. [PMID: 31605037 DOI: 10.1038/s41585-019-0243-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/19/2019] [Indexed: 12/11/2022]

The future of clinical trials in urological oncology. Nat Rev Urol 2019. [DOI: 10.1038/s41585-019-0243-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]

Kersloot MG, Lau F, Abu-Hanna A, Arts DL, Cornet R. Automated SNOMED CT concept and attribute relationship detection through a web-based implementation of cTAKES. J Biomed Semantics 2019;10:14. [PMID: 31533810 PMCID: PMC6749652 DOI: 10.1186/s13326-019-0207-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2019] [Accepted: 08/13/2019] [Indexed: 12/05/2022] Open

Abstract

Background

Information in Electronic Health Records is largely stored as unstructured free text. Natural language processing (NLP), or Medical Language Processing (MLP) in medicine, aims at extracting structured information from free text, and is less expensive and time-consuming than manual extraction. However, most algorithms in MLP are institution-specific or address only one clinical need, and thus cannot be broadly applied. In addition, most MLP systems do not detect concepts in misspelled text and cannot detect attribute relationships between concepts. The objective of this study was to develop and evaluate an MLP application that includes generic algorithms for the detection of (misspelled) concepts and of attribute relationships between them.

Methods

An implementation of the MLP system cTAKES, called DIRECT, was developed with generic SNOMED CT concept filter, concept relationship detection, and attribute relationship detection algorithms and a custom dictionary. Four implementations of cTAKES were evaluated by comparing 98 manually annotated oncology charts with the output of DIRECT. The F₁-score was determined for named-entity recognition and attribute relationship detection for the concepts ‘lung cancer’, ‘non-small cell lung cancer’, and ‘recurrence’. The performance of the four implementations was compared with a two-tailed permutation test.

Results

DIRECT detected lung cancer and non-small cell lung cancer concepts with F₁-scores between 0.828 and 0.947 and between 0.862 and 0.933, respectively. The concept recurrence was detected with a significantly higher F₁-score of 0.921, compared to the other implementations, and the relationship between recurrence and lung cancer with an F₁-score of 0.857. The precisions for detecting the lung cancer, non-small cell lung cancer, and recurrence concepts were 1.000, 0.966, and 0.879, respectively, compared to precisions of 0.943, 0.967, and 0.000 in the original implementation.

Conclusion

DIRECT can detect oncology concepts and attribute relationships with high precision and can detect recurrence with significant increase in F₁-score, compared to the original implementation of cTAKES, due to the usage of a custom dictionary and a generic concept relationship detection algorithm. These concepts and relationships can be used to encode clinical narratives, and can thus substantially reduce manual chart abstraction efforts, saving time for clinicians and researchers.
