1
Aletaha A, Nemati-Anaraki L, Keshtkar A, Sedghi S, Keramatfar A, Korolyova A. A Scoping Review of Adopted Information Extraction Methods for RCTs. Med J Islam Repub Iran 2023; 37:95. [PMID: 38021383 PMCID: PMC10657257 DOI: 10.47176/mjiri.37.95] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Indexed: 12/01/2023]
Abstract
Background: Randomized controlled trials (RCTs) provide the strongest evidence for therapeutic interventions and their effects on groups of subjects. However, the large amount of unstructured information in these trials makes it challenging and time-consuming to make decisions and identify important concepts and valid evidence. This study aims to explore methods for automating or semi-automating information extraction from reports of RCT studies. Methods: We conducted a systematic search of PubMed, ACM Digital Library, and Web of Science to identify relevant articles published between January 1, 2010, and 2022. We focused on published Natural Language Processing (NLP), machine learning, and deep learning methods that automate or semi-automate key elements of information extraction in the context of RCTs. Results: A total of 26 publications were included, which discussed the automatic extraction of key characteristics of RCTs using various PICO frameworks (PIBOSO and PECODR). Among these publications, 14 (53.8%) extracted key characteristics based on PICO, PIBOSO, and PECODR, while 12 (46.1%) discussed information extraction methods in RCT studies. Common approaches mentioned included word/phrase matching, machine learning algorithms such as binary classification using the Naïve Bayes algorithm and powerful BERT network for feature extraction, support vector machine for data classification, conditional random field, non-machine-dependent automation, and machine learning or deep learning approaches. Conclusion: The lack of publicly available software and limited access to existing software makes it difficult to determine the most powerful information extraction system. However, deep learning models like Transformers and BERT language models have shown better performance in natural language processing.
Affiliation(s)
- Azadeh Aletaha
  - Department of Medical Library and Information Science, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
  - Evidence-Based Medicine Research Center, Endocrinology and Metabolism Clinical Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran
- Leila Nemati-Anaraki
  - Department of Medical Library and Information Science, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
  - Health Management and Economics Research Center, Health Management Research Institute, Iran University of Medical Sciences, Tehran, Iran
- AbbasAli Keshtkar
  - Department of Health Science Educational Development, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Shahram Sedghi
  - Department of Medical Library and Information Science, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
  - Economics Research Center, Iran University of Medical Sciences, PO Box 14665-354, Tehran, Iran
- Anna Korolyova
  - Computer Science Laboratory for Mechanics and Engineering Sciences (LIMSI), CNRS, Université Paris-Saclay, F-91405 Orsay, France
  - School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW)
  - Fraser House, White Cross Business Park, Lancaster, LA1 4XQ
2
Jung HA, Lim J, Choi YL, Lee SH, Joung JG, Jeon YJ, Choi JW, Shin S, Cho JH, Kim HK, Choi YS, Zo JI, Shim YM, Park S, Sun JM, Ahn JS, Ahn MJ, Han J, Park WY, Kim J, Park K. Clinical, Pathologic, and Molecular Prognostic Factors in Patients with Early-Stage EGFR-Mutant NSCLC. Clin Cancer Res 2022; 28:4312-4321. [PMID: 35838647 DOI: 10.1158/1078-0432.ccr-22-0879] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/23/2022] [Revised: 05/17/2022] [Accepted: 07/13/2022] [Indexed: 12/14/2022]
Abstract
PURPOSE: In early-stage, EGFR mutation-positive (EGFR-M+) non-small cell lung cancer (NSCLC), surgery remains the primary treatment, without personalized adjuvant treatments. We aimed to identify risk factors for recurrence-free survival (RFS) to suggest personalized adjuvant strategies in resected early-stage EGFR-M+ NSCLC. EXPERIMENTAL DESIGN: From January 2008 to August 2020, a total of 2,340 patients with pathologic stage (pStage) IB-IIIA, non-squamous NSCLC underwent curative surgery. To identify clinicopathologic risk factors, 1,181 patients with pStage IB-IIIA, common EGFR-M+ NSCLC who underwent surgical resection were analyzed. To identify molecular risk factors, comprehensive genomic analysis was conducted in 56 patients with matched case-controls (pStage II and IIIA and type of EGFR mutation). RESULTS: Median follow-up duration was 38.8 months (0.5-156.2). Among 1,181 patients, pStage IB, II, and IIIA comprised 577 (48.9%), 331 (28.0%), and 273 (23.1%) subjects, respectively. Median RFS was 73.5 months [95% confidence interval (CI), 62.1-84.9], 48.7 months (95% CI, 41.2-56.3), and 22.7 months (95% CI, 19.4-26.0) for pStage IB, II, and IIIA, respectively (P < 0.001). In multivariate analysis of clinicopathologic risk factors, pStage, micropapillary subtype, vascular invasion, pleural invasion, and pathologic classification by cell of origin (type II pneumocyte-like tumor cell vs. bronchial surface epithelial cell-like tumor cell) were associated with RFS. As molecular risk factors, the non-terminal respiratory unit (non-TRU) of the RNA subtype (HR, 3.49; 95% CI, 1.72-7.09; P < 0.01) and TP53 mutation (HR, 2.50; 95% CI, 1.24-5.04; P = 0.01) were associated with poor RFS independent of pStage II or IIIA. Among the patients with recurrence, progression-free survival on EGFR-tyrosine kinase inhibitor (TKI) therapy in those with the Apolipoprotein B mRNA Editing Catalytic Polypeptide-like (APOBEC) mutation signature was inferior to that of patients without this signature (8.6 vs. 28.8 months; HR, 4.16; 95% CI, 1.28-13.46; P = 0.02). CONCLUSIONS: The low-risk group with TRU subtype and TP53 wild-type without clinicopathologic risk factors might not need adjuvant EGFR-TKIs. In the high-risk group, with non-TRU subtype and/or TP53 mutation, or clinicopathologic risk factors, a novel adjuvant strategy combining EGFR-TKIs with other agents (e.g., chemotherapy or antiangiogenic agents) needs to be investigated. Given the poor outcomes with EGFR-TKIs after recurrence in patients with the APOBEC mutation signature, an alternative adjuvant strategy might be needed.
Affiliation(s)
- Hyun Ae Jung
  - Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
- Jinyeong Lim
  - Department of Health Sciences and Technology, Samsung Advanced Institute for Health Science and Technology, Sungkyunkwan University, Seoul, Republic of Korea
  - Samsung Genome Institute, Samsung Medical Center, Sungkyunkwan University, Seoul, Republic of Korea
- Yoon-La Choi
  - Department of Pathology and Translational Genomics, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
- Se-Hoon Lee
  - Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
- Je-Gun Joung
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam, Republic of Korea
- Yeong Jeong Jeon
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jae Won Choi
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Sumin Shin
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jong Ho Cho
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Hong Kwan Kim
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Yong Soo Choi
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Jae Ill Zo
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Young Mog Shim
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Sehhoon Park
  - Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
- Jong-Mu Sun
  - Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
- Jin Seok Ahn
  - Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
- Myung-Ju Ahn
  - Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
- Joungho Han
  - Department of Pathology and Translational Genomics, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
- Woong-Yang Park
  - Department of Health Sciences and Technology, Samsung Advanced Institute for Health Science and Technology, Sungkyunkwan University, Seoul, Republic of Korea
  - Samsung Genome Institute, Samsung Medical Center, Sungkyunkwan University, Seoul, Republic of Korea
- Jhingook Kim
  - Department of Thoracic and Cardiovascular Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
- Keunchil Park
  - Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, Republic of Korea
4
Chen X, Xie H, Cheng G, Poon LKM, Leng M, Wang FL. Trends and Features of the Applications of Natural Language Processing Techniques for Clinical Trials Text Analysis. Applied Sciences 2020; 10:2157. [DOI: 10.3390/app10062157] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/12/2022]
Abstract
Natural language processing (NLP) is an effective tool for generating structured information from unstructured data, which is commonly found in clinical trial texts. Such interdisciplinary research has gradually grown into a flourishing research field with accumulated scientific outputs available. In this study, bibliographical data collected from the Web of Science, PubMed, and Scopus databases from 2001 to 2018 were investigated using three prominent methods: performance analysis, science mapping, and, in particular, an automatic text analysis approach named structural topic modeling. Topical trend visualization and test analysis were further employed to quantify the effects of the year of publication on topic proportions. Topic distributions across prolific countries/regions and institutions were also visualized and compared. In addition, scientific collaborations between countries/regions, institutions, and authors were explored using social network analysis. The findings are essential for facilitating the development of NLP-enhanced clinical trial text processing, boosting scientific and technological NLP-enhanced clinical trial research, and facilitating inter-country/region and inter-institution collaborations.