1. Warner E, Lee J, Hsu W, Syeda-Mahmood T, Kahn CE Jr, Gevaert O, Rao A. Multimodal Machine Learning in Image-Based and Clinical Biomedicine: Survey and Prospects. Int J Comput Vis 2024;132:3753-3769. [PMID: 39211895] [PMCID: PMC11349845] [DOI: 10.1007/s11263-024-02032-8]
Abstract
Machine learning (ML) applications in medical artificial intelligence (AI) systems have shifted from traditional and statistical methods to increasing application of deep learning models. This survey navigates the current landscape of multimodal ML, focusing on its profound impact on medical image analysis and clinical decision support systems. Emphasizing challenges and innovations in addressing multimodal representation, fusion, translation, alignment, and co-learning, the paper explores the transformative potential of multimodal models for clinical predictions. It also highlights the need for principled assessments and practical implementation of such models, bringing attention to the dynamics between decision support systems and healthcare providers and personnel. Despite advancements, challenges such as data biases and the scarcity of "big data" in many biomedical domains persist. We conclude with a discussion on principled innovation and collaborative efforts to further the mission of seamless integration of multimodal ML models into biomedical practice.
Affiliation(s)
- Elisa Warner
- Department of Computational Medicine and Bioinformatics, University of Michigan Ann Arbor, 100 Washtenaw Ave., Ann Arbor, MI 48109 USA
- Joonsang Lee
- Department of Computational Medicine and Bioinformatics, University of Michigan Ann Arbor, 100 Washtenaw Ave., Ann Arbor, MI 48109 USA
- William Hsu
- Department of Medical and Imaging Informatics, University of California Los Angeles, 924 Westwood Blvd Ste 420, Los Angeles, CA 90024 USA
- Charles E. Kahn Jr.
- Department of Radiology, University of Pennsylvania, 3400 Spruce St., Philadelphia, PA 19104 USA
- Olivier Gevaert
- Center for Biomedical Informatics Research, Stanford, 1265 Welch Road, Stanford, CA 94305 USA
- Arvind Rao
- Department of Computational Medicine and Bioinformatics, University of Michigan Ann Arbor, 100 Washtenaw Ave., Ann Arbor, MI 48109 USA
2. Rizvi SA, Tang R, Jiang X, Ma X, Hu X. Local Contrastive Learning for Medical Image Recognition. AMIA Annu Symp Proc 2024;2023:1236-1245. [PMID: 38222415] [PMCID: PMC10785845]
Abstract
The proliferation of deep learning (DL)-based methods for radiographic image analysis has created a great demand for expert-labeled radiology data. Recent self-supervised frameworks have alleviated the need for expert labeling by obtaining supervision from associated radiology reports. These frameworks, however, struggle to distinguish the subtle differences between pathologies in medical images. Additionally, many of them do not link image regions to the report text, making it difficult for radiologists to assess model predictions. In this work, we propose Local Region Contrastive Learning (LRCLR), a flexible fine-tuning framework that adds layers for significant image region selection as well as cross-modality interaction. Our results on an external validation set of chest X-rays suggest that LRCLR identifies significant local image regions and provides meaningful interpretation against radiology text while improving zero-shot performance on several chest X-ray medical findings.
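The two added components (region selection and cross-modality interaction) can be illustrated with a small sketch. This is not the authors' LRCLR implementation; the cosine-similarity scoring, the top-k selection rule, and all names below are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def select_salient_regions(region_feats, text_emb, k=3):
    """Score each image region by similarity to the report embedding and
    keep the top-k regions for subsequent cross-modality interaction.

    region_feats: (R, d) array of region embeddings
    text_emb:     (d,)   report embedding
    """
    # cosine similarity between every region and the text embedding
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    scores = r @ t                       # (R,) one score per region
    top = np.argsort(scores)[::-1][:k]   # indices of the k most salient regions
    weights = softmax(scores[top])       # attention weights over the selection
    return top, weights
```

The selected regions (rather than all regions) would then interact with the text tokens, which is what makes the attribution interpretable.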
Affiliation(s)
- Xiaotian Ma
- University of Texas Health Science Center, Houston, TX
- Xia Hu
- Rice University, Houston, TX
3. Liu B, Lu D, Wei D, Wu X, Wang Y, Zhang Y, Zheng Y. Improving Medical Vision-Language Contrastive Pretraining With Semantics-Aware Triage. IEEE Trans Med Imaging 2023;42:3579-3589. [PMID: 37440389] [DOI: 10.1109/tmi.2023.3294980]
Abstract
Medical contrastive vision-language pretraining has shown great promise in many downstream tasks, such as data-efficient and zero-shot recognition. Current studies pretrain the network with a contrastive loss, treating paired image-reports as positive samples and unpaired ones as negative samples. However, unlike natural datasets, many medical images or reports from different cases can be highly similar, especially for normal cases, and treating all unpaired ones as negative samples can undermine the learned semantic structure and harm the representations. We therefore design a simple yet effective approach for better contrastive learning in the medical vision-language field. Specifically, by simplifying the computation of similarity between medical image-report pairs into the calculation of inter-report similarity, the image-report tuples are divided into positive, negative, and additional neutral groups. With this better categorization of samples, a more suitable contrastive loss is constructed. For evaluation, we perform extensive experiments by applying the proposed model-agnostic strategy to two state-of-the-art pretraining frameworks. Consistent improvements on four common downstream tasks, including cross-modal retrieval, zero-shot/data-efficient image classification, and image segmentation, demonstrate the effectiveness of the proposed strategy in the medical field.
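The triage idea described here, scoring unpaired samples by inter-report similarity and excluding near-duplicates from the negatives instead of pushing them apart, can be sketched as follows. This is a schematic reading of the abstract, not the authors' loss; the single neutral threshold, the temperature, and all names are assumptions:

```python
import numpy as np

def triaged_infonce(sim, report_sim, neutral_thresh=0.8, temp=1.0):
    """Contrastive (InfoNCE-style) loss over an image-report similarity
    matrix `sim` (N x N, true pairs on the diagonal). Unpaired reports
    that are almost identical to the paired report (inter-report
    similarity >= neutral_thresh) are triaged as neutral and excluded
    from the negatives rather than treated as negatives."""
    n = sim.shape[0]
    losses = []
    for i in range(n):
        # keep the true pair plus all genuinely dissimilar (negative) pairs
        keep = (report_sim[i] < neutral_thresh) | (np.arange(n) == i)
        logits = sim[i, keep] / temp
        m = logits.max()
        log_probs = logits - (m + np.log(np.exp(logits - m).sum()))
        # position of the true pair i among the kept columns
        pos = int(np.flatnonzero(np.flatnonzero(keep) == i)[0])
        losses.append(-log_probs[pos])
    return float(np.mean(losses))
```

Masking a near-duplicate out of the denominator lowers the loss for that anchor, which is exactly the effect of no longer penalizing semantically matching "negatives".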
4. Cui C, Yang H, Wang Y, Zhao S, Asad Z, Coburn LA, Wilson KT, Landman BA, Huo Y. Deep multimodal fusion of image and non-image data in disease diagnosis and prognosis: a review. Prog Biomed Eng (Bristol) 2023;5:10.1088/2516-1091/acc2fe. [PMID: 37360402] [PMCID: PMC10288577] [DOI: 10.1088/2516-1091/acc2fe]
Abstract
The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary data that are produced during routine practice. For instance, the personalized diagnosis and treatment planning for a single cancer patient relies on various images (e.g. radiology, pathology and camera images) and non-image data (e.g. clinical data and genomic data). However, such decision-making procedures can be subjective, qualitative, and have large inter-subject variabilities. With the recent advances in multimodal deep learning technologies, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multimodal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews the recent studies on this question. Briefly, this review includes (a) an overview of current multimodal learning workflows, (b) a summary of multimodal fusion methods, (c) a discussion of performance, (d) applications in disease diagnosis and prognosis, and (e) challenges and future directions.
Affiliation(s)
- Can Cui
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Haichun Yang
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Yaohong Wang
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Shilin Zhao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Zuhayr Asad
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Lori A Coburn
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States of America
- Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN 37212, United States of America
- Keith T Wilson
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States of America
- Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN 37212, United States of America
- Bennett A Landman
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN 37235, United States of America
- Yuankai Huo
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN 37235, United States of America
5. Pachade S, Datta S, Dong Y, Salazar-Marioni S, Abdelkhaleq R, Niktabe A, Roberts K, Sheth SA, Giancardo L. Self-Supervised Learning With Radiology Reports, a Comparative Analysis of Strategies for Large Vessel Occlusion and Brain CTA Images. Proc IEEE Int Symp Biomed Imaging 2023;2023:10.1109/isbi53787.2023.10230623. [PMID: 37711217] [PMCID: PMC10498780] [DOI: 10.1109/isbi53787.2023.10230623]
Abstract
Scarcity of labels for medical images is a significant barrier to training representation learning approaches based on deep neural networks. This limitation is also present when using imaging data collected during routine clinical care and stored in picture archiving and communication systems (PACS), as these data rarely have the high-quality labels required for medical image computing tasks attached. However, medical images extracted from PACS are commonly coupled with descriptive radiology reports that contain significant information and could be leveraged to pre-train imaging models, which could then serve as starting points for task-specific fine-tuning. In this work, we perform a head-to-head comparison of three different self-supervised strategies to pre-train the same imaging model on 3D brain computed tomography angiogram (CTA) images, with large vessel occlusion (LVO) detection as the downstream task. These strategies evaluate two natural language processing (NLP) approaches, one that extracts 100 explicit radiology concepts (Rad-SpatialNet) and one that creates general-purpose radiology report embeddings (DistilBERT). In addition, we experiment with learning radiology concepts directly or through a recent self-supervised learning approach (CLIP) that learns by ranking the distance between language and image vector embeddings. The LVO detection task was selected because it requires 3D imaging data, is clinically important, and requires the algorithm to learn outputs not explicitly stated in the radiology report. Pre-training was performed on an unlabeled dataset of 1,542 3D CTA-report pairs. The downstream task was tested on a labeled dataset of 402 subjects for LVO. We find that pre-training with CLIP-based strategies improves the performance of the imaging model in detecting LVO compared to a model trained only on the labeled data. The best performance was achieved by pre-training with the explicit radiology concepts and the CLIP strategy.
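The CLIP-style strategy referenced here learns by aligning matched image and report embeddings against in-batch mismatches. A minimal sketch of that symmetric contrastive objective, assuming simple dot-product similarity (this is illustrative, not the paper's 3D CTA pipeline or its exact loss):

```python
import numpy as np

def clip_style_loss(img_emb, txt_emb, temp=0.07):
    """Symmetric contrastive (CLIP-style) loss: matched image/report pairs
    sit on the diagonal of the cosine-similarity matrix, and the loss pulls
    the diagonal up relative to each row (image->text) and each column
    (text->image)."""
    i = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    t = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = (i @ t.T) / temp                      # (N, N) similarity logits

    def xent_diag(l):
        # cross-entropy with the diagonal as the correct class per row
        m = l.max(axis=1, keepdims=True)
        log_probs = l - (m + np.log(np.exp(l - m).sum(axis=1, keepdims=True)))
        return -np.mean(np.diag(log_probs))

    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))
```

A correctly paired batch should score a lower loss than the same batch with its reports shuffled, which is the signal the pre-training exploits.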
Affiliation(s)
- S Pachade
- School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030
- S Datta
- School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030
- Y Dong
- School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030
- R Abdelkhaleq
- McGovern Medical School, UTHealth, Houston, TX 77030, USA
- A Niktabe
- McGovern Medical School, UTHealth, Houston, TX 77030, USA
- K Roberts
- School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030
- S A Sheth
- McGovern Medical School, UTHealth, Houston, TX 77030, USA
- L Giancardo
- School of Biomedical Informatics, University of Texas Health Science Center at Houston (UTHealth), Houston, TX 77030
- Institute for Stroke and Cerebrovascular Diseases, UTHealth, Houston, TX 77030, USA
6. Event-Based Clinical Finding Extraction from Radiology Reports with Pre-trained Language Model. J Digit Imaging 2023;36:91-104. [PMID: 36253581] [PMCID: PMC9576130] [DOI: 10.1007/s10278-022-00717-5]
Abstract
Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging ("lesions") and other types of clinical problems ("medical problems"). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, and count. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9-93.4% F1 for finding triggers and 72.0-85.6% F1 for argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6% for finding triggers and 79.1-89.7% for argument roles, demonstrating that the model generalized well to the cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.
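The event-based schema (a trigger plus assertion, anatomy, characteristics, size, and count arguments) can be pictured as a small record type. The field names below follow the abstract's description but are illustrative, not the corpus's exact annotation format:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FindingEvent:
    """One radiology finding as an event: a trigger span plus typed
    arguments linked to it by argument roles."""
    trigger: str                          # text span anchoring the event, e.g. "nodule"
    event_type: str                       # "lesion" or "medical_problem"
    assertion: str = "present"            # e.g. present / absent / uncertain
    anatomy: List[str] = field(default_factory=list)
    characteristics: List[str] = field(default_factory=list)
    size: Optional[str] = None
    count: Optional[str] = None

# e.g. the sentence "8 mm nodule in the right upper lobe" could yield:
event = FindingEvent(trigger="nodule", event_type="lesion",
                     anatomy=["right upper lobe"], size="8 mm")
```

The extraction pipeline described above first finds triggers and argument entities, then predicts which arguments attach to which trigger, which is what populates a record like this.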
7. Mostafa FA, Elrefaei LA, Fouda MM, Hossam A. A Survey on AI Techniques for Thoracic Diseases Diagnosis Using Medical Images. Diagnostics (Basel) 2022;12:3034. [PMID: 36553041] [PMCID: PMC9777249] [DOI: 10.3390/diagnostics12123034]
Abstract
Thoracic diseases refer to disorders that affect the lungs, heart, and other parts of the rib cage, such as pneumonia, novel coronavirus disease (COVID-19), tuberculosis, cardiomegaly, and fracture. Millions of people die every year from thoracic diseases. Therefore, early detection of these diseases is essential and can save many lives. Earlier, only highly experienced radiologists examined thoracic diseases, but recent developments in image processing and deep learning techniques are opening the door for the automated detection of these diseases. In this paper, we present a comprehensive review including: types of thoracic diseases; examination types of thoracic images; image pre-processing; models of deep learning applied to the detection of thoracic diseases (e.g., pneumonia, COVID-19, edema, fibrosis, tuberculosis, chronic obstructive pulmonary disease (COPD), and lung cancer); transfer learning background knowledge; ensemble learning; and future initiatives for improving the efficacy of deep learning models in applications that detect thoracic diseases. Through this survey paper, researchers may be able to gain an overall and systematic knowledge of deep learning applications in medical thoracic images. The review investigates a performance comparison of various models and a comparison of various datasets.
Affiliation(s)
- Fatma A. Mostafa
- Department of Electrical Engineering, Faculty of Engineering at Shoubra, Benha University, Cairo 11672, Egypt
- Lamiaa A. Elrefaei
- Department of Electrical Engineering, Faculty of Engineering at Shoubra, Benha University, Cairo 11672, Egypt
- Mostafa M. Fouda
- Department of Electrical and Computer Engineering, College of Science and Engineering, Idaho State University, Pocatello, ID 83209, USA
- Aya Hossam
- Department of Electrical Engineering, Faculty of Engineering at Shoubra, Benha University, Cairo 11672, Egypt
8. Moukheiber D, Mahindre S, Moukheiber L, Moukheiber M, Wang S, Ma C, Shih G, Peng Y, Gao M. Few-Shot Learning Geometric Ensemble for Multi-label Classification of Chest X-Rays. Data Augmentation, Labelling, and Imperfections: Second MICCAI Workshop (DALI 2022), Singapore, September 22, 2022, Proceedings 2022;13567:112-122. [PMID: 36383493] [PMCID: PMC9652771] [DOI: 10.1007/978-3-031-17027-0_12]
Abstract
This paper aims to identify uncommon cardiothoracic diseases and patterns on chest X-ray images. Training a machine learning model to classify rare diseases with multi-label indications is challenging without sufficient labeled training samples. Our model leverages the information from common diseases and adapts to perform on less common mentions. We propose to use multi-label few-shot learning (FSL) schemes including neighborhood component analysis loss, generating additional samples using distribution calibration, and fine-tuning based on a multi-label classification loss. We utilize the fact that widely adopted nearest neighbor-based FSL schemes such as ProtoNet induce Voronoi diagrams in feature space. In our method, the Voronoi diagrams generated in feature space by the multi-label schemes are combined into our geometric DeepVoro Multi-label ensemble. The improved performance in multi-label few-shot classification using the multi-label ensemble is demonstrated in our experiments (the code is publicly available at https://github.com/Saurabh7/Few-shot-learning-multilabel-cxray).
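The nearest-prototype core underlying ProtoNet-style FSL, whose decision regions form a Voronoi diagram in feature space, can be sketched as follows. This single-label sketch shows only that core; the paper's DeepVoro Multi-label ensemble combines several such diagrams, which this sketch does not reproduce:

```python
import numpy as np

def prototype_predict(support, support_labels, query, n_classes):
    """ProtoNet-style nearest-prototype prediction: each class prototype is
    the mean of its support embeddings, and a query is assigned to the
    nearest prototype, i.e. to the Voronoi cell it falls into."""
    protos = np.stack([support[support_labels == c].mean(axis=0)
                       for c in range(n_classes)])          # (C, d)
    # Euclidean distance from every query to every prototype
    d = np.linalg.norm(query[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)                                 # nearest cell per query
```

Because prediction depends only on distances to prototypes, changing how prototypes are formed (e.g. with calibrated extra samples, as above) reshapes the Voronoi cells without changing this inference step.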
Affiliation(s)
- Saurabh Mahindre
- University at Buffalo, The State University of New York, Buffalo, NY, USA
- Song Wang
- The University of Texas at Austin, Austin, TX, USA
- Chunwei Ma
- University at Buffalo, The State University of New York, Buffalo, NY, USA
- Yifan Peng
- Weill Cornell Medicine, New York, NY, USA
- Mingchen Gao
- University at Buffalo, The State University of New York, Buffalo, NY, USA
9. Ji Z, Shaikh MA, Moukheiber D, Srihari SN, Peng Y, Gao M. Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment. Machine Learning in Medical Imaging (MLMI Workshop) 2021;12966:110-119. [PMID: 35647616] [DOI: 10.1007/978-3-030-87589-3_12]
Abstract
Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports accumulated in clinical routine without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model was pre-trained at both the global image-sentence level and the local image region-word level for visual-textual matching. Both levels are bidirectionally constrained by cross-entropy-based and ranking-based triplet matching losses. The region-word matching is computed with an attention mechanism, without direct supervision of the mapping. The pre-trained multi-modal representation learning paves the way for downstream tasks concerning image and/or text encoding. We demonstrate the quality of the learned representations through cross-modality retrieval and multi-label classification on two datasets: OpenI-IU and MIMIC-CXR. Our code is available at https://github.com/mshaikh2/JoImTeR_MLMI_2021.
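The ranking-based triplet matching loss mentioned above can be sketched in its standard margin form; this is the generic loss, with the margin value and the Euclidean distance choice as assumptions rather than the paper's exact configuration:

```python
import numpy as np

def triplet_matching_loss(anchor, positive, negative, margin=0.5):
    """Ranking-based triplet loss: pull the matched (image, text) pair
    together and push a mismatched pair at least `margin` farther away.
    All inputs are (N, d) batches of embeddings."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)   # matched distance
    d_neg = np.linalg.norm(anchor - negative, axis=-1)   # mismatched distance
    # hinge: zero loss once the negative is margin farther than the positive
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))
```

Applying this in both directions (image as anchor with text positives, and text as anchor with image positives) gives the bidirectional constraint the abstract describes.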
Affiliation(s)
- Zhanghexuan Ji
- Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Mohammad Abuzar Shaikh
- Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Dana Moukheiber
- Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Sargur N Srihari
- Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Yifan Peng
- Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Mingchen Gao
- Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
10. Liao R, Moyer D, Cha M, Quigley K, Berkowitz S, Horng S, Golland P, Wells WM. Multimodal Representation Learning via Maximization of Local Mutual Information. Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2021;12902:273-283. [PMID: 36282980] [PMCID: PMC9576150] [DOI: 10.1007/978-3-030-87196-3_26]
Abstract
We propose and demonstrate a representation learning approach by maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the rich information contained in the free text that describes the findings in the image. Our method trains image and text encoders by encouraging the resulting representations to exhibit high local mutual information. We make use of recent advances in mutual information estimation with neural network discriminators. We argue that the sum of local mutual information is typically a lower bound on the global mutual information. Our experimental results in the downstream image classification tasks demonstrate the advantages of using local features for image-text representation learning. Our code is available at: https://github.com/RayRuizhiLiao/mutual_info_img_txt.
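A sum-of-local-InfoNCE objective is one common way to realize "sum of local mutual information" with a neural critic; the sketch below uses that estimator as an assumption, with a pluggable critic, and is not the authors' implementation (their code is linked above):

```python
import numpy as np

def local_infonce_bound(local_img, txt, critic):
    """Sum, over image locations, of an InfoNCE-style lower bound on the
    mutual information between that local image feature and the text
    embedding, estimated across a batch of N paired studies.

    local_img: (N, L, d) local image features (L locations per study)
    txt:       (N, d)    text embeddings; txt[i] is paired with local_img[i]
    critic:    callable (a, b) -> scalar compatibility score
    """
    N, L, _ = local_img.shape
    total = 0.0
    for l in range(L):
        # score every (local feature, text) combination across the batch
        scores = np.array([[critic(local_img[i, l], txt[j])
                            for j in range(N)] for i in range(N)])
        m = scores.max(axis=1, keepdims=True)
        log_probs = scores - (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True)))
        # InfoNCE bound for location l: E[diag log-softmax] + log N
        total += np.mean(np.diag(log_probs)) + np.log(N)
    return total
```

Higher values mean the critic can tell paired (image patch, report) combinations apart from mismatched ones, which is the sense in which training the encoders to maximize this bound yields useful local image representations.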
Affiliation(s)
- Ruizhi Liao
- CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
- Daniel Moyer
- CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
- Miriam Cha
- MIT Lincoln Laboratory, Lexington, MA, USA
- Seth Berkowitz
- Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Steven Horng
- Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Polina Golland
- CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
- William M Wells
- CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA
- Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA