1. Teoh JR, Dong J, Zuo X, Lai KW, Hasikin K, Wu X. Advancing healthcare through multimodal data fusion: a comprehensive review of techniques and applications. PeerJ Comput Sci 2024; 10:e2298. [PMID: 39650483] [PMCID: PMC11623190] [DOI: 10.7717/peerj-cs.2298]
Abstract
With the increasing availability of diverse healthcare data sources, such as medical images and electronic health records, there is a growing need to effectively integrate and fuse this multimodal data for comprehensive analysis and decision-making. However, despite its potential, multimodal data fusion in healthcare remains limited. This review paper provides an overview of existing literature on multimodal data fusion in healthcare, covering 69 relevant works published between 2018 and 2024. It focuses on methodologies that integrate different data types to enhance medical analysis, including techniques for integrating medical images with structured and unstructured data, combining multiple image modalities, and other features. Additionally, the paper reviews various approaches to multimodal data fusion, such as early, intermediate, and late fusion methods, and examines the challenges and limitations associated with these techniques. The potential benefits and applications of multimodal data fusion in various diseases are highlighted, illustrating specific strategies employed in healthcare artificial intelligence (AI) model development. This research synthesizes existing information to facilitate progress in using multimodal data for improved medical diagnosis and treatment planning.
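To make the fusion taxonomy in this abstract concrete, the following minimal PyTorch sketch (illustrative only; layer sizes, feature dimensions, and the two-class setup are assumptions, not taken from the review) contrasts early, intermediate, and late fusion of a flattened image feature vector with tabular EHR features.

```python
# Minimal sketch contrasting early, intermediate, and late fusion of an
# imaging feature vector with tabular EHR features. All sizes are illustrative.
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate the raw feature vectors before any joint learning."""
    def __init__(self, img_dim=256, ehr_dim=32, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(img_dim + ehr_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))
    def forward(self, img_feat, ehr):
        return self.net(torch.cat([img_feat, ehr], dim=1))

class IntermediateFusion(nn.Module):
    """Encode each modality separately, then fuse the learned representations."""
    def __init__(self, img_dim=256, ehr_dim=32, n_classes=2):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Linear(img_dim, 64), nn.ReLU())
        self.ehr_enc = nn.Sequential(nn.Linear(ehr_dim, 16), nn.ReLU())
        self.head = nn.Linear(64 + 16, n_classes)
    def forward(self, img_feat, ehr):
        return self.head(torch.cat([self.img_enc(img_feat), self.ehr_enc(ehr)], dim=1))

class LateFusion(nn.Module):
    """Run per-modality classifiers and average their predictions."""
    def __init__(self, img_dim=256, ehr_dim=32, n_classes=2):
        super().__init__()
        self.img_clf = nn.Linear(img_dim, n_classes)
        self.ehr_clf = nn.Linear(ehr_dim, n_classes)
    def forward(self, img_feat, ehr):
        return (self.img_clf(img_feat) + self.ehr_clf(ehr)) / 2

if __name__ == "__main__":
    img_feat, ehr = torch.randn(4, 256), torch.randn(4, 32)
    for m in (EarlyFusion(), IntermediateFusion(), LateFusion()):
        print(type(m).__name__, m(img_feat, ehr).shape)  # each -> torch.Size([4, 2])
```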
Affiliation(s)
- Jing Ru Teoh
- Department of Biomedical Engineering, University of Malaya, Kuala Lumpur, Malaysia
- Jian Dong
- China Electronics Standardization Institute, Beijing, China
- Xiaowei Zuo
- Department of Psychiatry, The Affiliated Xuzhou Oriental Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, China
- Khin Wee Lai
- Department of Biomedical Engineering, University of Malaya, Kuala Lumpur, Malaysia
- Khairunnisa Hasikin
- Department of Biomedical Engineering, University of Malaya, Kuala Lumpur, Malaysia
- Faculty of Engineering, Centre of Intelligent Systems for Emerging Technology (CISET), Kuala Lumpur, Malaysia
- Xiang Wu
- Department of Biomedical Engineering, University of Malaya, Kuala Lumpur, Malaysia
- Institute of Medical Information Security, Xuzhou, Jiangsu, China
2. Li J, Liao L, Jia M, Chen Z, Liu X. Latent relation shared learning for endometrial cancer diagnosis with incomplete multi-modality medical images. iScience 2024; 27:110509. [PMID: 39161958] [PMCID: PMC11332793] [DOI: 10.1016/j.isci.2024.110509]
Abstract
Magnetic resonance imaging (MRI), ultrasound (US), and contrast-enhanced ultrasound (CEUS) can provide different image data about the uterus, and they have been used in the preoperative assessment of endometrial cancer. In practice, not all patients have complete multi-modality medical images due to the high cost or long examination period. Most existing methods need to perform data cleansing or discard samples with missing modalities, which influences the performance of the model. In this work, we propose an incomplete multi-modality image data fusion method based on shared latent relations to overcome this limitation. The shared space contains the common latent feature representation and modality-specific latent feature representations from the complete and incomplete multi-modality data, jointly exploiting both consistent and complementary information among multiple images. The experimental results show that our method outperforms current representative approaches in terms of classification accuracy, sensitivity, specificity, and area under the curve (AUC). Furthermore, our method performs well under varying rates of missing modalities.
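As a rough illustration of the general idea (this is not the authors' implementation; the modality names, feature dimensions, and the simple averaging rule are assumptions), the sketch below projects whichever of the MRI/US/CEUS feature vectors are available into one shared latent space and classifies from the average of the modalities actually present.

```python
# Illustrative sketch of fusing MRI/US/CEUS features through a shared latent
# space while tolerating missing modalities: each available modality is
# projected into a common space and the latent vectors are averaged.
import torch
import torch.nn as nn

class SharedLatentFusion(nn.Module):
    def __init__(self, feat_dims=None, latent_dim=32, n_classes=2):
        super().__init__()
        # Assumed per-modality feature dimensions (illustrative values).
        feat_dims = feat_dims or {"mri": 128, "us": 64, "ceus": 64}
        # Modality-specific encoders mapping into one shared latent space.
        self.encoders = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, latent_dim), nn.ReLU())
             for m, d in feat_dims.items()})
        self.classifier = nn.Linear(latent_dim, n_classes)

    def forward(self, inputs):
        # `inputs` holds only the modalities available for this batch of samples.
        latents = [self.encoders[m](x) for m, x in inputs.items()]
        shared = torch.stack(latents, dim=0).mean(dim=0)  # average over present modalities
        return self.classifier(shared)

if __name__ == "__main__":
    model = SharedLatentFusion()
    complete = {"mri": torch.randn(8, 128), "us": torch.randn(8, 64), "ceus": torch.randn(8, 64)}
    incomplete = {"mri": torch.randn(8, 128)}  # US and CEUS missing
    print(model(complete).shape, model(incomplete).shape)  # both torch.Size([8, 2])
```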
Affiliation(s)
- Jiaqi Li
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing 100081, China
- Lejian Liao
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing 100081, China
- Meihuizi Jia
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing 100081, China
- Zhendong Chen
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications, Beijing 100081, China
- Xin Liu
- Department of Ultrasound, Beijing Tiantan Hospital, Capital Medical University, Beijing 100070, China
3. Li Y, El Habib Daho M, Conze PH, Zeghlache R, Le Boité H, Tadayoni R, Cochener B, Lamard M, Quellec G. A review of deep learning-based information fusion techniques for multimodal medical image classification. Comput Biol Med 2024; 177:108635. [PMID: 38796881] [DOI: 10.1016/j.compbiomed.2024.108635]
Abstract
Multimodal medical imaging plays a pivotal role in clinical diagnosis and research, as it combines information from various imaging modalities to provide a more comprehensive understanding of the underlying pathology. Recently, deep learning-based multimodal fusion techniques have emerged as powerful tools for improving medical image classification. This review offers a thorough analysis of the developments in deep learning-based multimodal fusion for medical classification tasks. We explore the complementary relationships among prevalent clinical modalities and outline three main fusion schemes for multimodal classification networks: input fusion, intermediate fusion (encompassing single-level fusion, hierarchical fusion, and attention-based fusion), and output fusion. By evaluating the performance of these fusion techniques, we provide insight into the suitability of different network architectures for various multimodal fusion scenarios and application domains. Furthermore, we delve into challenges related to network architecture selection, the handling of incomplete multimodal data, and the potential limitations of multimodal fusion. Finally, we spotlight the promising future of Transformer-based multimodal fusion techniques and give recommendations for future research in this rapidly evolving field.
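For readers new to the attention-based intermediate fusion mentioned above, here is a toy cross-attention sketch (PyTorch; the token counts, embedding size, and two-modality setup are assumed for illustration, not drawn from the review): tokens from one modality query tokens from the other before a pooled representation is classified.

```python
# Toy sketch of attention-based intermediate fusion: modality A's tokens attend
# to modality B's tokens; the residual connection keeps A's own information.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=64, n_heads=4, n_classes=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, tokens_a, tokens_b):
        fused, _ = self.attn(query=tokens_a, key=tokens_b, value=tokens_b)
        fused = self.norm(tokens_a + fused)
        return self.head(fused.mean(dim=1))   # pool over tokens, then classify

if __name__ == "__main__":
    oct_tokens = torch.randn(2, 16, 64)     # e.g. patch tokens from one imaging modality
    fundus_tokens = torch.randn(2, 25, 64)  # tokens from a second modality
    print(CrossAttentionFusion()(oct_tokens, fundus_tokens).shape)  # torch.Size([2, 3])
```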
Affiliation(s)
- Yihao Li
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France
- Mostafa El Habib Daho
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France
- Rachid Zeghlache
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France
- Hugo Le Boité
- Sorbonne University, Paris, France; Ophthalmology Department, Lariboisière Hospital, AP-HP, Paris, France
- Ramin Tadayoni
- Ophthalmology Department, Lariboisière Hospital, AP-HP, Paris, France; Paris Cité University, Paris, France
- Béatrice Cochener
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France; Ophthalmology Department, CHRU Brest, Brest, France
- Mathieu Lamard
- LaTIM UMR 1101, Inserm, Brest, France; University of Western Brittany, Brest, France
4. Cui C, Yang H, Wang Y, Zhao S, Asad Z, Coburn LA, Wilson KT, Landman BA, Huo Y. Deep multimodal fusion of image and non-image data in disease diagnosis and prognosis: a review. Prog Biomed Eng 2023; 5. [PMID: 37360402] [PMCID: PMC10288577] [DOI: 10.1088/2516-1091/acc2fe]
Abstract
The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary data that are produced during routine practice. For instance, the personalized diagnosis and treatment planning for a single cancer patient relies on various images (e.g. radiology, pathology and camera images) and non-image data (e.g. clinical data and genomic data). However, such decision-making procedures can be subjective, qualitative, and have large inter-subject variabilities. With the recent advances in multimodal deep learning technologies, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multimodal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews recent studies addressing this question. Briefly, the review includes (a) an overview of current multimodal learning workflows, (b) a summary of multimodal fusion methods, (c) a discussion of performance, (d) applications in disease diagnosis and prognosis, and (e) challenges and future directions.
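A minimal sketch of the workflow pattern this review surveys, assuming a single-channel image and ten tabular clinical variables (all sizes are hypothetical, not from any reviewed study): a small CNN branch encodes the image, an MLP branch encodes the non-image data, and the concatenated embeddings feed one prediction head.

```python
# Hedged sketch of image + non-image fusion: a CNN encodes the image while an
# MLP encodes tabular clinical variables; embeddings are concatenated and classified.
import torch
import torch.nn as nn

class ImageTabularFusion(nn.Module):
    def __init__(self, n_clinical=10, n_classes=2):
        super().__init__()
        self.cnn = nn.Sequential(                   # image branch (1-channel input)
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())  # -> (B, 16)
        self.mlp = nn.Sequential(                   # non-image branch (clinical variables)
            nn.Linear(n_clinical, 16), nn.ReLU())
        self.head = nn.Linear(16 + 16, n_classes)   # joint prediction head

    def forward(self, image, clinical):
        return self.head(torch.cat([self.cnn(image), self.mlp(clinical)], dim=1))

if __name__ == "__main__":
    model = ImageTabularFusion()
    out = model(torch.randn(4, 1, 64, 64), torch.randn(4, 10))
    print(out.shape)  # torch.Size([4, 2])
```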
Affiliation(s)
- Can Cui
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Haichun Yang
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Yaohong Wang
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Shilin Zhao
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Zuhayr Asad
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Lori A Coburn
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States of America
- Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN 37212, United States of America
- Keith T Wilson
- Department of Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, TN 37215, United States of America
- Division of Gastroenterology, Hepatology, and Nutrition, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN 37232, United States of America
- Veterans Affairs Tennessee Valley Healthcare System, Nashville, TN 37212, United States of America
- Bennett A Landman
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN 37235, United States of America
- Yuankai Huo
- Department of Computer Science, Vanderbilt University, Nashville, TN 37235, United States of America
- Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN 37235, United States of America
5. Pei X, Zuo K, Li Y, Pang Z. A Review of the Application of Multi-modal Deep Learning in Medicine: Bibliometrics and Future Directions. Int J Comput Intell Syst 2023. [DOI: 10.1007/s44196-023-00225-6]
Abstract
In recent years, deep learning has been applied in the field of clinical medicine to process large-scale medical images, for large-scale data screening, and in the diagnosis and efficacy evaluation of various major diseases. Multi-modal medical data fusion based on deep learning can effectively extract and integrate characteristic information of different modes, improve clinical applicability in diagnosis and medical evaluation, and provide quantitative analysis, real-time monitoring, and treatment planning. This study investigates the performance of existing multi-modal fusion pre-training algorithms and medical multi-modal fusion methods and compares their key characteristics, such as supported medical data, diseases, target samples, and implementation performance. Additionally, we present the main challenges and goals of the latest trends in multi-modal medical fusion. To provide a clearer perspective on new trends, we also analyzed relevant papers on the Web of Science. We obtain meaningful results on annual development trends, country-, institution-, and journal-level research output, highly cited papers, and research directions. Finally, we perform co-authorship analysis, co-citation analysis, co-occurrence analysis, and bibliographic coupling analysis using the VOSviewer software.
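As a small, self-contained illustration of the keyword co-occurrence counting that underlies tools such as VOSviewer (the keyword lists below are invented, not drawn from the paper's Web of Science corpus), two keywords co-occur whenever they appear in the same article's keyword list.

```python
# Count pairwise keyword co-occurrences across a toy set of article keyword lists.
from itertools import combinations
from collections import Counter

articles = [  # hypothetical keyword lists
    ["deep learning", "multimodal fusion", "MRI"],
    ["deep learning", "electronic health records"],
    ["multimodal fusion", "MRI", "electronic health records"],
]

cooccurrence = Counter()
for keywords in articles:
    for a, b in combinations(sorted(set(keywords)), 2):
        cooccurrence[(a, b)] += 1

for pair, count in cooccurrence.most_common(3):
    print(pair, count)  # top pair: ('MRI', 'multimodal fusion') 2
```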
6. Ascencio-Cabral A, Reyes-Aldasoro CC. Comparison of Convolutional Neural Networks and Transformers for the Classification of Images of COVID-19, Pneumonia and Healthy Individuals as Observed with Computed Tomography. J Imaging 2022; 8:237. [PMID: 36135403] [PMCID: PMC9500990] [DOI: 10.3390/jimaging8090237]
Abstract
In this work, the performance of five deep learning architectures in classifying COVID-19 in a multi-class set-up is evaluated. The classifiers were built on pretrained ResNet-50, ResNet-50r (with kernel size 5×5 in the first convolutional layer), DenseNet-121, MobileNet-v3 and the state-of-the-art CaiT-24-XXS-224 (CaiT) transformer. The cross entropy and weighted cross entropy were minimised with Adam and AdamW. In total, 20 experiments were conducted with 10 repetitions each, and the following metrics were obtained: accuracy (Acc), balanced accuracy (BA), F1 and F2 from the general Fβ macro score, Matthews Correlation Coefficient (MCC), sensitivity (Sens) and specificity (Spec), followed by bootstrapping. The performance of the classifiers was compared using the Friedman-Nemenyi test. The results show that less complex architectures such as ResNet-50, ResNet-50r and DenseNet-121 were able to achieve better generalization, with rankings of 1.53, 1.71 and 3.05 for the Matthews Correlation Coefficient, respectively, while MobileNet-v3 and CaiT obtained rankings of 3.72 and 5.0, respectively.
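The metrics named in this abstract can be reproduced with scikit-learn; the sketch below uses invented labels purely to show the calls (macro averaging matches the three-class COVID-19 / pneumonia / healthy setting, and the class coding is an assumption).

```python
# Compute the evaluation metrics named above on toy multi-class labels.
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             fbeta_score, matthews_corrcoef)

y_true = [0, 0, 1, 1, 2, 2, 2, 1]   # 0 = healthy, 1 = pneumonia, 2 = COVID-19 (assumed coding)
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

print("Acc :", accuracy_score(y_true, y_pred))
print("BA  :", balanced_accuracy_score(y_true, y_pred))
print("F1  :", fbeta_score(y_true, y_pred, beta=1, average="macro"))
print("F2  :", fbeta_score(y_true, y_pred, beta=2, average="macro"))
print("MCC :", matthews_corrcoef(y_true, y_pred))
```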