1. Li Y, Chen Q, Li H, Wang S, Chen N, Han T, Wang K, Yu Q, Cao Z, Tang J. MFNet: Meta-learning based on frequency-space mix for MRI segmentation in nasopharyngeal carcinoma. J Cell Mol Med 2024; 28:e18355. [PMID: 38685683; PMCID: PMC11058331; DOI: 10.1111/jcmm.18355]
Abstract
Deep learning techniques have been applied to medical image segmentation and demonstrated expert-level performance. Due to the poor generalization ability of models deployed in different centres, common solutions, such as transfer learning and domain adaptation techniques, have been proposed to mitigate this issue. However, these solutions necessitate retraining the models with target-domain data and annotations, which limits their deployment in clinical settings in unseen domains. We evaluated the performance of domain generalization methods on the task of MRI segmentation of nasopharyngeal carcinoma (NPC) by collecting a new dataset of 321 patients with manually annotated MRIs from two hospitals. We transformed the MRI modalities, including T1WI, T2WI and CE-T1WI, from the spatial domain to the frequency domain using the Fourier transform. To address the bottleneck of domain generalization in MRI segmentation of NPC, we propose a meta-learning approach based on frequency-domain feature mixing. We evaluated the performance of MFNet against existing techniques for generalizing NPC segmentation in terms of Dice and MIoU. Our method clearly outperforms the baseline in handling the generalization of NPC segmentation, and MFNet demonstrates its effectiveness for generalizing NPC MRI segmentation to unseen domains (Dice = 67.59%, MIoU = 75.74% on T1WI). MFNet enhances the model's generalization capabilities by incorporating mixed-feature meta-learning. Our approach offers a novel perspective for tackling the domain generalization problem in medical imaging by effectively exploiting the unique characteristics of medical images.
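The frequency-space mixing step can be illustrated with a minimal sketch: interpolate the Fourier amplitude spectra of two scans while keeping the phase of the first. This is an assumed amplitude-interpolation variant of the idea; MFNet's actual mixing and meta-learning loop are more involved.

```python
import numpy as np

def amplitude_mix(img_a, img_b, lam=0.5):
    """Mix the Fourier amplitude spectra of two images while keeping
    the phase of the first; a common frequency-space augmentation."""
    fa = np.fft.fft2(img_a)
    fb = np.fft.fft2(img_b)
    amp = (1 - lam) * np.abs(fa) + lam * np.abs(fb)  # interpolate amplitudes
    phase = np.angle(fa)                             # keep content (phase) of img_a
    mixed = amp * np.exp(1j * phase)
    return np.real(np.fft.ifft2(mixed))

# lam=0 recovers the original image
x = np.random.rand(8, 8)
y = np.random.rand(8, 8)
assert np.allclose(amplitude_mix(x, y, lam=0.0), x)
```

Because phase carries most of an image's structural content, the mixed sample keeps the anatomy of `img_a` under the low-level appearance statistics of `img_b`.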
Affiliation(s)
- Yin Li
- Department of Otorhinolaryngology, The First People's Hospital of Foshan, Foshan, China
- Qi Chen
- Department of Radiology, The Second Affiliated Hospital of Anhui Medical University, Hefei, China
- Hao Li
- Department of Infectious Diseases, The First People's Hospital of Changde City, Xiangya School of Medicine, Central South University, Changde, China
- Song Wang
- University of Electronic Science and Technology of China, Chengdu, China
- Nutan Chen
- Machine Learning Research Lab, Volkswagen Group, Munich, Germany
- Ting Han
- Department of Radiology, The First People's Hospital of Foshan, Foshan, China
- Kai Wang
- Department of Otorhinolaryngology, The First People's Hospital of Foshan, Foshan, China
- Qingqing Yu
- Department of Otorhinolaryngology, The First People's Hospital of Foshan, Foshan, China
- Zhantao Cao
- Department of Research, CETC Cyberspace Security Technology Co., Ltd., Chengdu, China
- Jun Tang
- Department of Otorhinolaryngology, The First People's Hospital of Foshan, Foshan, China
2. Loewinger G, Nunez RA, Mazumder R, Parmigiani G. Optimal ensemble construction for multistudy prediction with applications to mortality estimation. Stat Med 2024; 43:1774-1789. [PMID: 38396313; DOI: 10.1002/sim.10006]
Abstract
It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets before model fitting can produce poor out-of-study prediction performance when datasets are heterogeneous. Theoretical and applied work has shown multistudy ensembling to be a viable alternative that leverages the variability across datasets in a manner that promotes model generalizability. Multistudy ensembling uses a two-stage stacking strategy which fits study-specific models and estimates ensemble weights separately. This approach ignores, however, the ensemble properties at the model-fitting stage, potentially resulting in performance losses. Motivated by challenges in the estimation of COVID-attributable mortality, we propose optimal ensemble construction, an approach to multistudy stacking whereby we jointly estimate ensemble weights and parameters associated with study-specific models. We prove that limiting cases of our approach yield existing methods such as multistudy stacking and pooling datasets before model fitting. We propose an efficient block coordinate descent algorithm to optimize the loss function. We use our method to perform multicountry COVID-19 baseline mortality prediction. We show that when little data is available for a country before the onset of the pandemic, leveraging data from other countries can substantially improve prediction accuracy. We further compare and characterize the method's performance in data-driven simulations and other numerical experiments. Our method remains competitive with or outperforms multistudy stacking and other earlier methods in the COVID-19 data application and in a range of simulation settings.
Affiliation(s)
- Gabriel Loewinger
- Machine Learning Team, National Institute on Mental Health, Bethesda, Maryland, USA
- Rolando Acosta Nunez
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Regeneron Pharmaceuticals Inc., Tarrytown, New York, USA
- Rahul Mazumder
- Operations Research Center and MIT Center for Statistics, MIT Sloan School of Management, Cambridge, Massachusetts, USA
- Giovanni Parmigiani
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, USA
- Department of Data Science, Dana Farber Cancer Institute, Boston, Massachusetts, USA
3. Papadakis A, Spyrou E. A Multi-Modal Egocentric Activity Recognition Approach towards Video Domain Generalization. Sensors (Basel) 2024; 24:2491. [PMID: 38676108; PMCID: PMC11054491; DOI: 10.3390/s24082491]
Abstract
Egocentric activity recognition is a prominent computer vision task that is based on the use of wearable cameras. Since egocentric videos are captured through the perspective of the person wearing the camera, her/his body motions severely complicate the video content, imposing several challenges. In this work we propose a novel approach for domain-generalized egocentric human activity recognition. Typical approaches use a large amount of training data, aiming to cover all possible variants of each action. Moreover, several recent approaches have attempted to handle discrepancies between domains with a variety of costly and mostly unsupervised domain adaptation methods. In our approach we show that through simple manipulation of available source domain data and with minor involvement from the target domain, we are able to produce robust models, able to adequately predict human activity in egocentric video sequences. To this end, we introduce a novel three-stream deep neural network architecture combining elements of vision transformers and residual neural networks which are trained using multi-modal data. We evaluate the proposed approach using a challenging, egocentric video dataset and demonstrate its superiority over recent, state-of-the-art research works.
Affiliation(s)
- Antonios Papadakis
- Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, 15772 Athens, Greece
- Evaggelos Spyrou
- Department of Informatics and Telecommunications, University of Thessaly, 35100 Lamia, Greece
4. Liu X, Vafay Eslahi S, Marin T, Tiss A, Chemli Y, Huang Y, Johnson KA, El Fakhri G, Ouyang J. Cross noise level PET denoising with continuous adversarial domain generalization. Phys Med Biol 2024; 69:085001. [PMID: 38484401; DOI: 10.1088/1361-6560/ad341a]
Abstract
Objective. Performing positron emission tomography (PET) denoising within the image space proves effective in reducing the variance in PET images. In recent years, deep learning has demonstrated superior denoising performance, but models trained on a specific noise level typically fail to generalize well to different noise levels, due to inherent distribution shifts between inputs. The distribution shift usually results in bias in the denoised images. Our goal is to tackle this problem using a domain generalization technique. Approach. We propose to utilize a domain generalization technique with a novel feature-space continuous discriminator (CD) for adversarial training, using the fraction of events as a continuous domain label. The core idea is to enforce the extraction of noise-level-invariant features, thus minimizing the distribution divergence of latent feature representations across continuous noise levels and making the model general for arbitrary noise levels. We created three sets of 10%, 13%-22% (uniformly randomly selected), or 25% fractions of events from 97 18F-MK6240 tau PET studies of 60 subjects. For each set, we generated 20 noise realizations. Training, validation, and testing were implemented using 1400, 120, and 420 pairs of 3D image volumes from the same or different sets. We used 3D UNet as the baseline and applied the CD to the continuous-noise-level training data of the 13%-22% set. Main results. The proposed CD improves the denoising performance of our model trained on the 13%-22% fraction set when tested on both the 10% and 25% fraction sets, measured by bias and standard deviation using full-count images as references. In addition, our CD method consistently improves SSIM and PSNR for Alzheimer-related regions and the whole brain. Significance. To our knowledge, this is the first attempt to alleviate the performance degradation in cross-noise-level denoising from the perspective of domain generalization. Our study is also pioneering work on continuous domain generalization, utilizing continuously changing source domains.
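The continuous-discriminator idea can be caricatured in a few lines: a discriminator regresses the continuous fraction-of-events label from latent features, and the denoiser is trained adversarially to increase that regression error so features become noise-level invariant. The linear discriminator and the shapes/names below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def continuous_adversarial_losses(feats, noise_frac, W):
    """One adversarial step of the continuous-domain idea:
    - the discriminator minimizes d_loss (predict noise level from features);
    - the feature extractor maximizes it (via a gradient-reversal layer),
      pushing latent features toward noise-level invariance."""
    pred = feats @ W                                  # discriminator prediction
    d_loss = float(np.mean((pred - noise_frac) ** 2))  # discriminator objective
    g_loss = -d_loss                                   # encoder's adversarial objective
    return d_loss, g_loss
```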
Affiliation(s)
- Xiaofeng Liu
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Department of Radiology, Harvard Medical School, Boston, MA 02115, United States of America
- Department of Radiology and Biomedical Imaging, Yale University, New Haven, CT 06520, United States of America
- Samira Vafay Eslahi
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Department of Radiology, Harvard Medical School, Boston, MA 02115, United States of America
- Thibault Marin
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Department of Radiology, Harvard Medical School, Boston, MA 02115, United States of America
- Department of Radiology and Biomedical Imaging, Yale University, New Haven, CT 06520, United States of America
- Amal Tiss
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Department of Radiology, Harvard Medical School, Boston, MA 02115, United States of America
- Yanis Chemli
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Department of Radiology, Harvard Medical School, Boston, MA 02115, United States of America
- Department of Radiology and Biomedical Imaging, Yale University, New Haven, CT 06520, United States of America
- Yongsong Huang
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Keith A Johnson
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Department of Radiology, Harvard Medical School, Boston, MA 02115, United States of America
- Georges El Fakhri
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Department of Radiology, Harvard Medical School, Boston, MA 02115, United States of America
- Department of Radiology and Biomedical Imaging, Yale University, New Haven, CT 06520, United States of America
- Jinsong Ouyang
- Gordon Center for Medical Imaging, Department of Radiology, Massachusetts General Hospital, Boston, MA 02114, United States of America
- Department of Radiology, Harvard Medical School, Boston, MA 02115, United States of America
- Department of Radiology and Biomedical Imaging, Yale University, New Haven, CT 06520, United States of America
5. Sahay R, Thomas G, Jahan CS, Manjrekar M, Popp D, Savakis A. On the Importance of Attention and Augmentations for Hypothesis Transfer in Domain Adaptation and Generalization. Sensors (Basel) 2023; 23:8409. [PMID: 37896503; PMCID: PMC10611075; DOI: 10.3390/s23208409]
Abstract
Unsupervised domain adaptation (UDA) aims to mitigate the performance drop due to the distribution shift between the training and testing datasets. UDA methods have achieved performance gains for models trained on a source domain with labeled data to a target domain with only unlabeled data. The standard feature extraction method in domain adaptation has been convolutional neural networks (CNNs). Recently, attention-based transformer models have emerged as effective alternatives for computer vision tasks. In this paper, we benchmark three attention-based architectures, specifically vision transformer (ViT), shifted window transformer (SWIN), and dual attention vision transformer (DAViT), against convolutional architectures ResNet, HRNet and attention-based ConvNext, to assess the performance of different backbones for domain generalization and adaptation. We incorporate these backbone architectures as feature extractors in the source hypothesis transfer (SHOT) framework for UDA. SHOT leverages the knowledge learned in the source domain to align the image features of unlabeled target data in the absence of source domain data, using self-supervised deep feature clustering and self-training. We analyze the generalization and adaptation performance of these models on standard UDA datasets and aerial UDA datasets. In addition, we modernize the training procedure commonly seen in UDA tasks by adding image augmentation techniques to help models generate richer features. Our results show that ConvNext and SWIN offer the best performance, indicating that the attention mechanism is very beneficial for domain generalization and adaptation with both transformer and convolutional architectures. Our ablation study shows that our modernized training recipe, within the SHOT framework, significantly boosts performance on aerial datasets.
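SHOT's self-supervised clustering step, which each benchmarked backbone plugs into, can be sketched as centroid-based pseudo-labeling: build class centroids from soft predictions on the unlabeled target data, then relabel each sample by its nearest centroid. This is a simplified, single-pass numpy version of that idea.

```python
import numpy as np

def shot_pseudo_labels(feats, probs, n_iter=2):
    """Centroid-based pseudo-labeling in the spirit of SHOT: build
    probability-weighted class centroids in feature space, then relabel
    each target sample by its nearest centroid (cosine similarity)."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    labels = probs.argmax(1)
    for _ in range(n_iter):
        cent = (probs.T @ f) / (probs.sum(0)[:, None] + 1e-8)  # weighted centroids
        cent /= np.linalg.norm(cent, axis=1, keepdims=True) + 1e-8
        labels = (f @ cent.T).argmax(1)          # nearest centroid by cosine
        probs = np.eye(probs.shape[1])[labels]   # harden labels for next round
    return labels
```

In SHOT the resulting pseudo-labels drive self-training of the target feature extractor while the source classifier head stays frozen.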
Affiliation(s)
- Andreas Savakis
- Rochester Institute of Technology, Rochester, NY 14623, USA
6. Gordon SM, McDaniel JR, King KW, Lawhern VJ, Touryan J. Decoding neural activity to assess individual latent state in ecologically valid contexts. J Neural Eng 2023; 20:046033. [PMID: 37552980; DOI: 10.1088/1741-2552/acee20]
Abstract
Objective. Currently, there exist very few ways to isolate cognitive processes, historically defined via highly controlled laboratory studies, in more ecologically valid contexts. Specifically, it remains unclear to what extent patterns of neural activity observed under such constraints actually manifest outside the laboratory in a manner that can be used to make accurate inferences about latent states, associated cognitive processes, or proximal behavior. Improving our understanding of when and how specific patterns of neural activity manifest in ecologically valid scenarios would provide validation for laboratory-based approaches that study similar neural phenomena in isolation, and meaningful insight into the latent states that occur during complex tasks. Approach. Domain generalization methods, borrowed from the work of the brain-computer interface community, have the potential to capture high-dimensional patterns of neural activity in a way that can be reliably applied across experimental datasets in order to address this specific challenge. We previously used such an approach to decode phasic neural responses associated with visual target discrimination. Here, we extend that work to more tonic phenomena such as internal latent states. We use data from two highly controlled laboratory paradigms to train two separate domain-generalized models. We apply the trained models to an ecologically valid paradigm in which participants performed multiple, concurrent driving-related tasks while perched atop a six-degrees-of-freedom ride-motion simulator. Main results. Using the pretrained models, we estimate latent state and the associated patterns of neural activity. As the patterns of neural activity become more similar to those observed in the training data, we find changes in behavior and task performance that are consistent with the observations from the original, laboratory-based paradigms. Significance. These results lend ecological validity to the original, highly controlled experimental designs and provide a methodology for understanding the relationship between neural activity and behavior during complex tasks.
Affiliation(s)
- Kevin W King
- DCS Corporation, Alexandria, VA, United States of America
- Vernon J Lawhern
- DEVCOM Army Research Laboratory, Aberdeen Proving Ground, MD, United States of America
- Jonathan Touryan
- DEVCOM Army Research Laboratory, Aberdeen Proving Ground, MD, United States of America
7. Wozniak P, Ozog D. Cross-Domain Indoor Visual Place Recognition for Mobile Robot via Generalization Using Style Augmentation. Sensors (Basel) 2023; 23:6134. [PMID: 37447982; PMCID: PMC10346347; DOI: 10.3390/s23136134]
Abstract
The article presents an algorithm for multi-domain visual recognition of an indoor place, based on a convolutional neural network and style randomization. The authors propose a scene classification mechanism and improve the performance of models trained on synthetic and real data from various domains. In the proposed dataset, a domain change was defined as a change of camera model. A dataset of images collected from several rooms was used to cover different scenarios, human actions, equipment changes, and lighting conditions. The proposed method was tested on a scene classification problem using multi-domain data. The basis was a transfer learning approach with a style extension applied to various combinations of source and target data, with a focus on improving the unknown-domain score and multi-domain support. The results of the experiments were analyzed in the context of data collected on a humanoid robot. The article shows that the average score was highest when multi-domain data and style-based data enhancement were used, with the proposed method reaching an average of 92.08%. The authors also corrected a result previously reported by another research team.
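A minimal stand-in for the style randomization described above is to resample per-channel image statistics while keeping spatial content, in the spirit of AdaIN-style augmentation. The style ranges below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def style_randomize(img, rng=None):
    """Per-channel style randomization: replace each channel's mean/std
    with randomly drawn 'style' statistics, keeping spatial content."""
    rng = rng if rng is not None else np.random.default_rng()
    out = np.empty_like(img, dtype=float)
    for c in range(img.shape[2]):
        ch = img[:, :, c].astype(float)
        mu, sigma = ch.mean(), ch.std() + 1e-8
        new_mu = rng.uniform(0.3, 0.7)      # assumed style mean range
        new_sigma = rng.uniform(0.1, 0.3)   # assumed style std range
        out[:, :, c] = (ch - mu) / sigma * new_sigma + new_mu
    return out
```

Training on many such randomized variants of each image encourages the classifier to rely on layout rather than camera-specific color and contrast statistics, which is the point of the camera-model domain shift studied here.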
Affiliation(s)
- Piotr Wozniak
- Department of Computer and Control Engineering, Faculty of Electrical and Computer Engineering, Rzeszow University of Technology, Al. Powstańców Warszawy 12, 35-959 Rzeszow, Poland
8. Lin N, Zhao W, Liang S, Zhong M. Real-Time Segmentation of Unstructured Environments by Combining Domain Generalization and Attention Mechanisms. Sensors (Basel) 2023; 23:6008. [PMID: 37447855; DOI: 10.3390/s23136008]
Abstract
This paper presents a focused investigation into real-time segmentation in unstructured environments, a crucial aspect for enabling autonomous navigation in off-road robots. To address this challenge, an improved variant of the DDRNet23-slim model is proposed, which includes a lightweight network architecture and reclassifies ten different categories, including drivable roads, trees, high vegetation, obstacles, and buildings, based on the RUGD dataset. The model's design includes the integration of the semantic-aware normalization and semantic-aware whitening (SAN-SAW) module into the main network to improve generalization ability beyond the visible domain. The model's segmentation accuracy is improved through the fusion of channel attention and spatial attention mechanisms in the low-resolution branch to enhance its ability to capture fine details in complex scenes. Additionally, to tackle the issue of category imbalance in unstructured scene datasets, a rare class sampling strategy (RCS) is employed to mitigate the negative impact of low segmentation accuracy for rare classes on the overall performance of the model. Experimental results demonstrate that the improved model achieves a significant 14% increase in mIoU in the unseen domain, indicating its strong generalization ability. With a parameter count of only 5.79M, the model achieves an mAcc of 85.21% and an mIoU of 77.75%. The model has been successfully deployed on a Jetson Xavier NX ROS robot and tested in both real and simulated orchard environments. Speed optimization using TensorRT increased the segmentation speed to 30.17 FPS. The proposed model strikes a desirable balance between inference speed and accuracy and has good domain migration ability, making it applicable in various domains such as forestry rescue and intelligent agricultural orchard harvesting.
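The rare class sampling idea can be sketched as frequency-dependent sampling weights: crops containing rare classes are drawn more often. The `exp((1-f)/T)` form below follows the RCS weighting popularized by DAFormer; the paper's exact formulation may differ.

```python
import numpy as np

def rcs_weights(class_pixel_counts, temperature=0.01):
    """Rare class sampling weights: sampling probability decays with
    class pixel frequency, so rare classes are seen more often."""
    counts = np.asarray(class_pixel_counts, dtype=float)
    freq = counts / counts.sum()
    w = np.exp((1.0 - freq) / temperature)  # rarer class -> larger weight
    return w / w.sum()                      # normalize to a distribution
```

A lower temperature sharpens the distribution toward the rarest classes; `temperature -> inf` recovers uniform sampling.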
Affiliation(s)
- Nuanchen Lin
- College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
- Wenfeng Zhao
- College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
- Shenghao Liang
- College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
- Minyue Zhong
- College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
9. Xiao L, Xu J, Zhao D, Shang E, Zhu Q, Dai B. Adversarial and Random Transformations for Robust Domain Adaptation and Generalization. Sensors (Basel) 2023; 23:5273. [PMID: 37300000; DOI: 10.3390/s23115273]
Abstract
Data augmentation has been widely used to improve generalization in training deep neural networks. Recent works show that using worst-case transformations or adversarial augmentation strategies can significantly improve accuracy and robustness. However, due to the non-differentiable properties of image transformations, searching algorithms such as reinforcement learning or evolution strategy have to be applied, which are not computationally practical for large-scale problems. In this work, we show that by simply applying consistency training with random data augmentation, state-of-the-art results on domain adaptation (DA) and generalization (DG) can be obtained. To further improve the accuracy and robustness with adversarial examples, we propose a differentiable adversarial data augmentation method based on spatial transformer networks (STNs). The combined adversarial and random-transformation-based method outperforms the state-of-the-art on multiple DA and DG benchmark datasets. Furthermore, the proposed method shows desirable robustness to corruption, which is also validated on commonly used datasets.
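The consistency-training objective with random augmentation can be written as a KL divergence between the model's predictions on a clean input and an augmented view. A minimal sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(logits_clean, logits_aug):
    """KL(p_clean || p_aug): penalizes prediction changes under random
    augmentation, the consistency-training objective discussed above."""
    p = softmax(logits_clean)
    q = softmax(logits_aug)
    kl = (p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=-1)
    return float(kl.mean())
```

Minimizing this term alongside the supervised loss drives the network toward augmentation-invariant predictions; the paper's adversarial STN variant then searches for the augmentation that maximizes it.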
Affiliation(s)
- Liang Xiao
- Unmanned Systems Technology Research Center, Defense Innovation Institute, Beijing 100071, China
- Jiaolong Xu
- Unmanned Systems Technology Research Center, Defense Innovation Institute, Beijing 100071, China
- Dawei Zhao
- Unmanned Systems Technology Research Center, Defense Innovation Institute, Beijing 100071, China
- Erke Shang
- Unmanned Systems Technology Research Center, Defense Innovation Institute, Beijing 100071, China
- Qi Zhu
- Unmanned Systems Technology Research Center, Defense Innovation Institute, Beijing 100071, China
- Bin Dai
- Unmanned Systems Technology Research Center, Defense Innovation Institute, Beijing 100071, China
10. Zhang S, Nie W. Multi-Domain Feature Alignment for Face Anti-Spoofing. Sensors (Basel) 2023; 23:4077. [PMID: 37112418; PMCID: PMC10144369; DOI: 10.3390/s23084077]
Abstract
Face anti-spoofing is critical for enhancing the robustness of face recognition systems against presentation attacks. Existing methods predominantly rely on binary classification tasks. Recently, methods based on domain generalization have yielded promising results. However, due to distribution discrepancies between various domains, the differences in the feature space related to the domain considerably hinder the generalization of features from unfamiliar domains. In this work, we propose a multi-domain feature alignment framework (MADG) that addresses poor generalization when multiple source domains are distributed in the scattered feature space. Specifically, an adversarial learning process is designed to narrow the differences between domains, achieving the effect of aligning the features of multiple sources, thus resulting in multi-domain alignment. Moreover, to further improve the effectiveness of our proposed framework, we incorporate multi-directional triplet loss to achieve a higher degree of separation in the feature space between fake and real faces. To evaluate the performance of our method, we conducted extensive experiments on several public datasets. The results demonstrate that our proposed approach outperforms current state-of-the-art methods, thereby validating its effectiveness in face anti-spoofing.
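The triplet-loss component can be sketched in its standard single-direction form; the multi-directional variant described above adds further anchor/positive/negative pairings to separate real and fake faces more strongly.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet loss on feature batches: pull same-class
    (e.g. real-face) features toward the anchor and push opposite-class
    (e.g. spoof) features at least `margin` farther away."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distance
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())
```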
11. Luo X, Meratnia N. A Codeword-Independent Localization Technique for Reconfigurable Intelligent Surface Enhanced Environments Using Adversarial Learning. Sensors (Basel) 2023; 23:984. [PMID: 36679782; PMCID: PMC9865069; DOI: 10.3390/s23020984]
Abstract
Reconfigurable Intelligent Surfaces (RISs) not only enable software-defined radio in modern wireless communication networks but also have the potential to be utilized for localization. Most previous works used channel matrices to calculate locations, requiring extensive field measurements, which leads to rapidly growing complexity. Although a few studies have designed fingerprint-based systems, they are only feasible under an unrealistic assumption that the RIS will be deployed only for localization purposes. Additionally, all these methods utilize RIS codewords for location inference, inducing considerable communication burdens. In this paper, we propose a new localization technique for RIS-enhanced environments that does not require RIS codewords for online location inference. Our proposed approach extracts codeword-independent representations of fingerprints using a domain adversarial neural network. We evaluated our solution using the DeepMIMO dataset. Due to the lack of results from other studies, for fair comparisons, we define oracle and baseline cases, which are the theoretical upper and lower bounds of our system, respectively. In all experiments, our proposed solution performed much more similarly to the oracle cases than the baseline cases, demonstrating the effectiveness and robustness of our method.
12. Bento N, Rebelo J, Barandas M, Carreiro AV, Campagner A, Cabitza F, Gamboa H. Comparing Handcrafted Features and Deep Neural Representations for Domain Generalization in Human Activity Recognition. Sensors (Basel) 2022; 22:7324. [PMID: 36236427; PMCID: PMC9572241; DOI: 10.3390/s22197324]
Abstract
Human Activity Recognition (HAR) has been studied extensively, yet current approaches are not capable of generalizing across different domains (i.e., subjects, devices, or datasets) with acceptable performance. This lack of generalization hinders the applicability of these models in real-world environments. As deep neural networks are becoming increasingly popular in recent work, there is a need for an explicit comparison between handcrafted and deep representations in Out-of-Distribution (OOD) settings. This paper compares both approaches in multiple domains using homogenized public datasets. First, we compare several metrics to validate three different OOD settings. In our main experiments, we then verify that even though deep learning initially outperforms models with handcrafted features, the situation is reversed as the distance from the training distribution increases. These findings support the hypothesis that handcrafted features may generalize better across specific domains.
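The handcrafted side of such a comparison typically uses simple time-domain statistics plus spectral descriptors per sensor window. The feature set below is illustrative, not the paper's exact one.

```python
import numpy as np

def handcrafted_features(window):
    """Typical handcrafted features for a 1-D accelerometer window:
    mean, std, mean absolute difference, RMS, and dominant frequency bin."""
    fft_mag = np.abs(np.fft.rfft(window - window.mean()))
    return np.array([
        window.mean(),
        window.std(),
        np.abs(np.diff(window)).mean(),   # mean absolute difference
        np.sqrt(np.mean(window ** 2)),    # root mean square
        float(fft_mag.argmax()),          # dominant frequency bin
    ])
```

Because these descriptors are fixed rather than learned, they cannot latch onto dataset-specific artifacts, which is one candidate explanation for the better far-from-distribution behavior reported above.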
Affiliation(s)
- Nuno Bento
  - Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, 4200-135 Porto, Portugal
- Joana Rebelo
  - Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, 4200-135 Porto, Portugal
- Marília Barandas
  - Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, 4200-135 Porto, Portugal
  - Laboratório de Instrumentação, Engenharia Biomédica e Física da Radiação (LIBPhys–UNL), Departamento de Física, Faculdade de Ciências e Tecnologia (FCT), Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
- André V. Carreiro
  - Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, 4200-135 Porto, Portugal
- Andrea Campagner
  - Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, 20126 Milan, Italy
- Federico Cabitza
  - Dipartimento di Informatica, Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, 20126 Milan, Italy
  - IRCCS Istituto Ortopedico Galeazzi, 20161 Milan, Italy
- Hugo Gamboa
  - Associação Fraunhofer Portugal Research, Rua Alfredo Allen 455/461, 4200-135 Porto, Portugal
  - Laboratório de Instrumentação, Engenharia Biomédica e Física da Radiação (LIBPhys–UNL), Departamento de Física, Faculdade de Ciências e Tecnologia (FCT), Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
13
Zakia U, Menon C. Force Myography-Based Human Robot Interactions via Deep Domain Adaptation and Generalization. Sensors (Basel) 2021; 22:211. [PMID: 35009752] [PMCID: PMC8749939] [DOI: 10.3390/s22010211]
Abstract
Estimating applied force with the force myography (FMG) technique can be effective in human-robot interaction (HRI) using data-driven models. A model predicts well when it is trained and evaluated within the same session, which is sometimes time-consuming and impractical. In real scenarios, a pretrained transfer-learning model that predicts forces quickly once fine-tuned to the target distribution would be a favorable choice and hence needs to be examined. Therefore, in this study a unified supervised FMG-based deep transfer learner (SFMG-DTL) model using a CNN architecture was pretrained with multi-session FMG source data (Ds, Ts) and evaluated on force estimation in separate target domains (Dt, Tt) via supervised domain adaptation (SDA) and supervised domain generalization (SDG). For SDA, case (i) intra-subject evaluation (Ds ≠ Dt-SDA, Ts ≈ Tt-SDA) was examined, while for SDG, case (ii) cross-subject evaluation (Ds ≠ Dt-SDG, Ts ≠ Tt-SDG) was examined. Fine-tuning with a small amount of "target training data" effectively calibrated the model towards the target domain. The proposed SFMG-DTL model performed better, with higher estimation accuracies and lower errors (R2 ≥ 88%, NRMSE ≤ 0.6), in both cases. These results suggest that interactive force estimation via transfer learning can improve daily HRI experiences where "target training data" are limited or faster adaptation is required.
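The pretrain-then-fine-tune recipe evaluated here can be illustrated in miniature with a one-feature regressor: train on plentiful source-session data, then calibrate with only a few target-session samples. The toy force mappings (y = 2x for the source session, y = 2x + 1 for the drifted target session) are ours, purely for illustration:

```python
def sgd_fit(w, b, xs, ys, lr=0.05, epochs=200):
    """Plain SGD for a 1-D linear force estimator y ≈ w*x + b."""
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# "Pretrain" on abundant source-session data (true mapping y = 2x).
w, b = sgd_fit(0.0, 0.0, [0.0, 0.5, 1.0, 1.5, 2.0],
               [0.0, 1.0, 2.0, 3.0, 4.0])

# Fine-tune on a handful of drifted target-session samples (y = 2x + 1);
# starting from pretrained weights, few updates suffice.
w, b = sgd_fit(w, b, [0.0, 1.0, 2.0], [1.0, 3.0, 5.0], epochs=50)
```

The same logic scales up to the CNN setting in the paper: freeze or reuse pretrained weights, then run a short fine-tuning pass on the limited target data.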
Affiliation(s)
- Umme Zakia
  - Menrva Research Group, Schools of Mechatronic Systems Engineering and Engineering Science, Simon Fraser University, Metro Vancouver, BC V5A 1S6, Canada
- Carlo Menon
  - Menrva Research Group, Schools of Mechatronic Systems Engineering and Engineering Science, Simon Fraser University, Metro Vancouver, BC V5A 1S6, Canada
  - Biomedical and Mobile Health Technology Laboratory, ETH Zurich, Lengghalde 5, 8008 Zurich, Switzerland
  - Correspondence: ; Tel.: +1-778-782-9338; Fax: +1-778-782-7514
14
Lee K, Dobbins NJ, McInnes B, Yetisgen M, Uzuner Ö. Transferability of neural network clinical deidentification systems. J Am Med Inform Assoc 2021; 28:2661-2669. [PMID: 34586386] [DOI: 10.1093/jamia/ocab207]
Abstract
OBJECTIVE Neural network deidentification studies have focused on individual datasets. These studies assume the availability of a sufficient amount of human-annotated data to train models that can generalize to corresponding test data. In real-world situations, however, researchers often have limited or no in-house training data. Existing systems and external data can help jump-start deidentification on in-house data; however, the most efficient way of utilizing existing systems and external data is unclear. This article investigates the transferability of a state-of-the-art neural clinical deidentification system, NeuroNER, across a variety of datasets, when it is modified architecturally for domain generalization and when it is trained strategically for domain transfer. MATERIALS AND METHODS We conducted a comparative study of the transferability of NeuroNER using 4 clinical note corpora with multiple note types from 2 institutions. We modified NeuroNER architecturally to integrate 2 types of domain generalization approaches. We evaluated each architecture using 3 training strategies. We measured transferability from external sources; transferability across note types; the contribution of external source data when in-domain training data are available; and transferability across institutions. RESULTS AND CONCLUSIONS Transferability from a single external source gave inconsistent results. Using additional external sources consistently yielded an F1-score of approximately 80%. Fine-tuning emerged as a dominant transfer strategy, with or without domain generalization. We also found that external sources were useful even in cases where in-domain training data were available. Transferability across institutions differed by note type and annotation label but resulted in improved performance.
Affiliation(s)
- Kahyun Lee
  - Department of Information Science and Technology, George Mason University, Fairfax, Virginia, USA
- Nicholas J Dobbins
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Bridget McInnes
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Meliha Yetisgen
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Özlem Uzuner
  - Department of Information Science and Technology, George Mason University, Fairfax, Virginia, USA
15
Bian W, Chen Y, Ye X, Zhang Q. An Optimization-Based Meta-Learning Model for MRI Reconstruction with Diverse Dataset. J Imaging 2021; 7:231. [PMID: 34821862] [PMCID: PMC8621471] [DOI: 10.3390/jimaging7110231]
Abstract
This work aims at developing a generalizable Magnetic Resonance Imaging (MRI) reconstruction method in the meta-learning framework. Specifically, we develop a deep reconstruction network induced by a learnable optimization algorithm (LOA) to solve the nonconvex nonsmooth variational model of MRI image reconstruction. In this model, the nonconvex nonsmooth regularization term is parameterized as a structured deep network where the network parameters can be learned from data. We partition these network parameters into two parts: a task-invariant part for the common feature encoder component of the regularization, and a task-specific part to account for the variations in the heterogeneous training and testing data. We train the regularization parameters in a bilevel optimization framework which significantly improves the robustness of the training process and the generalization ability of the network. We conduct a series of numerical experiments using heterogeneous MRI data sets with various undersampling patterns, ratios, and acquisition settings. The experimental results show that our network yields greatly improved reconstruction quality over existing methods and can generalize well to new reconstruction problems whose undersampling patterns/trajectories are not present during training.
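The task-invariant/task-specific split trained in a bilevel framework can be summarized generically as follows (notation ours: \(\theta\) denotes the shared, task-invariant regularization parameters and \(w_\tau\) the task-specific ones; this is the standard bilevel form, not necessarily the paper's exact objective):

```latex
\min_{\theta}\; \sum_{\tau} \mathcal{L}^{\tau}_{\mathrm{val}}\!\left(\theta,\, w^{*}_{\tau}(\theta)\right)
\qquad \text{s.t.} \qquad
w^{*}_{\tau}(\theta) \in \operatorname*{arg\,min}_{w}\; \mathcal{L}^{\tau}_{\mathrm{train}}(\theta,\, w)
```

The lower level fits each task's specific parameters on its own training split, while the upper level updates the shared parameters against held-out validation losses, which is what drives generalization to unseen undersampling patterns and acquisition settings.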
Affiliation(s)
- Wanyu Bian
  - Department of Mathematics, University of Florida, Gainesville, FL 32611, USA
- Yunmei Chen
  - Department of Mathematics, University of Florida, Gainesville, FL 32611, USA
- Xiaojing Ye
  - Department of Mathematics and Statistics, Georgia State University, Atlanta, GA 30303, USA
- Qingchao Zhang
  - Department of Mathematics, University of Florida, Gainesville, FL 32611, USA
16
Gideon J, McInnis MG, Provost EM. Improving Cross-Corpus Speech Emotion Recognition with Adversarial Discriminative Domain Generalization (ADDoG). IEEE Trans Affect Comput 2021; 12:1055-1068. [PMID: 35695825] [PMCID: PMC9173710] [DOI: 10.1109/taffc.2019.2916092]
Abstract
Automatic speech emotion recognition provides computers with critical context to enable user understanding. While methods trained and tested within the same dataset have been shown successful, they often fail when applied to unseen datasets. To address this, recent work has focused on adversarial methods to find more generalized representations of emotional speech. However, many of these methods have issues converging, and only involve datasets collected in laboratory conditions. In this paper, we introduce Adversarial Discriminative Domain Generalization (ADDoG), which follows an easier to train "meet in the middle" approach. The model iteratively moves representations learned for each dataset closer to one another, improving cross-dataset generalization. We also introduce Multiclass ADDoG, or MADDoG, which is able to extend the proposed method to more than two datasets, simultaneously. Our results show consistent convergence for the introduced methods, with significantly improved results when not using labels from the target dataset. We also show how, in most cases, ADDoG and MADDoG can be used to improve upon baseline state-of-the-art methods when target dataset labels are added and in-the-wild data are considered. Even though our experiments focus on cross-corpus speech emotion, these methods could be used to remove unwanted factors of variation in other settings.
17
Hagad JL, Kimura T, Fukui KI, Numao M. Learning Subject-Generalized Topographical EEG Embeddings Using Deep Variational Autoencoders and Domain-Adversarial Regularization. Sensors (Basel) 2021; 21:1792. [PMID: 33806712] [PMCID: PMC7961341] [DOI: 10.3390/s21051792]
Abstract
Two of the biggest challenges in building models for detecting emotions from electroencephalography (EEG) devices are the relatively small amount of labeled samples and the strong variability of signal feature distributions between different subjects. In this study, we propose a context-generalized model that tackles the data constraints and subject variability simultaneously using a deep neural network architecture optimized for normally distributed subject-independent feature embeddings. Variational autoencoders (VAEs) at the input level allow the lower feature layers of the model to be trained on both labeled and unlabeled samples, maximizing the use of the limited data resources. Meanwhile, variational regularization encourages the model to learn Gaussian-distributed feature embeddings, resulting in robustness to small dataset imbalances. Subject-adversarial regularization applied to the bi-lateral features further enforces subject-independence on the final feature embedding used for emotion classification. The results from subject-independent performance experiments on the SEED and DEAP EEG-emotion datasets show that our model generalizes better across subjects than other state-of-the-art feature embeddings when paired with deep learning classifiers. Furthermore, qualitative analysis of the embedding space reveals that our proposed subject-invariant bi-lateral variational domain adversarial neural network (BiVDANN) architecture may improve the subject-independent performance by discovering normally distributed features.
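The variational regularization referred to above is, in the Gaussian case, a closed-form KL penalty pulling each embedding's posterior toward N(0, 1); a small self-contained sketch (dimensions and values illustrative):

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ), summed over dimensions.
    This is the closed-form regularizer used in Gaussian VAEs to keep
    feature embeddings normally distributed."""
    return sum(0.5 * (math.exp(lv) + m * m - 1.0 - lv)
               for m, lv in zip(mu, log_var))

# A posterior already matching N(0, 1) incurs no penalty,
# while drifting means are penalized quadratically.
zero_penalty = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
drift_penalty = kl_to_standard_normal([1.0, -1.0], [0.0, 0.0])
```

Adding this term to the reconstruction loss is what encourages the Gaussian-distributed embeddings that the paper credits for robustness to small dataset imbalances.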
Affiliation(s)
- Juan Lorenzo Hagad
  - Graduate School of Information Science and Technology, Osaka University, Suita, Osaka 565-0871, Japan
  - Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka 567-0047, Japan
- Tsukasa Kimura
  - Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka 567-0047, Japan
- Ken-ichi Fukui
  - Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka 567-0047, Japan
- Masayuki Numao
  - Institute of Scientific and Industrial Research, Osaka University, Ibaraki, Osaka 567-0047, Japan
18
Ma J, Wang Y, An X, Ge C, Yu Z, Chen J, Zhu Q, Dong G, He J, He Z, Cao T, Zhu Y, Nie Z, Yang X. Toward data-efficient learning: A benchmark for COVID-19 CT lung and infection segmentation. Med Phys 2021; 48:1197-1210. [PMID: 33354790] [DOI: 10.1002/mp.14676]
Abstract
PURPOSE Accurate segmentation of lung and infection in COVID-19 computed tomography (CT) scans plays an important role in the quantitative management of patients. Most of the existing studies are based on large and private annotated datasets that are impractical to obtain from a single institution, especially when radiologists are busy fighting the coronavirus disease. Furthermore, it is hard to compare current COVID-19 CT segmentation methods as they are developed on different datasets, trained in different settings, and evaluated with different metrics. METHODS To promote the development of data-efficient deep learning methods, in this paper, we built three benchmarks for lung and infection segmentation based on 70 annotated COVID-19 cases, which cover current active research areas, for example, few-shot learning, domain generalization, and knowledge transfer. For a fair comparison among different segmentation methods, we also provide standard training, validation, and testing splits, evaluation metrics, and the corresponding code. RESULTS Based on the state-of-the-art network, we provide more than 40 pretrained baseline models, which not only serve as out-of-the-box segmentation tools but also save computational time for researchers who are interested in COVID-19 lung and infection segmentation. We achieve average dice similarity coefficient (DSC) scores of 97.3%, 97.7%, and 67.3% and average normalized surface dice (NSD) scores of 90.6%, 91.4%, and 70.0% for left lung, right lung, and infection, respectively. CONCLUSIONS To the best of our knowledge, this work presents the first data-efficient learning benchmark for medical image segmentation and the largest number of pretrained models to date. All these resources are publicly available, and our work lays the foundation for promoting the development of deep learning methods for efficient COVID-19 CT segmentation with limited data.
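The reported DSC values follow the standard overlap definition; a minimal sketch over binary masks represented as sets of foreground voxel indices (this set representation is ours, chosen for brevity):

```python
def dice(pred, gt):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not pred and not gt:
        return 1.0  # common convention: two empty masks agree perfectly
    return 2 * len(pred & gt) / (len(pred) + len(gt))

# 3 voxels shared between a 4-voxel prediction and a 4-voxel ground truth
score = dice({1, 2, 3, 4}, {2, 3, 4, 5})  # 0.75
```

The normalized surface dice (NSD) also quoted above is a boundary-based variant that scores agreement of mask surfaces within a tolerance, rather than volumetric overlap.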
Affiliation(s)
- Jun Ma
  - Department of Mathematics, Nanjing University of Science and Technology, Nanjing, 210094, P. R. China
- Yixin Wang
  - Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, 100190, P. R. China
- Xingle An
  - China Electronics Cloud Brain (Tianjin) Technology CO., Ltd, Tianjin, 300309, P. R. China
- Cheng Ge
  - Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou, 213001, P. R. China
- Ziqi Yu
  - Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, 200433, P. R. China
- Jianan Chen
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Qiongjie Zhu
  - Department of Radiology, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, 210008, P. R. China
- Guoqiang Dong
  - Department of Radiology, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, 210008, P. R. China
- Jian He
  - Department of Radiology, Nanjing Drum Tower Hospital, the Affiliated Hospital of Nanjing University Medical School, Nanjing, 210008, P. R. China
- Tianjia Cao
  - China Electronics Cloud Brain (Tianjin) Technology CO., Ltd, Tianjin, 300309, P. R. China
- Yuntao Zhu
  - Department of Mathematics, Nanjing University, Nanjing, 210093, P. R. China
- Ziwei Nie
  - Department of Mathematics, Nanjing University, Nanjing, 210093, P. R. China
- Xiaoping Yang
  - Department of Mathematics, Nanjing University, Nanjing, 210093, P. R. China
19
Zhang L, Wang X, Yang D, Sanford T, Harmon S, Turkbey B, Wood BJ, Roth H, Myronenko A, Xu D, Xu Z. Generalizing Deep Learning for Medical Image Segmentation to Unseen Domains via Deep Stacked Transformation. IEEE Trans Med Imaging 2020; 39:2531-2540. [PMID: 32070947] [PMCID: PMC7393676] [DOI: 10.1109/tmi.2020.2973595]
Abstract
Recent advances in deep learning for medical image segmentation demonstrate expert-level accuracy. However, application of these models in clinically realistic environments can result in poor generalization and decreased accuracy, mainly due to the domain shift across different hospitals, scanner vendors, imaging protocols, and patient populations. Common transfer learning and domain adaptation techniques are proposed to address this bottleneck. However, these solutions require data (and annotations) from the target domain to retrain the model, and are therefore restrictive in practice for widespread model deployment. Ideally, we wish to have a trained (locked) model that can work uniformly well across unseen domains without further training. In this paper, we propose a deep stacked transformation approach for domain generalization. Specifically, a series of n stacked transformations are applied to each image during network training. The underlying assumption is that the "expected" domain shift for a specific medical imaging modality could be simulated by applying extensive data augmentation on a single source domain, and consequently, a deep model trained on the augmented "big" data (BigAug) could generalize well on unseen domains. We exploit four surprisingly effective, but previously understudied, image-based characteristics for data augmentation to overcome the domain generalization problem. We train and evaluate the BigAug model (with n=9 transformations) on three different 3D segmentation tasks (prostate gland, left atrial, left ventricle) covering two medical imaging modalities (MRI and ultrasound) involving eight publicly available challenge datasets. The results show that when training on a relatively small dataset (n = 10~32 volumes, depending on the size of the available datasets) from a single source domain: (i) BigAug models degrade an average of 11% (Dice score change) from source to unseen domain, substantially better than conventional augmentation (degrading 39%) and a CycleGAN-based domain adaptation method (degrading 25%); (ii) BigAug is better than "shallower" stacked transforms (i.e. those with fewer transforms) on unseen domains and demonstrates modest improvement over conventional augmentation on the source domain; (iii) after training with BigAug on one source domain, performance on an unseen domain is similar to training a model from scratch on that domain when using the same number of training samples. When training on large datasets (n = 465 volumes) with BigAug, (iv) application to unseen domains reaches the performance of state-of-the-art fully supervised models that are trained and tested on their source domains. These findings establish a strong benchmark for the study of domain generalization in medical imaging and can be generalized to the design of highly robust deep segmentation models for clinical deployment.
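The stacked-transformation idea can be sketched as a pipeline that applies a sequence of randomized image-level perturbations, each with some probability. The three transforms and all parameter ranges below are illustrative stand-ins; BigAug itself stacks nine transforms spanning image quality, appearance, and spatial configuration:

```python
import random

def adjust_brightness(img, rng):
    shift = rng.uniform(-0.1, 0.1)
    return [min(1.0, max(0.0, v + shift)) for v in img]

def adjust_contrast(img, rng):
    gamma = rng.uniform(0.8, 1.2)          # gamma-style contrast change
    return [v ** gamma for v in img]

def add_noise(img, rng):
    return [min(1.0, max(0.0, v + rng.gauss(0.0, 0.02))) for v in img]

def big_aug(img, transforms, rng, p=0.5):
    """Apply each transform in the stack with probability p, simulating
    an 'expected' domain shift from a single source domain."""
    for t in transforms:
        if rng.random() < p:
            img = t(img, rng)
    return img

rng = random.Random(0)
stack = [adjust_brightness, adjust_contrast, add_noise]
augmented = big_aug([0.2, 0.5, 0.8], stack, rng)  # one tiny "image" in [0, 1]
```

During training, each sampled image passes through the whole stack anew, so the network sees a much wider distribution than the single source domain provides.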