1. Colleoni E, Sanchez Matilla R, Luengo I, Stoyanov D. Guided image generation for improved surgical image segmentation. Med Image Anal 2024;97:103263. PMID: 39013205. DOI: 10.1016/j.media.2024.103263.
Abstract
The lack of large datasets and high-quality annotated data often limits the development of accurate and robust machine-learning models within the medical and surgical domains. In the machine-learning community, generative models have recently demonstrated that it is possible to produce novel and diverse synthetic images that closely resemble reality while controlling their content with various types of annotations. However, generative models have not yet been fully explored in the surgical domain, partly because of the lack of large datasets and partly because of domain-specific challenges such as large anatomical diversity. We propose Surgery-GAN, a novel generative model that produces synthetic images from segmentation maps. Our architecture produces surgical images of higher quality than earlier generative models thanks to a combination of channel- and pixel-level normalization layers that boost image quality while enforcing adherence to the input segmentation map. Whereas state-of-the-art generative models often generate overfitted images that lack diversity or contain unrealistic artefacts such as cartooning, experiments demonstrate that Surgery-GAN generates novel, realistic, and diverse surgical images on three different surgical datasets: cholecystectomy, partial nephrectomy, and radical prostatectomy. In addition, we investigate whether synthetic images, used together with real ones, can improve the performance of other machine-learning models. Specifically, we use Surgery-GAN to generate large synthetic datasets, which we then use to train five different segmentation models. Results demonstrate that using our synthetic images always improves mean segmentation performance compared with using real images only. For example, on radical prostatectomy, mean segmentation performance improves by up to 5.43%. More interestingly, the improvement is larger for classes that are under-represented in the training sets, where the boost for specific classes reaches up to 61.6%.
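To make the data-augmentation experiment concrete, the sketch below (PyTorch) pairs each GAN-generated image with the segmentation map that conditioned it and mixes those pairs with real ones to train a segmentation model. The dataset class, generator interface, and model names are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch: augment a real (image, mask) dataset with GAN-generated
# pairs and train a segmentation model on the mixture. The generator interface
# and all names are assumptions for illustration.
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class PairDataset(Dataset):
    """Yields (image, mask) tensors from a list of (image, mask) pairs."""

    def __init__(self, pairs):
        self.pairs = pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        return self.pairs[idx]


def synthesize_pairs(generator, real_masks, n_per_mask=4):
    """Sample several synthetic images per real segmentation map.

    Each synthetic image inherits the mask that conditioned it, so the pair
    is annotated 'for free'."""
    pairs = []
    with torch.no_grad():
        for mask in real_masks:
            for _ in range(n_per_mask):
                z = torch.randn(1, 256)                  # style/noise code
                image = generator(mask.unsqueeze(0), z)  # mask-conditioned sample
                pairs.append((image.squeeze(0), mask))
    return PairDataset(pairs)


# real_pairs, real_masks, generator, and seg_model are placeholders:
# mixed = ConcatDataset([PairDataset(real_pairs), synthesize_pairs(generator, real_masks)])
# loader = DataLoader(mixed, batch_size=8, shuffle=True)
# for image, mask in loader:
#     loss = torch.nn.functional.cross_entropy(seg_model(image), mask.long())
```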
Affiliation(s)
- Emanuele Colleoni: Medtronic Digital Surgery, 230 City Rd, EC1V 2QY, London, United Kingdom; Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), 43-45 Foley St, W1W 7TY, London, United Kingdom
- Ricardo Sanchez Matilla: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), 43-45 Foley St, W1W 7TY, London, United Kingdom
- Imanol Luengo: Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), 43-45 Foley St, W1W 7TY, London, United Kingdom
- Danail Stoyanov: Medtronic Digital Surgery, 230 City Rd, EC1V 2QY, London, United Kingdom; Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), University College London (UCL), 43-45 Foley St, W1W 7TY, London, United Kingdom

2. Liu J, Xu S, He P, Wu S, Luo X, Deng Y, Huang H. VSG-GAN: A high-fidelity image synthesis method with semantic manipulation in retinal fundus image. Biophys J 2024;123:2815-2829. PMID: 38414236; PMCID: PMC11393672. DOI: 10.1016/j.bpj.2024.02.019.
Abstract
In recent years, advancements in retinal image analysis, driven by machine learning and deep learning techniques, have enhanced disease detection and diagnosis through automated feature extraction. However, challenges persist, including limited dataset diversity due to privacy concerns and imbalanced sample pairs, which hinder effective model training. To address these issues, we introduce the vessel and style guided generative adversarial network (VSG-GAN), an innovative algorithm building upon the foundational concept of GANs. In VSG-GAN, a generator and discriminator engage in an adversarial process to produce realistic retinal images. Our approach decouples retinal image generation into distinct modules: the vascular skeleton and the background style. Leveraging style transformation and GAN inversion, our proposed hierarchical variational autoencoder module generates retinal images with diverse morphological traits. In addition, the spatially adaptive denormalization module ensures consistency between input and generated images. We evaluate our model on the MESSIDOR and RITE datasets using various metrics, including the structural similarity index measure, inception score, Fréchet inception distance, and kernel inception distance. Our results demonstrate the superiority of VSG-GAN, which outperforms existing methods across all evaluation metrics, underscoring its effectiveness in addressing dataset limitations and imbalances. Our algorithm provides a novel solution to challenges in retinal image analysis by offering diverse and realistic retinal image generation. Applying VSG-GAN-based augmentation to a downstream diabetic retinopathy classification task improves diagnostic accuracy, further advancing the utility of machine learning in this domain.
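The spatially adaptive denormalization idea mentioned in the abstract can be illustrated with a short PyTorch layer: features are normalized, then re-scaled and re-shifted with maps predicted from the vessel mask, so the mask's spatial structure is injected at every generator layer. This is a generic SPADE-style sketch, not the exact VSG-GAN module; the channel sizes are arbitrary choices.

```python
# Illustrative SPADE-style layer: normalize features, then modulate them with
# gamma/beta maps predicted from the vessel mask.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatiallyAdaptiveNorm(nn.Module):
    def __init__(self, feat_channels, mask_channels=1, hidden=64):
        super().__init__()
        self.param_free_norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(mask_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, feat, mask):
        # Resize the vessel mask to this layer's feature resolution.
        mask = F.interpolate(mask, size=feat.shape[-2:], mode="nearest")
        h = self.shared(mask)
        return self.param_free_norm(feat) * (1 + self.gamma(h)) + self.beta(h)


# Example: modulate a 128-channel feature map with a binary vessel mask.
feat = torch.randn(2, 128, 32, 32)
vessel_mask = (torch.rand(2, 1, 256, 256) > 0.9).float()
out = SpatiallyAdaptiveNorm(128)(feat, vessel_mask)   # shape: (2, 128, 32, 32)
```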
Affiliation(s)
- Junjie Liu: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Zhuhai, China; BNU-HKBU United International College, Zhuhai, China; Faculty of Science, Hong Kong Baptist University, Hong Kong SAR, China; Trinity College Dublin, Dublin 2, Ireland
- Shixin Xu: Data Science Research Center, Duke Kunshan University, Kunshan, Jiangsu, China
- Ping He: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Zhuhai, China; BNU-HKBU United International College, Zhuhai, China
- Sirong Wu: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Zhuhai, China; BNU-HKBU United International College, Zhuhai, China; Faculty of Science, Hong Kong Baptist University, Hong Kong SAR, China
- Xi Luo: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Zhuhai, China; BNU-HKBU United International College, Zhuhai, China; Faculty of Science, Hong Kong Baptist University, Hong Kong SAR, China
- Yuhui Deng: Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Zhuhai, China; BNU-HKBU United International College, Zhuhai, China
- Huaxiong Huang: Research Center for Mathematics, Beijing Normal University, Zhuhai, China; Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, Zhuhai, China; Department of Mathematics and Statistics, York University, Toronto, ON, Canada

3. Cai J, Duan Q, Long M, Zhang LB, Ding X. Feature Interaction-Based Face De-Morphing Factor Prediction for Restoring Accomplice's Facial Image. Sensors (Basel) 2024;24:5504. PMID: 39275415; PMCID: PMC11398201. DOI: 10.3390/s24175504.
Abstract
Face morphing attacks disrupt the essential correlation between a face image and its identity information, posing a significant challenge to face recognition systems. Despite advancements in face morphing attack detection, these techniques cannot reconstruct the face images of accomplices. Existing deep learning-based face de-morphing techniques have mainly focused on identity disentanglement, overlooking the morphing factors inherent in the morphed images. This paper introduces a novel face de-morphing method that restores the identity information of accomplices by predicting the corresponding de-morphing factor. To obtain reasonable de-morphing factors, a channel-wise attention mechanism performs feature interaction, and the correlation between the morphed image and a live-captured reference image is integrated to guide the prediction of the de-morphing factor. The identity information of the accomplice is then restored by mapping the morphed and reference images into the StyleGAN latent space and performing inverse linear interpolation using the predicted de-morphing factor. Experimental results demonstrate the superiority of this method in restoring accomplice facial images, achieving improved restoration accuracy and image quality compared with existing techniques.
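If the morph is modeled as a convex combination of two latent codes in StyleGAN space, the inverse linear interpolation step reduces to a one-line formula. The sketch below assumes the convention w_morph = alpha * w_accomplice + (1 - alpha) * w_subject, with alpha being the predicted de-morphing factor; the convention, the inversion encoder, and the factor-prediction network are placeholders for illustration.

```python
# Sketch of de-morphing by inverse linear interpolation in a StyleGAN latent
# space, under the assumed morphing convention stated above.
import torch


def demorph_latent(w_morph: torch.Tensor, w_reference: torch.Tensor,
                   alpha: float) -> torch.Tensor:
    """Recover the accomplice latent from the morph and the reference latent."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("de-morphing factor must lie in (0, 1]")
    return (w_morph - (1.0 - alpha) * w_reference) / alpha


# Placeholder usage (inversion encoder, predictor, and generator are assumed):
# w_morph = gan_inversion(morphed_image)
# w_ref = gan_inversion(reference_image)
# alpha = demorph_factor_net(morphed_image, reference_image).item()
# accomplice_image = stylegan_generator(demorph_latent(w_morph, w_ref, alpha))
```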
Affiliation(s)
- Juan Cai: School of Physics Electronics and Intelligent Manufacturing, Huaihua University, Huaihua 418000, China
- Qiangqiang Duan: School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China
- Min Long: School of Electronics and Communication Engineering, Guangzhou University, Guangzhou 511370, China
- Le-Bing Zhang: School of Computer and Artificial Intelligence, Huaihua University, Huaihua 418000, China; School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
- Xiangling Ding: School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China

4. Wang Z, Yang Y, Chen Y, Yuan T, Sermesant M, Delingette H, Wu O. Mutual Information Guided Diffusion for Zero-Shot Cross-Modality Medical Image Translation. IEEE Trans Med Imaging 2024;43:2825-2838. PMID: 38551825; PMCID: PMC11580158. DOI: 10.1109/tmi.2024.3382043.
Abstract
Cross-modality data translation has attracted great interest in medical image computing, and deep generative models have shown performance improvements in addressing the related challenges. Nevertheless, a fundamental problem in image translation remains unsolved: zero-shot cross-modality translation with high fidelity. To bridge this gap, we propose a novel unsupervised zero-shot learning method, the Mutual Information guided Diffusion model (MIDiffusion), which learns to translate an unseen source image to the target modality by leveraging the inherent statistical consistency of mutual information between different modalities. To overcome the prohibitive cost of high-dimensional mutual information calculation, we propose a differentiable local-wise mutual information layer for conditioning the iterative denoising process. This layer captures identical cross-modality features in the statistical domain, providing diffusion guidance without relying on direct mappings between the source and target domains. This allows our method to adapt to changing source domains without retraining, making it highly practical when sufficient labeled source-domain data are not available. We demonstrate the superior performance of MIDiffusion in zero-shot cross-modality translation tasks through empirical comparisons with other generative models, including adversarial- and diffusion-based ones. Finally, we showcase a real-world application of MIDiffusion in 3D zero-shot cross-modality image segmentation.
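As a rough illustration of the statistic behind the guidance, the NumPy sketch below computes mutual information independently on non-overlapping patches of two images. The paper's layer is a differentiable variant embedded in the denoising loop, so this is only a conceptual stand-in.

```python
# Patch-wise (local) mutual information between two images via joint histograms.
import numpy as np


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0  # avoid log(0); where pxy > 0, the marginals are > 0 too
    return float((pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum())


def local_mi_map(img_a: np.ndarray, img_b: np.ndarray, patch: int = 16) -> np.ndarray:
    """Mutual information computed independently on non-overlapping patches."""
    h, w = img_a.shape
    out = np.zeros((h // patch, w // patch))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            ys = slice(i * patch, (i + 1) * patch)
            xs = slice(j * patch, (j + 1) * patch)
            out[i, j] = mutual_information(img_a[ys, xs], img_b[ys, xs])
    return out


# Two modalities of the same anatomy keep high local MI even if intensities differ:
t1 = np.random.rand(64, 64)
t2_like = 1.0 - t1 + 0.05 * np.random.rand(64, 64)   # inverted contrast, same structure
print(local_mi_map(t1, t2_like).mean())
```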

5. Melnik A, Miasayedzenkau M, Makaravets D, Pirshtuk D, Akbulut E, Holzmann D, Renusch T, Reichert G, Ritter H. Face Generation and Editing With StyleGAN: A Survey. IEEE Trans Pattern Anal Mach Intell 2024;46:3557-3576. PMID: 38224501. DOI: 10.1109/tpami.2024.3350004.
Abstract
Our goal with this survey is to provide an overview of state-of-the-art deep learning methods for face generation and editing using StyleGAN. The survey covers the evolution of StyleGAN, from PGGAN to StyleGAN3, and explores related topics such as suitable metrics for training, different latent representations, GAN inversion to the latent spaces of StyleGAN, face image editing, cross-domain face stylization, face restoration, and even Deepfake applications. We aim to provide an entry point into the field for readers who have basic knowledge of deep learning and are looking for an accessible introduction and overview.
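One of the survey's core topics, GAN inversion, can be summarized in its simplest optimization-based form: search for a latent code whose generated image matches a target photo, then edit in latent space. The sketch below uses a placeholder generator, a plain MSE loss, and a hypothetical edit direction; practical pipelines add perceptual and identity losses.

```python
# Minimal optimization-based GAN inversion sketch (interfaces are assumptions).
import torch


def invert(generator, target, steps=500, lr=0.05):
    w = torch.zeros(1, 512, requires_grad=True)              # latent to optimize
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = generator(w)                                  # image from current latent
        loss = torch.nn.functional.mse_loss(recon, target)   # + perceptual loss in practice
        loss.backward()
        opt.step()
    return w.detach()


# Placeholder usage:
# w_hat = invert(stylegan_generator, photo)
# edited = stylegan_generator(w_hat + 2.0 * smile_direction)   # latent-space edit
```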

6. Zheng X, Jing B, Zhao Z, Wang R, Zhang X, Chen H, Wu S, Sun Y, Zhang J, Wu H, Huang D, Zhu W, Chen J, Cao Q, Zeng H, Duan J, Luo Y, Li Z, Lin W, Nie R, Deng Y, Yun J, Li C, Xie D, Cai M. An interpretable deep learning model for identifying the morphological characteristics of dMMR/MSI-H gastric cancer. iScience 2024;27:109243. PMID: 38420592; PMCID: PMC10901137. DOI: 10.1016/j.isci.2024.109243.
Abstract
Accurate tumor diagnosis by pathologists relies on identifying specific morphological characteristics, but summarizing the unique morphological features that define a tumor class can be challenging. Although deep learning models have been extensively studied for tumor classification, their indirect and subjective interpretation prevents pathologists from understanding the model and discerning the morphological features responsible for its classifications. In this study, we introduce a new approach based on style generative adversarial networks that enables direct interpretation of deep learning models, detecting significant morphological characteristics in datasets from patients with deficient mismatch repair/microsatellite instability-high (dMMR/MSI-H) gastric cancer. Our approach effectively identifies distinct morphological features crucial for tumor classification, offering valuable insights that can help pathologists enhance diagnostic accuracy and foster professional growth.
Affiliation(s)
- Xueyi Zheng: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Bingzhong Jing: Artificial Intelligence Laboratory, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Zihan Zhao: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Ruixuan Wang: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
- Xinke Zhang: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Haohua Chen: Artificial Intelligence Laboratory, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Shuyang Wu: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Yan Sun: Department of Pathology, Tianjin Medical University Cancer Institute and Hospital, Tianjin 300000, China
- Jiangyu Zhang: Department of Pathology, Affiliated Cancer Hospital & Institute of Guangzhou Medical University, Guangzhou 510095, China
- Hongmei Wu: Department of Pathology, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou 510080, China
- Dan Huang: Department of Pathology, Fudan University Shanghai Cancer Center, Shanghai 200032, China
- Wenbiao Zhu: Department of Pathology, Shantou University Medical College Meizhou Clinical School, Meizhou People's Hospital, Meizhou 514011, China
- Jianning Chen: Department of Pathology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou 510635, China
- Qinghua Cao: Department of Pathology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, China
- Hong Zeng: Department of Pathology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510120, China
- Jinling Duan: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Yuanliang Luo: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Zhicheng Li: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Wuhao Lin: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Runcong Nie: Department of Gastric Surgery, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Yishu Deng: Artificial Intelligence Laboratory, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Jingping Yun: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Chaofeng Li: Artificial Intelligence Laboratory, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Dan Xie: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China
- Muyan Cai: Department of Pathology, State Key Laboratory of Oncology in South China, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, Guangzhou 510060, China

7. Peng C, Zhang C, Liu D, Wang N, Gao X. HiFiSketch: High Fidelity Face Photo-Sketch Synthesis and Manipulation. IEEE Trans Image Process 2023;32:5865-5876. PMID: 37889808. DOI: 10.1109/tip.2023.3326680.
Abstract
With the rapid development of generative adversarial networks, face photo-sketch synthesis has achieved promising performance and plays an increasingly important role in law enforcement as well as entertainment. However, most existing methods only work in interference-free conditions and lack generalization ability in wild scenes. The fidelity of the images they generate is insufficient, and manipulation according to a text description is unavailable. Directly applying existing text-based image manipulation methods to the face photo-sketch scenario may lead to severe distortions due to the cross-domain gap. We therefore propose HiFiSketch, a novel cross-domain face photo-sketch synthesis framework that learns to adjust the weights of its generators for high-fidelity synthesis and manipulation. It translates images between the photo domain and the sketch domain and can simultaneously modify results according to text input. We further propose a cross-domain loss function that effectively preserves facial details during face photo-sketch synthesis. Extensive experiments on four public face sketch datasets show the superiority of our method over existing methods. We also present text-based face photo-sketch manipulation and sequential face photo-sketch manipulation for the first time, demonstrating the effectiveness of our method for high-fidelity face photo-sketch synthesis and manipulation.
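The idea of adjusting generator weights for fidelity can be pictured with a generic fine-tuning loop in the spirit of pivotal tuning: invert the input to a pivot latent, then fine-tune a copy of the generator so its output at that latent matches the target closely. The interfaces and loss choice below are assumptions for illustration, not the HiFiSketch procedure itself.

```python
# Generic generator weight-adjustment loop (pivotal-tuning style sketch).
import copy
import torch


def tune_generator(generator, w_pivot, target, steps=300, lr=3e-4):
    g = copy.deepcopy(generator)               # leave the original weights intact
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = g(w_pivot)
        loss = torch.nn.functional.l1_loss(out, target)   # + perceptual/identity terms
        loss.backward()
        opt.step()
    return g


# Placeholder usage:
# w_pivot = invert(photo_generator, sketch)               # some inversion step
# sketch_generator = tune_generator(photo_generator, w_pivot, sketch)
```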

8. Huang W, Tu S, Xu L. IA-FaceS: A bidirectional method for semantic face editing. Neural Netw 2023;158:272-292. PMID: 36481459. DOI: 10.1016/j.neunet.2022.11.016.
Abstract
Semantic face editing has achieved substantial progress in recent years. However, existing face editing methods, which often encode the entire image into a single code, still have difficulty enabling flexible editing while keeping high-fidelity reconstruction. The one-code scheme also entangles face manipulations and limits flexibility in editing face components. In this paper, we present IA-FaceS, a bidirectional method for disentangled face attribute manipulation as well as flexible, controllable component editing. We propose to embed images onto two branches: one branch computes a high-dimensional component-invariant content embedding that captures face details, and the other provides low-dimensional component-specific embeddings for component manipulations. The two-branch scheme naturally enables high-quality facial component-level editing while keeping faithful reconstruction with details. Moreover, we devise a component adaptive modulation (CAM) module, which integrates component-specific guidance into the decoder and successfully disentangles highly correlated face components. Single-eye editing is achieved for the first time without the need for face masks or sketches. According to the experimental results, IA-FaceS strikes a good balance between maintaining image details and performing flexible face manipulation. Both quantitative and qualitative results indicate that the proposed method outperforms existing methods in reconstruction, face attribute manipulation, and component transfer. The code and weights are released at: https://github.com/CMACH508/IA-FaceS.
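A minimal way to picture component-specific guidance in the decoder is an AdaIN-style modulation in which each low-dimensional component code predicts a channel-wise scale and shift for the decoder features, so editing one code edits one component. The PyTorch sketch below is a generic stand-in under that assumption, not the paper's CAM module.

```python
# Illustrative component-wise modulation of decoder features.
import torch
import torch.nn as nn


class ComponentModulation(nn.Module):
    def __init__(self, n_components=4, code_dim=32, feat_channels=256):
        super().__init__()
        # One small projection per component (e.g. eyes, nose, mouth, ...).
        self.to_scale_shift = nn.ModuleList(
            nn.Linear(code_dim, 2 * feat_channels) for _ in range(n_components))

    def forward(self, feat, component_codes):
        # feat: (B, C, H, W); component_codes: (B, n_components, code_dim)
        for k, proj in enumerate(self.to_scale_shift):
            scale, shift = proj(component_codes[:, k]).chunk(2, dim=1)
            feat = feat * (1 + scale[..., None, None]) + shift[..., None, None]
        return feat


feat = torch.randn(2, 256, 16, 16)
codes = torch.randn(2, 4, 32)
out = ComponentModulation()(feat, codes)   # (2, 256, 16, 16); edit codes[:, 0] to edit one component
```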
Affiliation(s)
- Wenjing Huang: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Shikui Tu: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
- Lei Xu: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China