1
Yan C, Yan H, Liang W, Yin M, Luo H, Luo J. DP-SSLoRA: A privacy-preserving medical classification model combining differential privacy with self-supervised low-rank adaptation. Comput Biol Med 2024;179:108792. [PMID: 38964242] [DOI: 10.1016/j.compbiomed.2024.108792]
Abstract
BACKGROUND AND OBJECTIVE Concerns about patient privacy have limited the application of medical deep learning models in certain real-world scenarios. Differential privacy (DP) can alleviate this problem by injecting random noise into the model. However, naively applying DP to medical models fails to achieve a satisfactory balance between privacy and utility, owing to the high dimensionality of medical models and the scarcity of labeled samples. METHODS This work proposes DP-SSLoRA, a privacy-preserving classification model for medical images that combines differential privacy with self-supervised low-rank adaptation. A self-supervised pre-training method first obtains enhanced representations from unlabeled, publicly available medical data. A low-rank decomposition method then mitigates the impact of differentially private noise and is combined with the pre-trained features to perform classification on private datasets. RESULTS In classification experiments on three real chest X-ray datasets, DP-SSLoRA achieves good performance under strong privacy guarantees: at a privacy budget of ε=2, it attains an AUC of 0.942 on RSNA, 0.9658 on Covid-QU-mini, and 0.9886 on Chest X-ray 15k. CONCLUSION Extensive experiments on real chest X-ray datasets show that DP-SSLoRA achieves satisfactory performance under stronger privacy guarantees. This study provides guidance for research on privacy preservation in the medical field. Source code is publicly available at https://github.com/oneheartforone/DP-SSLoRA.
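For a concrete picture of the two ingredients the abstract names, the sketch below combines a LoRA adapter on a frozen layer with a DP-SGD update (per-example gradient clipping plus Gaussian noise) applied only to the low-rank parameters. All shapes, hyperparameters, and class names are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch: LoRA adapter on a frozen linear layer, trained with a
# hand-rolled DP-SGD step. Hyperparameters and sizes are illustrative only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # backbone stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

def dp_sgd_step(model, loss_fn, xb, yb, lr=0.1, clip=1.0, noise_mult=1.0):
    """One DP-SGD step: clip each example's gradient, sum, add noise, average."""
    params = [p for p in model.parameters() if p.requires_grad]
    accum = [torch.zeros_like(p) for p in params]
    for i in range(xb.shape[0]):               # per-example gradients
        model.zero_grad(set_to_none=True)
        loss_fn(model(xb[i:i+1]), yb[i:i+1]).backward()
        grads = [p.grad.detach().clone() for p in params]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        factor = (clip / (norm + 1e-12)).clamp(max=1.0)  # clip to L2 norm `clip`
        for a, g in zip(accum, grads):
            a += g * factor
    n = xb.shape[0]
    with torch.no_grad():
        for p, a in zip(params, accum):
            noisy = (a + noise_mult * clip * torch.randn_like(a)) / n
            p -= lr * noisy

# Toy usage: privately adapt a frozen 32 -> 2 classifier on a random batch.
model = LoRALinear(nn.Linear(32, 2))
xb, yb = torch.randn(16, 32), torch.randint(0, 2, (16,))
dp_sgd_step(model, nn.CrossEntropyLoss(), xb, yb)
```

Note that in practice a privacy accountant, not the raw noise multiplier, determines the reported ε; clipping and adding noise only to the low-rank parameters is what keeps the noise's dimensionality, and hence its utility cost, small.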
Affiliation(s)
- Chaokun Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, Henan, China; Academy for Advanced Interdisciplinary Studies, Henan University, Kaifeng, 475004, Henan, China; Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, 475004, Henan, China
- Haicao Yan
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, Henan, China
- Wenjuan Liang
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, Henan, China; Academy for Advanced Interdisciplinary Studies, Henan University, Kaifeng, 475004, Henan, China; Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, 475004, Henan, China
- Menghan Yin
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, Henan, China
- Huimin Luo
- School of Computer and Information Engineering, Henan University, Kaifeng, 475004, Henan, China; Academy for Advanced Interdisciplinary Studies, Henan University, Kaifeng, 475004, Henan, China; Henan Engineering Research Center of Intelligent Technology and Application, Henan University, Kaifeng, 475004, Henan, China
- Junwei Luo
- School of Software, Henan Polytechnic University, Jiaozuo, 454000, Henan, China
2
Sariyanidi E, Zampella CJ, Schultz RT, Tunc B. Inequality-Constrained 3D Morphable Face Model Fitting. IEEE Trans Pattern Anal Mach Intell 2024;46:1305-1318. [PMID: 38015704] [PMCID: PMC10823595] [DOI: 10.1109/tpami.2023.3334948]
Abstract
3D morphable model (3DMM) fitting on 2D data is traditionally done via unconstrained optimization with regularization terms to ensure that the result is a plausible face shape and is consistent with a set of 2D landmarks. This paper presents inequality-constrained 3DMM fitting as the first alternative to regularization in optimization-based 3DMM fitting. Inequality constraints on the 3DMM's shape coefficients ensure face-like shapes without modifying the objective function for smoothness, thus allowing for more flexibility to capture person-specific shape details. Moreover, inequality constraints on landmarks increase robustness in a way that does not require per-image tuning. We show that the proposed method stands out with its ability to estimate person-specific face shapes by jointly fitting a 3DMM to multiple frames of a person. Further, when used with a robust objective function, namely gradient correlation, the method can work "in-the-wild" even with a 3DMM constructed from controlled data. Lastly, we show how to use the log-barrier method to efficiently implement the method. To our knowledge, we present the first 3DMM fitting framework that requires no learning yet is accurate, robust, and efficient. The absence of learning enables a generic solution that allows flexibility in the input image size, interchangeable morphable models, and incorporation of camera matrix.
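The log-barrier implementation mentioned at the end of the abstract follows the standard interior-point recipe; written generically (this is the textbook form, not necessarily the paper's exact objective):

```latex
% Inequality-constrained fit over 3DMM coefficients c:
%   minimize  f(c)   subject to  g_i(c) <= 0,  i = 1, ..., m.
% Log-barrier reformulation, solved for an increasing sequence of t > 0:
\min_{c} \; t\, f(c) \;-\; \sum_{i=1}^{m} \log\!\bigl(-g_i(c)\bigr)
```

As t grows, the barrier term's influence shrinks and the minimizer approaches the constrained optimum, while every iterate stays strictly feasible (face-like, landmark-consistent) by construction.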
3
Zhu H, Yang H, Guo L, Zhang Y, Wang Y, Huang M, Wu M, Shen Q, Yang R, Cao X. FaceScape: 3D Facial Dataset and Benchmark for Single-View 3D Face Reconstruction. IEEE Trans Pattern Anal Mach Intell 2023;45:14528-14545. [PMID: 37607140] [DOI: 10.1109/tpami.2023.3307338]
Abstract
In this article, we present a large-scale detailed 3D face dataset, FaceScape, and a corresponding benchmark to evaluate single-view facial 3D reconstruction. The FaceScape dataset releases 16,940 textured 3D faces, captured from 847 subjects each performing 20 specific expressions. The 3D models capture pore-level facial geometry and are processed to be topologically uniform; these fine 3D facial models can be represented as a 3D morphable model for coarse shapes plus displacement maps for detailed geometry. Taking advantage of the large-scale, high-accuracy dataset, a novel algorithm is proposed to learn expression-specific dynamic details using a deep neural network; the learned relationship serves as the foundation of our system for predicting elaborate, riggable 3D face models from a single image input. Unlike most previous methods, our predicted 3D models are riggable with highly detailed geometry under different expressions. We also use FaceScape data to generate in-the-wild and in-the-lab benchmarks to evaluate recent methods of single-view face reconstruction. Accuracy is reported and analyzed across camera pose and focal length, providing a faithful and comprehensive evaluation that reveals new challenges. The unprecedented dataset, benchmark, and code have been released to the public for research purposes.
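The coarse-plus-detail representation described above is commonly written as a linear morphable model plus a normal-direction displacement map. The following is that generic form, with basis names as illustrative placeholders rather than FaceScape's exact parameterization:

```latex
% Coarse shape from a linear morphable model (identity and expression bases),
% fine detail from a displacement d sampled at each vertex's UV coordinates:
S_{\mathrm{coarse}} = \bar{S}
  + \textstyle\sum_i \alpha_i B_i^{\mathrm{id}}
  + \textstyle\sum_j \beta_j B_j^{\mathrm{exp}},
\qquad
v' = v + d(u_v, w_v)\, n(v)
```

Here n(v) is the unit surface normal at vertex v, so the displacement map perturbs the coarse surface along its normals to recover pore-level geometry.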
4
Doukas MC, Ververas E, Sharmanska V, Zafeiriou S. Free-HeadGAN: Neural Talking Head Synthesis With Explicit Gaze Control. IEEE Trans Pattern Anal Mach Intell 2023;45:9743-9756. [PMID: 37028333] [DOI: 10.1109/tpami.2023.3253243]
Abstract
We present Free-HeadGAN, a person-generic neural talking-head synthesis system. We show that modeling faces with sparse 3D facial landmarks is sufficient to achieve state-of-the-art generative performance without relying on strong statistical priors of the face, such as 3D morphable models. Beyond 3D pose and facial expressions, our method is capable of fully transferring eye gaze from a driving actor to a source identity. Our complete pipeline consists of three components: a canonical 3D key-point estimator that regresses 3D pose and expression-related deformations, a gaze estimation network, and a generator built upon the architecture of HeadGAN. We further experiment with an extension of our generator that accommodates few-shot learning through an attention mechanism when multiple source images are available. Compared to recent methods for reenactment and motion transfer, our system achieves higher photo-realism combined with superior identity preservation, while offering explicit gaze control.
5
Fujino S, Iwanaga T. Real-time wrinkle evaluation method using Visual Illusion-based image feature enhancement System. Skin Res Technol 2023;29:e13206. [PMID: 36382793] [PMCID: PMC9838642] [DOI: 10.1111/srt.13206]
Abstract
BACKGROUND Several advanced methods for evaluating wrinkles are currently available; however, their application is limited because wrinkle structures change in response to facial expressions and the surrounding environment. A Visual Illusion-based image feature enhancement System (VIS) was used to develop a real-time evaluation method. OBJECTIVES This study extends the application of VIS to wrinkle evaluation by adjusting VIS to evaluate facial wrinkles, evaluating age-dependent wrinkles, and validating the method for real-time wrinkle evaluation. METHODS Wrinkles in Japanese men and women of various ages were evaluated using VIS and the current methods. Furthermore, the effectiveness of an eye cream containing niacinamide was evaluated before and after a 4-week treatment. RESULTS VIS qualitatively detects even fine wrinkles and records them numerically without any special instrument. Moreover, VIS can be applied to moving images, revealing the effectiveness of the anti-wrinkle formulation qualitatively and quantitatively even while subjects are smiling. CONCLUSION This paper presents a wrinkle evaluation method that is both qualitative and quantitative, offers high sensitivity in real time, and relies solely on digital images. These results imply that the method enables wrinkle evaluation under real-life conditions.
Affiliation(s)
- Saori Fujino
- Beauty Care Laboratory, Kracie Home Products, Ltd., Yokohama, Japan
- Tetsuro Iwanaga
- Beauty Care Laboratory, Kracie Home Products, Ltd., Yokohama, Japan
6
One-shot many-to-many facial reenactment using Bi-Layer Graph Convolutional Networks. Neural Netw 2022;156:193-204. [DOI: 10.1016/j.neunet.2022.09.031]
7
Facial Kinship Verification: A Comprehensive Review and Outlook. Int J Comput Vis 2022;130:1494-1525. [PMID: 35465628] [PMCID: PMC9016696] [DOI: 10.1007/s11263-022-01605-9]
Abstract
The goal of Facial Kinship Verification (FKV) is to automatically determine whether two individuals have a kin relationship from their facial images or videos. It is an emerging and challenging problem that has attracted increasing attention due to its practical applications. Over the past decade, significant progress has been achieved in this new field. Handcrafted features and deep learning techniques have been widely studied in FKV. The goal of this paper is to conduct a comprehensive review of the problem of FKV. We cover different aspects of the research, including problem definition, challenges, applications, benchmark datasets, a taxonomy of existing methods, and state-of-the-art performance. Reflecting on what has been achieved so far, we identify gaps in current research and discuss potential future research directions.
8
Wu X, Zhang Q, Wu Y, Wang H, Li S, Sun L, Li X. F³A-GAN: Facial Flow for Face Animation With Generative Adversarial Networks. IEEE Trans Image Process 2021;30:8658-8670. [PMID: 34554912] [DOI: 10.1109/tip.2021.3112059]
Abstract
Formulated as a conditional generation problem, face animation aims at synthesizing continuous face images from a single source image driven by a set of conditional face motions. Previous works mainly model face motion as conditions with 1D or 2D representations (e.g., action units, emotion codes, landmarks), which often leads to low-quality results in complicated scenarios such as continuous generation and large-pose transformation. To tackle this problem, the conditions should meet two requirements: motion-information preservation and geometric continuity. To this end, we propose a novel representation based on a 3D geometric flow, termed facial flow, to represent the natural motion of the human face at any pose. Compared with previous conditions, the proposed facial flow controls continuous changes to the face well. To utilize the facial flow for face editing, we then build a synthesis framework that generates continuous images from conditional facial flows. To fully exploit the motion information of facial flows, a hierarchical conditional framework combines multi-scale appearance features extracted from images with motion features extracted from flows, then progressively decodes the fused features back to images. Experimental results demonstrate the effectiveness of our method compared to other state-of-the-art methods.
9
Abstract
Standard registration algorithms need to be independently applied to each surface to register, following careful pre-processing and hand-tuning. Recently, learning-based approaches have emerged that reduce the registration of new scans to running inference with a previously-trained model. The potential benefits are multifold: inference is typically orders of magnitude faster than solving a new instance of a difficult optimization problem, deep learning models can be made robust to noise and corruption, and the trained model may be re-used for other tasks, e.g. through transfer learning. In this paper, we cast the registration task as a surface-to-surface translation problem, and design a model to reliably capture the latent geometric information directly from raw 3D face scans. We introduce Shape-My-Face (SMF), a powerful encoder-decoder architecture based on an improved point cloud encoder, a novel visual attention mechanism, graph convolutional decoders with skip connections, and a specialized mouth model that we smoothly integrate with the mesh convolutions. Compared to the previous state-of-the-art learning algorithms for non-rigid registration of face scans, SMF only requires the raw data to be rigidly aligned (with scaling) with a pre-defined face template. Additionally, our model provides topologically-sound meshes with minimal supervision, offers faster training time, has orders of magnitude fewer trainable parameters, is more robust to noise, and can generalize to previously unseen datasets. We extensively evaluate the quality of our registrations on diverse data. We demonstrate the robustness and generalizability of our model with in-the-wild face scans across different modalities, sensor types, and resolutions. Finally, we show that, by learning to register scans, SMF produces a hybrid linear and non-linear morphable model. Manipulation of the latent space of SMF allows for shape generation, and morphing applications such as expression transfer in-the-wild. We train SMF on a dataset of human faces comprising 9 large-scale databases on commodity hardware.
10
Tran L, Liu X. On Learning 3D Face Morphable Model from In-the-Wild Images. IEEE Trans Pattern Anal Mach Intell 2021;43:157-171. [PMID: 31329546] [DOI: 10.1109/tpami.2019.2927975]
Abstract
As a classic statistical model of 3D facial shape and albedo, the 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting and image synthesis. A conventional 3DMM is learned from a set of 3D face scans with associated well-controlled 2D face images, and is represented by two sets of PCA basis functions. Due to the type and amount of training data, as well as the linear bases, the representation power of 3DMM can be limited. To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM from a large set of in-the-wild face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, lighting, shape, and albedo parameters. Two decoders serve as the nonlinear 3DMM, mapping from the shape and albedo parameters to the 3D shape and albedo, respectively. With the projection parameter, lighting, 3D shape, and albedo, a novel analytically-differentiable rendering layer is designed to reconstruct the original input face. The entire network is end-to-end trainable with only weak supervision. We demonstrate the superior representation power of our nonlinear 3DMM over its linear counterpart, and its contribution to face alignment, 3D reconstruction, and face editing. Source code and additional results can be found at our project page: http://cvlab.cse.msu.edu/project-nonlinear-3dmm.html.
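As a rough structural sketch of the encoder/decoder split described above: one network regresses projection, lighting, and latent shape/albedo codes, and separate decoders map those codes to dense geometry. The layer sizes, the 6-parameter projection head, the 27-dimensional spherical-harmonics-style lighting, and the vertex count below are all assumptions for illustration; the authors' actual architecture is at the project page.

```python
# Skeleton of a nonlinear-3DMM-style autoencoder (illustrative sizes only).
import torch
import torch.nn as nn

N_VERTS = 5023  # assumed vertex count for a fixed face topology

class Encoder(nn.Module):
    """Image -> projection m, lighting L, shape code fS, albedo code fA."""
    def __init__(self, d_shape=128, d_albedo=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.heads = nn.ModuleDict({
            "m":  nn.Linear(64, 6),        # assumed weak-perspective projection
            "L":  nn.Linear(64, 27),       # assumed RGB spherical-harmonics lighting
            "fS": nn.Linear(64, d_shape),
            "fA": nn.Linear(64, d_albedo),
        })

    def forward(self, img):
        feat = self.backbone(img)
        return {k: head(feat) for k, head in self.heads.items()}

class ShapeDecoder(nn.Module):
    """Nonlinear map from a shape code to per-vertex 3D positions."""
    def __init__(self, d_shape=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(d_shape, 512), nn.ReLU(),
            nn.Linear(512, N_VERTS * 3),
        )

    def forward(self, fS):
        return self.mlp(fS).view(-1, N_VERTS, 3)

# Toy forward pass; in the paper these outputs feed a differentiable renderer.
enc, dec = Encoder(), ShapeDecoder()
out = enc(torch.randn(2, 3, 64, 64))
verts = dec(out["fS"])                     # (2, N_VERTS, 3)
```

An analogous albedo decoder and the differentiable rendering layer would close the loop, letting the whole model train from image reconstruction alone.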
11
SL2E-AFRE: Personalized 3D face reconstruction using autoencoder with simultaneous subspace learning and landmark estimation. Appl Intell 2020. [DOI: 10.1007/s10489-020-02000-y]
12
Sariyanidi E, Zampella CJ, Schultz RT, Tunc B. Can Facial Pose and Expression Be Separated with Weak Perspective Camera? Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 2020;2020:7171-7180. [PMID: 32921968] [DOI: 10.1109/cvpr42600.2020.00720]
Abstract
Separating facial pose and expression within images requires a camera model for 3D-to-2D mapping. The weak perspective (WP) camera has been the most popular choice; it is the default, if not the only option, in state-of-the-art facial analysis methods and software. The WP camera is justified by the supposition that its errors are negligible when subjects are relatively far from the camera, yet this claim has never been tested despite nearly 20 years of research. This paper critically examines the suitability of the WP camera for separating facial pose and expression. First, we theoretically show that WP causes pose-expression ambiguity, as it leads to estimation of spurious expressions. Next, we experimentally quantify the magnitude of spurious expressions. Finally, we test whether spurious expressions have detrimental effects on a common facial analysis application, namely Action Unit (AU) detection. Contrary to conventional wisdom, we find that severe pose-expression ambiguity exists even when subjects are not close to the camera, leading to large false positive rates in AU detection. We also demonstrate that the magnitude and characteristics of spurious expressions depend on the point distribution model used to model the expressions. Our results suggest that common assumptions about WP need to be revisited in facial expression modeling, and that facial analysis software should encourage and facilitate the use of the true camera model whenever possible.
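For reference, the weak-perspective model under scrutiny replaces per-point perspective division with a single shared scale (standard formulation):

```latex
% Weak-perspective projection of a 3D point X:
x_{\mathrm{wp}} = s\,\Pi R X + t,
\qquad
\Pi = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
```

Here R is the head rotation, t a 2D translation, and s one scale applied to every point; a full perspective camera instead divides each rotated point by its own depth. The paper shows that ignoring this per-point depth dependence lets pose errors masquerade as expression changes.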
Affiliation(s)
- Robert T Schultz
- Center for Autism Research, Children's Hospital of Philadelphia; University of Pennsylvania
- Birkan Tunc
- Center for Autism Research, Children's Hospital of Philadelphia; University of Pennsylvania
13
Edirisinghe P, Kitulwatte I, Nadeera D. Knowledge, attitude and practice regarding the use of digital photographs in the examination of the dead and living among doctors practicing forensic medicine in Sri Lanka. J Forensic Leg Med 2020;73:101995. [DOI: 10.1016/j.jflm.2020.101995]
14
Abstract
Image-to-image (i2i) translation is the dense regression problem of learning how to transform an input image into an output using aligned image pairs. Remarkable progress has been made in i2i translation with the advent of deep convolutional neural networks, in particular using the learning paradigm of generative adversarial networks (GANs). In the absence of paired images, i2i translation is tackled with one or multiple domain transformations (i.e., CycleGAN, StarGAN, etc.). In this paper, we study the problem of image-to-image translation under a set of continuous parameters that correspond to a model describing a physical process. In particular, we propose SliderGAN, which transforms an input face image into a new one according to the continuous values of a statistical blendshape model of facial motion. We show that it is possible to edit a facial image according to expression and speech blendshapes, using sliders that control the continuous values of the blendshape model. This provides much more flexibility in various tasks, including but not limited to face editing, expression transfer, and face neutralisation, compared to models based on discrete expressions or action units.
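The statistical blendshape model driving the sliders can be written in the standard linear form (illustrative; the paper defines its own expression and speech bases):

```latex
% Linear blendshape model: slider values s_i weight motion components b_i
% added to a neutral face x_bar; each slider varies continuously.
x(s) = \bar{x} + \sum_{i=1}^{k} s_i\, b_i,
\qquad s_i \in [\,s_i^{\min},\, s_i^{\max}\,]
```

SliderGAN then learns an image-to-image mapping conditioned on the continuous vector s, so moving one slider traces a smooth path through expression space rather than jumping between discrete labels.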
15
16
Xue N, Deng J, Cheng S, Panagakis Y, Zafeiriou S. Side Information for Face Completion: A Robust PCA Approach. IEEE Trans Pattern Anal Mach Intell 2019;41:2349-2364. [PMID: 30843800] [DOI: 10.1109/tpami.2019.2902556]
Abstract
Robust principal component analysis (RPCA) is a powerful method for learning low-rank feature representations of various visual data. However, for certain types, and significant amounts, of error corruption, it fails to yield satisfactory results; a drawback that can be alleviated by exploiting domain-dependent prior knowledge or information. In this paper, we propose two models for RPCA that take such side information into account, even in the presence of missing values. We apply this framework to the task of UV completion, which is widely used in pose-invariant face recognition. Moreover, we construct a generative adversarial network (GAN) to extract side information as well as subspaces. These subspaces not only assist in the recovery but also speed up the process in the case of large-scale data. We quantitatively and qualitatively evaluate the proposed approaches on both synthetic data and eight real-world datasets to verify their effectiveness.
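The classic RPCA decomposition that the paper builds on is the following convex program (standard formulation; the proposed models add side-information terms to it):

```latex
% RPCA: split observations X into a low-rank part L (nuclear norm)
% and a sparse corruption part S (l1 norm):
\min_{L,\,S} \; \lVert L \rVert_{*} + \lambda \lVert S \rVert_{1}
\quad \text{s.t.} \quad X = L + S
```

Here the nuclear norm sums the singular values of L and λ trades off the two terms; the side information and subspaces (per the abstract, extracted with a GAN) further constrain where the low-rank component may live, which both aids recovery and accelerates it at scale.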