1
|
Wang R, Wu XJ, Chen Z, Hu C, Kittler J. SPD Manifold Deep Metric Learning for Image Set Classification. IEEE Trans Neural Netw Learn Syst 2024; PP:1-15. [PMID: 38470600 DOI: 10.1109/tnnls.2022.3216811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2024]
Abstract
By characterizing each image set as a nonsingular covariance matrix on the symmetric positive definite (SPD) manifold, the approaches of visual content classification with image sets have made impressive progress. However, the key challenge of unhelpfully large intraclass variability and interclass similarity of representations remains open to date. Although several recent studies have mitigated the two problems by jointly learning the embedding mapping and the similarity metric on the original SPD manifold, their inherently shallow and linear feature transformation mechanisms are not powerful enough to capture useful geometric features, especially in complex scenarios. To this end, this article explores a novel approach, termed SPD manifold deep metric learning (SMDML), for image set classification. Specifically, SMDML first selects a prevailing SPD manifold neural network (SPDNet) as the backbone (encoder) to derive an SPD matrix nonlinear representation. To counteract the degradation of structural information during multistage feature embedding, we construct a Riemannian decoder at the end of the encoder, trained by a reconstruction error term (RT), to induce the generated low-dimensional feature manifold of the hidden layer to capture the pivotal information about the visual data describing the imaged scene. We demonstrate through theory and experiments that it is feasible to replace the Riemannian metric with Euclidean distance in RT. Then, the ReCov layer is introduced into the established Riemannian network to regularize the local statistical information within each input feature matrix, which enhances the effectiveness of the learning process. The theoretical analysis of the activation function used in the ReCov layer in terms of continuity and conditions for generating positive definite matrices is beneficial for network design.
Inspired by the fact that the single cross-entropy loss used for training is unable to effectively parse the geometric distribution of the deep representations, we finally endow the suggested model with a novel metric learning regularization term. By explicitly incorporating the encoding and processing of the data variations into the network learning process, this term can not only derive a powerful Riemannian representation but also train an effective classifier. The experimental results show the superiority of the proposed approach on three typical visual classification tasks.
|
2
|
Akbari A, Awais M, Fatemifar S, Khalid SS, Kittler J. RAgE: Robust Age Estimation Through Subject Anchoring With Consistency Regularisation. IEEE Trans Pattern Anal Mach Intell 2024; 46:1603-1617. [PMID: 35767502 DOI: 10.1109/tpami.2022.3187079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Modern facial age estimation systems can achieve high accuracy when training and test datasets are identically distributed and captured under similar conditions. However, domain shifts in data, encountered in practice, lead to a sharp drop in accuracy of most existing age estimation algorithms. In this article, we propose a novel method, namely RAgE, to improve the robustness and reduce the uncertainty of age estimates by leveraging unlabelled data through a subject anchoring strategy and a novel consistency regularisation term. First, we propose a similarity-preserving pseudo-labelling algorithm by which the model generates pseudo-labels for a cohort of unlabelled images belonging to the same subject, while taking into account the similarity among age labels. In order to improve the robustness of the system, a consistency regularisation term is then used to simultaneously encourage the model to produce invariant outputs for the images in the cohort with respect to an anchor image. We propose a novel consistency regularisation term whose noise-tolerant property effectively mitigates the so-called confirmation bias caused by incorrect pseudo-labels. Experiments on multiple benchmark ageing datasets demonstrate substantial improvements over the state-of-the-art methods and robustness to confounding external factors, including subject's head pose, illumination variation and appearance of expression in the face image.
|
3
|
Liu D, Bober M, Kittler J. Importance Weighted Structure Learning for Scene Graph Generation. IEEE Trans Pattern Anal Mach Intell 2024; 46:1231-1242. [PMID: 37910406 DOI: 10.1109/tpami.2023.3329339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2023]
Abstract
Scene graph generation is a structured prediction task aiming to explicitly model objects and their relationships via constructing a visually-grounded scene graph for an input image. Currently, the message passing neural network based mean field variational Bayesian methodology is the ubiquitous solution for such a task, in which the variational inference objective is often assumed to be the classical evidence lower bound. However, the variational approximation inferred from such a loose objective generally underestimates the underlying posterior, which often leads to inferior generation performance. In this paper, we propose a novel importance weighted structure learning method aiming to approximate the underlying log-partition function with a tighter importance weighted lower bound, which is computed from multiple samples drawn from a reparameterizable Gumbel-Softmax sampler. A generic entropic mirror descent algorithm is applied to solve the resulting constrained variational inference task. The proposed method achieves state-of-the-art performance on various popular scene graph generation benchmarks.
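The claim that an importance weighted bound is tighter than the classical evidence lower bound can be checked on a toy problem. The sketch below is not the authors' method: it assumes a three-state discrete latent variable, a hypothetical unnormalised target `p_tilde` and a hypothetical proposal `q`, and it enumerates every K-tuple of samples exactly instead of drawing them from a Gumbel-Softmax sampler.

```python
import itertools
import math


def iw_bound(p_tilde, q, k):
    """Exact K-sample importance weighted lower bound on log Z.

    p_tilde: unnormalised target masses (hypothetical toy values).
    q: proposal probabilities, all strictly positive.
    Every K-tuple of sample indices is enumerated, so this only suits toy sizes.
    With k = 1 the result is the classical evidence lower bound.
    """
    n = len(q)
    total = 0.0
    for idx in itertools.product(range(n), repeat=k):
        prob = math.prod(q[i] for i in idx)              # probability of this tuple under q
        mean_w = sum(p_tilde[i] / q[i] for i in idx) / k  # averaged importance weight
        total += prob * math.log(mean_w)
    return total


p_tilde = [2.0, 1.0, 0.5]   # assumed toy target, Z = 3.5
q = [0.2, 0.3, 0.5]         # assumed toy proposal
log_z = math.log(sum(p_tilde))
elbo = iw_bound(p_tilde, q, 1)
iw4 = iw_bound(p_tilde, q, 4)
print(elbo, iw4, log_z)
```

On this toy problem the four-sample bound sits between the single-sample bound and the true log-partition value, which is the ordering the abstract relies on.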
|
4
|
Chen Z, Wu XJ, Xu T, Kittler J. Discriminative Dictionary Pair Learning With Scale-Constrained Structured Representation for Image Classification. IEEE Trans Neural Netw Learn Syst 2023; 34:10225-10239. [PMID: 37015383 DOI: 10.1109/tnnls.2022.3165217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The dictionary pair learning (DPL) model aims to design a synthesis dictionary and an analysis dictionary to accomplish the goal of rapid sample encoding. In this article, we propose a novel structured representation learning algorithm based on the DPL for image classification. It is referred to as discriminative DPL with scale-constrained structured representation (DPL-SCSR). The proposed DPL-SCSR utilizes the binary label matrix of dictionary atoms to project the representation into the corresponding label space of the training samples. By imposing a non-negative constraint, the learned representation adaptively approximates a block-diagonal structure. This innovative transformation is also capable of controlling the scale of the block-diagonal representation by enforcing the sum of within-class coefficients of each sample to be 1, which means that the dictionary atoms of each class compete to represent the samples from the same class. This implies that the requirement of similarity preservation is considered from the perspective of the constraint on the sum of coefficients. More importantly, the DPL-SCSR does not need to design a classifier in the representation space as the label matrix of the dictionary can also be used as an efficient linear classifier. Finally, the DPL-SCSR imposes the $\ell_{2,p}$-norm on the analysis dictionary to make the process of feature extraction more interpretable. The DPL-SCSR seamlessly incorporates the scale-constrained structured representation learning, within-class similarity preservation of representation, and the linear classifier into one regularization term, which dramatically reduces the complexity of training and parameter tuning. The experimental results on several popular image classification datasets show that our DPL-SCSR can deliver superior performance compared with the state-of-the-art (SOTA) dictionary learning methods. The MATLAB code of this article is available at https://github.com/chenzhe207/DPL-SCSR.
|
5
|
Chen Z, Wu XJ, Xu T, Kittler J. Fast Self-Guided Multi-View Subspace Clustering. IEEE Trans Image Process 2023; 32:6514-6525. [PMID: 37030827 DOI: 10.1109/tip.2023.3261746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Multi-view subspace clustering is an important topic in cluster analysis. Its aim is to utilize the complementary information conveyed by multiple views of objects to be clustered. Recently, view-shared anchor learning based multi-view clustering methods have been developed to speed up the learning of common data representation. Although widely applied to large-scale scenarios, most of the existing approaches are still faced with two limitations. First, they do not give sufficient consideration to the negative impact caused by certain noisy views with unclear clustering structures. Second, many of them only focus on the multi-view consistency, yet are incapable of capturing the cross-view diversity. As a result, the learned complementary features may be inaccurate and adversely affect clustering performance. To solve these two challenging issues, we propose a Fast Self-guided Multi-view Subspace Clustering (FSMSC) algorithm which skillfully integrates the view-shared anchor learning and global-guided-local self-guidance learning into a unified model. Such an integration is inspired by the observation that the view with clean clustering structures will play a more crucial role in grouping the clusters when the features of all views are concatenated. Specifically, we first learn a locally-consistent data representation shared by all views in the local learning module, then we learn a globally-discriminative data representation from multi-view concatenated features in the global learning module. Afterwards, a feature selection matrix constrained by the $\ell_{2,1}$-norm is designed to construct a guidance from global learning to local learning. In this way, the multi-view consistent and diverse information can be simultaneously utilized and the negative impact caused by noisy views can be overcome to some extent.
Extensive experiments on different datasets demonstrate the effectiveness of our proposed fast self-guided learning model, and its promising performance compared to both the state-of-the-art non-deep and deep multi-view clustering algorithms. The code of this paper is available at https://github.com/chenzhe207/FSMSC.
|
6
|
Khalid SS, Awais M, Feng ZH, Chan CH, Farooq A, Akbari A, Kittler J. NPT-Loss: Demystifying Face Recognition Losses With Nearest Proxies Triplet. IEEE Trans Pattern Anal Mach Intell 2023; 45:15249-15259. [PMID: 35344485 DOI: 10.1109/tpami.2022.3162705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Face recognition (FR) using deep convolutional neural networks (DCNNs) has seen remarkable success in recent years. One key ingredient of DCNN-based FR is the design of a loss function that ensures discrimination between various identities. The state-of-the-art (SOTA) solutions utilise normalised Softmax loss with additive and/or multiplicative margins. Despite being popular and effective, these losses are justified only intuitively with little theoretical explanation. In this work, we show that under the LogSumExp (LSE) approximation, the SOTA Softmax losses become equivalent to a proxy-triplet loss that focuses on nearest-neighbour negative proxies only. This motivates us to propose a variant of the proxy-triplet loss, entitled Nearest Proxies Triplet (NPT) loss, which, unlike SOTA solutions, converges for a wider range of hyper-parameters and offers flexibility in proxy selection and thus outperforms SOTA techniques. We generalise many SOTA losses into a single framework and give theoretical justifications for the assertion that minimising the proposed loss ensures a minimum separability between all identities. We also show that the proposed loss has an implicit mechanism of hard-sample mining. We conduct extensive experiments using various DCNN architectures on a number of FR benchmarks to demonstrate the efficacy of the proposed scheme over SOTA methods.
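The LogSumExp step mentioned in the abstract can be illustrated numerically. The sketch below is not the NPT loss itself, and it omits margins and normalisation. It only shows the generic fact that replacing LSE by a maximum turns the softmax cross-entropy into a hinge on the nearest negative proxy, with an error of at most log(n) for n classes. The logits are hypothetical scaled similarities between one embedding and each class proxy.

```python
import math


def softmax_cross_entropy(logits, y):
    """Softmax cross-entropy written as LSE(logits) - logits[y]."""
    m = max(logits)  # subtract the maximum for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in logits))
    return lse - logits[y]


def nearest_negative_hinge(logits, y):
    """The same loss with LSE replaced by max: only the hardest negative matters."""
    hardest_negative = max(s for j, s in enumerate(logits) if j != y)
    return max(0.0, hardest_negative - logits[y])


# Hypothetical scaled similarities to four class proxies; the true class is index 0.
logits = [12.0, 15.0, 3.0, -4.0]
ce = softmax_cross_entropy(logits, 0)
hinge = nearest_negative_hinge(logits, 0)
print(ce, hinge)
```

Because max(x) <= LSE(x) <= max(x) + log(n), the two losses differ by at most log(n), and the gap shrinks relative to the loss as the similarity scale grows.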
|
7
|
Liu D, Bober M, Kittler J. Constrained Structure Learning for Scene Graph Generation. IEEE Trans Pattern Anal Mach Intell 2023; 45:11588-11599. [PMID: 37276097 DOI: 10.1109/tpami.2023.3282889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
As a structured prediction task, scene graph generation aims to build a visually-grounded scene graph to explicitly model objects and their relationships in an input image. Currently, the mean field variational Bayesian framework is the de facto methodology used by the existing methods, in which the unconstrained inference step is often implemented by a message passing neural network. However, such a formulation fails to explore other inference strategies, and largely ignores the more general constrained optimization models. In this paper, we present a constrained structure learning method, for which an explicit constrained variational inference objective is proposed. Instead of applying the ubiquitous message-passing strategy, a generic constrained optimization method - entropic mirror descent - is utilized to solve the constrained variational inference step. We validate the proposed generic model on various popular scene graph generation benchmarks and show that it outperforms the state-of-the-art methods.
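Entropic mirror descent, the generic solver named in the abstract, has a compact textbook form when the constraint set is the probability simplex. The sketch below assumes that setting and uses a hypothetical toy objective (squared distance to a fixed distribution `target`). It is not the paper's variational objective, and the step size and iteration count are illustrative choices.

```python
import math


def entropic_mirror_descent(grad, x0, step=0.5, iters=2000):
    """Minimise a smooth function over the probability simplex.

    grad: function returning the gradient at x as a list.
    x0: strictly positive starting point that sums to one.
    Each iteration multiplies x by exp(-step * gradient) and renormalises,
    so the iterate stays on the simplex without an explicit projection.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        w = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        z = sum(w)
        x = [wi / z for wi in w]
    return x


# Toy objective (an assumption for illustration): f(x) = 0.5 * ||x - target||^2.
target = [0.7, 0.2, 0.1]


def toy_grad(x):
    return [xi - ti for xi, ti in zip(x, target)]


x_star = entropic_mirror_descent(toy_grad, [1 / 3, 1 / 3, 1 / 3])
print(x_star)
```

The multiplicative update is what makes the method attractive for variational inference over categorical distributions: non-negativity and normalisation hold by construction at every step.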
|
8
|
Li H, Xu T, Wu XJ, Lu J, Kittler J. LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images. IEEE Trans Pattern Anal Mach Intell 2023; PP. [PMID: 37074897 DOI: 10.1109/tpami.2023.3268209] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/03/2023]
Abstract
Deep learning based fusion methods have been achieving promising performance in image fusion tasks. This is attributed to the network architecture that plays a very important role in the fusion process. However, in general, it is hard to specify a good fusion architecture, and consequently, the design of fusion networks is still a black art, rather than science. To address this problem, we formulate the fusion task mathematically, and establish a connection between its optimal solution and the network architecture that can implement it. This approach leads to a novel method proposed in the paper of constructing a lightweight fusion network. It avoids the time-consuming empirical network design by a trial-and-test strategy. In particular, we adopt a learnable representation approach to the fusion task, in which the construction of the fusion network architecture is guided by the optimisation algorithm producing the learnable model. The low-rank representation (LRR) objective is the foundation of our learnable model. The matrix multiplications, which are at the heart of the solution, are transformed into convolutional operations, and the iterative process of optimisation is replaced by a special feed-forward network. Based on this novel network architecture, an end-to-end lightweight fusion network is constructed to fuse infrared and visible light images. Its successful training is facilitated by a detail-to-semantic information loss function proposed to preserve the image details and to enhance the salient features of the source images. Our experiments show that the proposed fusion network exhibits better fusion performance than the state-of-the-art fusion methods on public datasets. Interestingly, our network requires fewer training parameters than other existing methods.
|
9
|
Wang R, Wu XJ, Xu T, Hu C, Kittler J. U-SPDNet: An SPD manifold learning-based neural network for visual classification. Neural Netw 2023; 161:382-396. [PMID: 36780861 DOI: 10.1016/j.neunet.2022.11.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 11/07/2022] [Accepted: 11/27/2022] [Indexed: 12/15/2022]
Abstract
With the development of neural networking techniques, several architectures for symmetric positive definite (SPD) matrix learning have recently been put forward in the computer vision and pattern recognition (CV&PR) community for mining fine-grained geometric features. However, the degradation of structural information during multi-stage feature transformation limits their capacity. To cope with this issue, this paper develops a U-shaped neural network on the SPD manifolds (U-SPDNet) for visual classification. The designed U-SPDNet contains two subsystems, one of which is a shrinking path (encoder) made up of a prevailing SPD manifold neural network (SPDNet (Huang and Van Gool, 2017)) for capturing compact representations from the input data. The other is a constructed symmetric expanding path (decoder) to upsample the encoded features, trained by a reconstruction error term. With this design, the degradation problem will be gradually alleviated during training. To enhance the representational capacity of U-SPDNet, we also append skip connections from encoder to decoder, realized by manifold-valued geometric operations, namely Riemannian barycenter and Riemannian optimization. On the MDSD, Virus, FPHA, and UAV-Human datasets, the accuracy achieved by our method is respectively 6.92%, 8.67%, 1.57%, and 1.08% higher than SPDNet, certifying its effectiveness.
Affiliation(s)
- Rui Wang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Xiao-Jun Wu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Tianyang Xu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Cong Hu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China
- Josef Kittler
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, Guildford GU2 7XH, UK
|
10
|
Xu T, Feng Z, Wu XJ, Kittler J. Towards Robust Visual Object Tracking with Independent Target-Agnostic Detection and Effective Siamese Cross-Task Interaction. IEEE Trans Image Process 2023; PP:1541-1554. [PMID: 37027596 DOI: 10.1109/tip.2023.3246800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Advanced Siamese visual object tracking architectures are jointly trained using pair-wise input images to perform target classification and bounding box regression. They have achieved promising results in recent benchmarks and competitions. However, the existing methods suffer from two limitations: First, though the Siamese structure can estimate the target state in an instance frame, provided the target appearance does not deviate too much from the template, the detection of the target in an image cannot be guaranteed in the presence of severe appearance variations. Second, despite the classification and regression tasks sharing the same output from the backbone network, their specific modules and loss functions are invariably designed independently, without promoting any interaction. Yet, in a general tracking task, the centre classification and bounding box regression tasks are collaboratively working to estimate the final target location. To address the above issues, it is essential to perform target-agnostic detection so as to promote cross-task interactions in a Siamese-based tracking framework. In this work, we endow a novel network with a target-agnostic object detection module to complement the direct target inference, and to avoid or minimise the misalignment of the key cues of potential template-instance matches. To unify the multi-task learning formulation, we develop a cross-task interaction module to ensure consistent supervision of the classification and regression branches, improving the synergy of different branches. To eliminate potential inconsistencies that may arise within a multi-task architecture, we assign adaptive labels, rather than fixed hard labels, to supervise the network training more effectively. 
The experimental results obtained on several benchmarks, i.e., OTB100, UAV123, VOT2018, VOT2019, and LaSOT, demonstrate the effectiveness of the advanced target detection module, as well as the cross-task interaction, exhibiting superior tracking performance as compared with the state-of-the-art tracking methods.
|
11
|
Liu D, Bober M, Kittler J. Neural Belief Propagation for Scene Graph Generation. IEEE Trans Pattern Anal Mach Intell 2023; PP:1-13. [PMID: 37022845 DOI: 10.1109/tpami.2023.3243306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Scene graph generation aims to interpret an input image by explicitly modelling the objects contained therein and their relationships. In existing methods the problem is predominantly solved by message passing neural network models. Unfortunately, in such models, the variational distributions generally ignore the structural dependencies among the output variables, and most of the scoring functions only consider pairwise dependencies. This can lead to inconsistent interpretations. In this paper, we propose a novel neural belief propagation method seeking to replace the traditional mean field approximation with a structural Bethe approximation. To find a better bias-variance trade-off, higher-order dependencies among three or more output variables are also incorporated into the relevant scoring function. The proposed method achieves the state-of-the-art performance on various popular scene graph generation benchmarks.
|
12
|
Yu W, Wu XJ, Xu T, Chen Z, Kittler J. Scalable Affine Multi-view Subspace Clustering. Neural Process Lett 2023. [DOI: 10.1007/s11063-022-11059-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
|
13
|
Hua Y, Song X, Feng Z, Wu XJ, Kittler J, Yu DJ. CPInformer for Efficient and Robust Compound-Protein Interaction Prediction. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:285-296. [PMID: 35044921 DOI: 10.1109/tcbb.2022.3144008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
Recently, deep learning has become the mainstream methodology for Compound-Protein Interaction (CPI) prediction. However, the existing compound-protein feature extraction methods have some issues that limit their performance. First, graph networks are widely used for structural compound feature extraction, but the chemical properties of a compound depend on functional groups rather than graphic structure. Besides, the existing methods lack capabilities in extracting rich and discriminative protein features. Last, the compound-protein features are usually simply combined for CPI prediction, without considering information redundancy and effective feature mining. To address the above issues, we propose a novel CPInformer method. Specifically, we extract heterogeneous compound features, including structural graph features and functional class fingerprints, to reduce prediction errors caused by similar structural compounds. Then, we combine local and global features using dense connections to obtain multi-scale protein features. Last, we apply ProbSparse self-attention to protein features, under the guidance of compound features, to eliminate information redundancy, and to improve the accuracy of CPInformer. More importantly, the proposed method identifies the activated local regions that link a CPI, providing a good visualisation for the CPI state. The results obtained on five benchmarks demonstrate the merits and superiority of CPInformer over the state-of-the-art approaches.
|
14
|
Akbari A, Awais M, Bashar M, Kittler J. A Theoretical Insight Into the Effect of Loss Function for Deep Semantic-Preserving Learning. IEEE Trans Neural Netw Learn Syst 2023; 34:119-133. [PMID: 34283721 DOI: 10.1109/tnnls.2021.3090358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Good generalization performance is the fundamental goal of any machine learning algorithm. Using the uniform stability concept, this article theoretically proves that the choice of loss function impacts the generalization performance of a trained deep neural network (DNN). The adopted stability-based framework provides an effective tool for comparing the generalization error bound with respect to the utilized loss function. The main result of our analysis is that using an effective loss function makes stochastic gradient descent more stable, which consequently leads to a tighter generalization error bound, and so better generalization performance. To validate our analysis, we study learning problems in which the classes are semantically correlated. To capture this semantic similarity of neighboring classes, we adopt the well-known semantics-preserving learning framework, namely label distribution learning (LDL). We propose two novel loss functions for the LDL framework and theoretically show that they provide stronger stability than the other widely used loss functions adopted for training DNNs. The experimental results on three applications with semantically correlated classes, including facial age estimation, head pose estimation, and image esthetic assessment, validate the theoretical insights gained by our analysis and demonstrate the usefulness of the proposed loss functions in practical applications.
|
15
|
Akbari A, Awais M, Fatemifar S, Kittler J. Deep Order-Preserving Learning With Adaptive Optimal Transport Distance. IEEE Trans Pattern Anal Mach Intell 2023; 45:313-328. [PMID: 35254972 DOI: 10.1109/tpami.2022.3156885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
We consider a framework for taking into consideration the relative importance (ordinality) of object labels in the process of learning a label predictor function. The commonly used loss functions are not well matched to this problem, as they exhibit deficiencies in capturing natural correlations of the labels and the corresponding data. We propose to incorporate such correlations into our learning algorithm using an optimal transport formulation. Our approach is to learn the ground metric, which is partly involved in forming the optimal transport distance, by leveraging ordinality as a general form of side information in its formulation. Based on this idea, we then develop a novel loss function for training deep neural networks. A highly efficient alternating learning method is then devised to alternately optimise the ground metric and the deep model in an end-to-end learning manner. This scheme allows us to adaptively adjust the shape of the ground metric, and consequently the shape of the loss function for each application. We back up our approach by theoretical analysis and verify the performance of our proposed scheme by applying it to two learning tasks, i.e. chronological age estimation from the face and image aesthetic assessment. The numerical results on several benchmark datasets demonstrate the superiority of the proposed algorithm.
|
16
|
Akbari A, Awais M, Fatemifar S, Khalid SS, Kittler J. A Novel Ground Metric for Optimal Transport-Based Chronological Age Estimation. IEEE Trans Cybern 2022; 52:9986-9999. [PMID: 34133311 DOI: 10.1109/tcyb.2021.3083245] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
Label distribution learning (LDL) is the state-of-the-art approach to dealing with a number of real-world applications, such as chronological age estimation from a face image, where there is an inherent similarity among adjacent age labels. LDL takes into account the semantic similarity by assigning a label distribution to each instance. The well-known Kullback-Leibler (KL) divergence is the widely used loss function for the LDL framework. However, the KL divergence does not fully and effectively capture the semantic similarity among age labels, thus leading to suboptimal performance. In this article, we propose a novel loss function based on the optimal transport theory for the LDL-based age estimation. A ground metric function plays an important role in the optimal transport formulation. It should be carefully determined based on the underlying geometric structure of the label space of the application at hand. The label space in the age estimation problem has a specific geometric structure, that is, closer ages have more inherent semantic relationships. Inspired by this, we devise a novel ground metric function, which enables the loss function to increase the influence of highly correlated ages, thus exploiting the semantic similarity among ages more effectively than the existing loss functions. We then use the proposed loss function, namely, γ-Wasserstein loss, for training a deep neural network (DNN). This leads to a notoriously computationally expensive and nonconvex optimization problem. Following the standard methodology, we formulate the optimization function as a convex problem and then use an efficient iterative algorithm to update the parameters of the DNN. Extensive experiments in age estimation on different benchmark datasets validate the effectiveness of the proposed method, which consistently outperforms state-of-the-art approaches.
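A small numerical example can show why the KL divergence may miss the ordering of age labels while an optimal transport distance does not. The sketch below uses the standard ground metric |i - j| on ordered bins, for which the Wasserstein-1 distance equals the summed absolute difference of the cumulative distributions. This is an assumption made for illustration and is not the ground metric or the γ-Wasserstein loss proposed in the paper. The label distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def wasserstein_1d(p, q):
    """Wasserstein-1 distance on ordered bins with ground metric |i - j|."""
    total, cdf_gap = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi      # running difference of the two CDFs
        total += abs(cdf_gap)
    return total


def peaked(n, centre, peak=0.93):
    """Hypothetical label distribution: mass `peak` on one bin, the rest spread evenly."""
    rest = (1.0 - peak) / (n - 1)
    return [peak if i == centre else rest for i in range(n)]


truth = peaked(8, 2)   # true age bin is 2
near = peaked(8, 3)    # prediction off by one bin
far = peaked(8, 6)     # prediction off by four bins

kl_near, kl_far = kl_divergence(truth, near), kl_divergence(truth, far)
w_near, w_far = wasserstein_1d(truth, near), wasserstein_1d(truth, far)
print(kl_near, kl_far, w_near, w_far)
```

In this toy setting the KL divergence assigns the same penalty to both predictions, whereas the transport distance grows with the number of bins separating the prediction from the truth.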
|
17
|
Jiang Y, Song X, Xu T, Feng Z, Wu X, Kittler J. Target-Cognisant Siamese Network for Robust Visual Object Tracking. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.09.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
|
18
|
Wang R, Wu XJ, Liu Z, Kittler J. Geometry-Aware Graph Embedding Projection Metric Learning for Image Set Classification. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2021.3086814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Rui Wang
  - School of Artificial Intelligence and Computer Science and Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, China
- Xiao-Jun Wu
  - School of Artificial Intelligence and Computer Science and Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, China
- Zhen Liu
  - School of Artificial Intelligence and Computer Science and Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, China
- Josef Kittler
  - Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.
|
19
|
Chen Z, Wu XJ, Kittler J. Relaxed Block-Diagonal Dictionary Pair Learning With Locality Constraint for Image Recognition. IEEE Trans Neural Netw Learn Syst 2022; 33:3645-3659. [PMID: 33764879 DOI: 10.1109/tnnls.2021.3053941] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
We propose a novel structured analysis-synthesis dictionary pair learning method for efficient representation and image classification, referred to as relaxed block-diagonal dictionary pair learning with a locality constraint (RBD-DPL). RBD-DPL aims to learn relaxed block-diagonal representations of the input data to enhance the discriminability of both analysis and synthesis dictionaries by dynamically optimizing the block-diagonal components of representation, while the off-block-diagonal counterparts are set to zero. In this way, the learned synthesis subdictionary is allowed to be more flexible in reconstructing the samples from the same class, and the analysis dictionary effectively transforms the original samples into a relaxed coefficient subspace, which is closely associated with the label information. Besides, we incorporate a locality-constraint term as a complement of the relaxation learning to enhance the locality of the analytical encoding so that the learned representation exhibits high intraclass similarity. A linear classifier is trained in the learned relaxed representation space for consistent classification. RBD-DPL is computationally efficient because it avoids both the use of class-specific complementary data matrices to learn a discriminative analysis dictionary and the time-consuming ℓ1/ℓ0-norm sparse reconstruction process. The experimental results demonstrate that RBD-DPL achieves recognition performance comparable to or better than that of the state-of-the-art algorithms. Moreover, both the training and testing time are significantly reduced, which verifies the efficiency of our method. The MATLAB code of the proposed RBD-DPL is available at https://github.com/chenzhe207/RBD-DPL.
|
20
|
|
21
|
Liu Z, Song X, Feng Z, Xu T, Wu X, Kittler J. Global Context-Aware Feature Extraction and Visible Feature Enhancement for Occlusion-Invariant Pedestrian Detection in Crowded Scenes. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10910-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
22
|
Wang R, Wu XJ, Kittler J. SymNet: A Simple Symmetric Positive Definite Manifold Deep Learning Method for Image Set Classification. IEEE Trans Neural Netw Learn Syst 2022; 33:2208-2222. [PMID: 33784627 DOI: 10.1109/tnnls.2020.3044176] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
By representing each image set as a nonsingular covariance matrix on the symmetric positive definite (SPD) manifold, visual classification with image sets has attracted much attention. Despite the success made so far, the issue of large within-class variability of representations still remains a key challenge. Recently, several SPD matrix learning methods have been proposed to assuage this problem by directly constructing an embedding mapping from the original SPD manifold to a lower dimensional one. The advantage of this type of approach is that it can not only implement discriminative feature selection but also preserve the Riemannian geometrical structure of the original data manifold. Inspired by this fact, we propose a simple SPD manifold deep learning network (SymNet) for image set classification in this article. Specifically, we first design SPD matrix mapping layers to map the input SPD matrices into new ones with lower dimensionality. Then, rectifying layers are devised to activate the input matrices for the purpose of forming a valid SPD manifold, chiefly to inject nonlinearity for SPD matrix learning with two nonlinear functions. Afterward, we introduce pooling layers to further compress the input SPD matrices, and the log-map layer is finally exploited to embed the resulting SPD matrices into the tangent space via log-Euclidean Riemannian computing, such that the Euclidean learning applies. For SymNet, the (2-D)² principal component analysis (PCA) technique is utilized to learn the multistage connection weights without requiring complicated computations, thus making the network easier to build and train. At the tail of SymNet, the kernel discriminant analysis (KDA) algorithm is coupled with the output vectorized feature representations to perform discriminative subspace learning. Extensive experiments and comparisons with state-of-the-art methods on six typical visual classification tasks demonstrate the feasibility and validity of the proposed SymNet.
|
23
|
Akbari A, Awais M, Feng ZH, Farooq A, Kittler J. Distribution Cognisant Loss for Cross-Database Facial Age Estimation With Sensitivity Analysis. IEEE Trans Pattern Anal Mach Intell 2022; 44:1869-1887. [PMID: 33026982 DOI: 10.1109/tpami.2020.3029486] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Existing facial age estimation studies have mostly focused on intra-database protocols that assume training and test images are captured under similar conditions. This is rarely valid in practical applications, where we typically encounter training and test sets with different characteristics. In this article, we deal with such situations, namely subject-exclusive cross-database age estimation. We formulate the age estimation problem within the distribution learning framework, where the age labels are encoded as a probability distribution. To improve the cross-database age estimation performance, we propose a new loss function which provides a more robust measure of the difference between ground-truth and predicted distributions. The desirable properties of the proposed loss function are theoretically analysed and compared with the state-of-the-art approaches. In addition, we compile a new balanced large-scale age estimation database. Last, we introduce a novel evaluation protocol, called the subject-exclusive cross-database age estimation protocol, which provides meaningful information about the generalisation capability of a method. The experimental results demonstrate that the proposed approach outperforms the state-of-the-art age estimation methods under both intra-database and subject-exclusive cross-database evaluation protocols. In addition, we provide a comparative sensitivity analysis of various algorithms to identify trends and issues inherent to their performance. This analysis introduces some open problems to the community which might be considered when designing a robust age estimation system.
|
24
|
Fatemifar S, Asadi S, Awais M, Akbari A, Kittler J. Face spoofing detection ensemble via multistage optimisation and pruning. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.04.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
25
|
Wang R, Wu XJ, Chen Z, Xu T, Kittler J. Learning a discriminative SPD manifold neural network for image set classification. Neural Netw 2022; 151:94-110. [DOI: 10.1016/j.neunet.2022.03.012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/02/2021] [Revised: 02/15/2022] [Accepted: 03/07/2022] [Indexed: 11/26/2022]
|
26
|
Chen Z, Wu XJ, Kittler J. Fisher Regularized ε-Dragging for Image Classification. IEEE Trans Cogn Dev Syst 2022. [DOI: 10.1109/tcds.2022.3175008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Affiliation(s)
- Zhe Chen
  - School of AI & CS, Jiangnan University, Wuxi 214122, China
- Xiao-Jun Wu
  - School of AI & CS, Jiangnan University, Wuxi 214122, China
- Josef Kittler
  - Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, U.K.
|
27
|
Rahimzadeh Arashloo S, Kittler J. Multi-target regression via non-linear output structure learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.12.048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
28
|
|
29
|
Pfeifle R, Kittler J, Wuhrer M, Schett G, Krönke G. AB0016 THE IMPACT OF IL-17A THERAPY ON IGG SIALYLATION IN HUMANS. Ann Rheum Dis 2021. [DOI: 10.1136/annrheumdis-2021-eular.1087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Background: Rheumatoid arthritis (RA) is characterized by autoreactive B and T cells. Autoantibodies are a hallmark of RA and contribute to synovial inflammation. We have recently demonstrated that Th17 cells suppress the enzyme ST6 β-galactoside α-2,6-sialyltransferase 1 (ST6GAL1) in developing plasma cells. Thereby, Th17 cells regulate the degree of autoantibody sialylation, leading to the increased inflammatory activity of autoantibodies. These events correlate with the onset of RA, arguing for a crucial role of the IL-23/Th17 axis during the transition of asymptomatic autoimmunity into active RA. Therefore, treatment against the IL-23/Th17 axis might present an attractive therapeutic approach to halt or delay the onset of RA. However, the effects of Th17 cytokines like IL-17 on IgG glycosylation in humans are so far poorly studied. Objectives: To explore whether anti-IL-17A treatment can inhibit pro-inflammatory IgG glycosylation patterns in humans. Methods: Total IgG from patient cohorts suffering from psoriatic arthritis (PsA) treated with Secukinumab (anti-IL-17 blockade, n=26) or Ustekinumab (anti-IL-12/23 blockade, n=14) was compared with patients treated with anti-TNFα blockade as a control (n=20). The cohorts were age- and sex-matched and included patients who had been on therapy for at least six months. Total IgG was isolated using Protein G columns, and IgG glycopeptides of IgG1, IgG2, and IgG4 were analyzed using the LC-MS technique. The effect of IL-17 depletion on IgG glycosylation was analyzed in psoriatic arthritis patients who had been treated with Secukinumab for at least six months.
Furthermore, in a longitudinal approach, IgG1, IgG2, and IgG4 glycosylation were analyzed in samples isolated before the beginning of anti-IL-17 blockade and after at least six months of therapy (n=16). Results: Cross-sectional comparison of cohorts treated with Ustekinumab, Secukinumab, and anti-TNFα therapy did not show any significant differences in sialylation, galactosylation, or fucosylation of IgG1 and IgG2. IgG4 from anti-TNFα-treated patients displayed a small increase of sialylation when compared to the Ustekinumab-treated cohort. Longitudinal analyses, however, showed that IL-17A blockade during Secukinumab therapy caused a significant increase of sialic acid-rich IgG glycoforms on IgG1, IgG2, and IgG4 in patients, while galactosylation and fucosylation remained unaffected. Conclusion: These data indicate that IL-17A blockade specifically affects IgG sialylation, while other Fc-glycan modifications remain unaltered. They confirm our recent findings in mice, where cytokines of the IL-23/Th17 axis specifically induce the production of hypo-sialylated, proinflammatory autoantibodies in rheumatoid arthritis (RA) [2]. Therefore, neutralizing IL-17 might be a therapeutic option during the asymptomatic autoimmune prodromal phase in autoimmune diseases like RA, where Th17 cytokines orchestrate the emergence of a pro-inflammatory autoantibody response and the transition into active RA. References: [1] McInnes IB, Schett G. The pathogenesis of rheumatoid arthritis. N Engl J Med 2011; 365:2205-19. [2] Pfeifle R et al. Regulation of autoantibody activity by the IL-23-Th17 axis determines the onset of autoimmune disease. Nat Immunol 2017; 18(1):104-113. Disclosure of Interests: Rene Pfeifle Grant/research support from: Novartis AG, Julia Kittler: None declared, Manfred Wuhrer: None declared, Georg Schett: None declared, Gerhard Krönke Grant/research support from: Novartis AG
|
30
|
Abstract
Visual semantic information comprises two important parts: the meaning of each visual semantic unit and the coherent visual semantic relation conveyed by these visual semantic units. Essentially, the former one is a visual perception task while the latter corresponds to visual context reasoning. Remarkable advances in visual perception have been achieved due to the success of deep learning. In contrast, visual semantic information pursuit, a visual scene semantic interpretation task combining visual perception and visual context reasoning, is still in its early stage. It is the core task of many different computer vision applications, such as object detection, visual semantic segmentation, visual relationship detection, or scene graph generation. Since it helps to enhance the accuracy and the consistency of the resulting interpretation, visual context reasoning is often incorporated with visual perception in current deep end-to-end visual semantic information pursuit methods. Surprisingly, a comprehensive review for this exciting area is still lacking. In this survey, we present a unified theoretical paradigm for all these methods, followed by an overview of the major developments and the future trends in each potential direction. The common benchmark datasets, the evaluation metrics and the comparisons of the corresponding methods are also introduced.
|
31
|
Arashloo SR, Kittler J. Robust One-Class Kernel Spectral Regression. IEEE Trans Neural Netw Learn Syst 2021; 32:999-1013. [PMID: 32481229 DOI: 10.1109/tnnls.2020.2979823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The kernel null-space technique is known to be an effective one-class classification (OCC) technique. Nevertheless, the applicability of this method is limited due to its susceptibility to possible training data corruption and the inability to rank training observations according to their conformity with the model. This article addresses these shortcomings by regularizing the solution of the null-space kernel Fisher methodology in the context of its regression-based formulation. In this respect, first, the effect of the Tikhonov regularization in the Hilbert space is analyzed, where the one-class learning problem in the presence of contamination in the training set is posed as a sensitivity analysis problem. Next, the effect of the sparsity of the solution is studied. For both alternative regularization schemes, iterative algorithms are proposed which recursively update label confidences. Through extensive experiments, the proposed methodology is found to enhance robustness against contamination in the training set compared with the baseline kernel null-space method, as well as other existing approaches in the OCC paradigm, while providing the functionality to rank training samples effectively.
|
32
|
Dos Santos FP, Zor C, Kittler J, Ponti MA. Learning image features with fewer labels using a semi-supervised deep convolutional network. Neural Netw 2020; 132:131-143. [PMID: 32871338 DOI: 10.1016/j.neunet.2020.08.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/19/2019] [Revised: 05/04/2020] [Accepted: 08/13/2020] [Indexed: 11/16/2022]
Abstract
Learning feature embeddings for pattern recognition is a relevant task for many applications. Deep learning methods such as convolutional neural networks can be employed for this task with different training strategies: leveraging pre-trained models as baselines; training from scratch with the target dataset; or fine-tuning from the pre-trained model. Although there are separate systems used for learning features from labelled and unlabelled data, there are few models combining all available information. Therefore, in this paper, we present a novel semi-supervised deep network training strategy that comprises a convolutional network and an autoencoder using a joint classification and reconstruction loss function. We show that our network improves the learned feature embedding when the unlabelled data are included in the training process. The feature embedding obtained by our network achieves better classification accuracy than competing methods and offers good generalisation in the context of transfer learning. Furthermore, the proposed network ensemble and loss function are highly extensible and applicable in many recognition tasks.
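A joint classification and reconstruction loss of the kind the abstract mentions can be sketched as follows. This is a minimal illustration under stated assumptions, not the loss used in the paper: the weighting parameter `weight`, the use of mean squared error for reconstruction, and the convention that unlabelled samples contribute only the reconstruction term are all hypothetical choices.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class under predicted probabilities."""
    return -math.log(max(probs[label], 1e-12))

def mse(x, x_hat):
    """Mean squared reconstruction error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def joint_loss(x, x_hat, probs=None, label=None, weight=1.0):
    """Reconstruction term for every sample; classification term only if labelled.

    weight (assumed) balances reconstruction against classification.
    """
    loss = weight * mse(x, x_hat)
    if label is not None:
        loss += cross_entropy(probs, label)
    return loss

# A labelled sample contributes both terms, an unlabelled one only reconstruction.
labelled = joint_loss([1.0, 0.0], [0.0, 0.0], probs=[0.5, 0.5], label=0)
unlabelled = joint_loss([1.0, 0.0], [0.0, 0.0])
print(labelled, unlabelled)
```

Under this convention the autoencoder branch can learn from unlabelled images while the classifier branch is driven only by the labelled subset.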
Affiliation(s)
- Fernando P Dos Santos
  - Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo (USP), São Carlos/SP, 13566-590, Brazil.
- Cemre Zor
  - Centre for Medical Image Computing (CMIC), University College London, WC1E 7JE, United Kingdom.
- Josef Kittler
  - Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, GU2 7XH, United Kingdom.
- Moacir A Ponti
  - Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo (USP), São Carlos/SP, 13566-590, Brazil.
|
33
|
Li H, Wu XJ, Kittler J. MDLatLRR: A novel decomposition method for infrared and visible image fusion. IEEE Trans Image Process 2020; 29:4733-4746. [PMID: 32142438 DOI: 10.1109/tip.2020.2975984] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Indexed: 06/10/2023]
Abstract
Image decomposition is crucial for many image processing tasks, as it allows salient features to be extracted from source images. A good image decomposition method could lead to better performance, especially in image fusion tasks. We propose a multi-level image decomposition method based on latent low-rank representation (LatLRR), which is called MDLatLRR. This decomposition method is applicable to many image processing fields. In this paper, we focus on the image fusion task. We build a novel image fusion framework based on MDLatLRR, which is used to decompose source images into detail parts (salient features) and base parts. A nuclear-norm based fusion strategy is used to fuse the detail parts, and the base parts are fused by an averaging strategy. Compared with other state-of-the-art fusion methods, the proposed algorithm exhibits better fusion performance in both subjective and objective evaluation.
|
34
|
|
35
|
Feng ZH, Kittler J, Awais M, Wu XJ. Rectified Wing Loss for Efficient and Robust Facial Landmark Localisation with Convolutional Neural Networks. Int J Comput Vis 2019. [DOI: 10.1007/s11263-019-01275-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/27/2022]
Abstract
Efficient and robust facial landmark localisation is crucial for the deployment of real-time face analysis systems. This paper presents a new loss function, namely Rectified Wing (RWing) loss, for regression-based facial landmark localisation with Convolutional Neural Networks (CNNs). We first systematically analyse different loss functions, including L2, L1 and smooth L1. The analysis suggests that the training of a network should pay more attention to small-medium errors. Motivated by this finding, we design a piece-wise loss that amplifies the impact of the samples with small-medium errors. Besides, we rectify the loss function for very small errors to mitigate the impact of inaccuracy of manual annotation. The use of our RWing loss boosts the performance significantly for regression-based CNNs in facial landmarking, especially for lightweight network architectures. To address the problem of under-representation of samples with large pose variations, we propose a simple but effective boosting strategy, referred to as pose-based data balancing. In particular, we deal with the data imbalance problem by duplicating the minority training samples and perturbing them by injecting random image rotation, bounding box translation and other data augmentation strategies. Last, the proposed approach is extended to create a coarse-to-fine framework for robust and efficient landmark localisation. Moreover, the proposed coarse-to-fine framework is able to deal with the small sample size problem effectively. The experimental results obtained on several well-known benchmarking datasets demonstrate the merits of our RWing loss and prove the superiority of the proposed method over the state-of-the-art approaches.
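The piece-wise behaviour described above (zero loss for very small errors, a logarithmic segment that emphasises small-medium errors, and a linear segment for large errors) can be sketched as below. The exact functional form and the parameter values `r`, `w` and `eps` are assumptions made for illustration; the definition in the article should be treated as authoritative.

```python
import math

def rectified_wing(x, r=0.5, w=10.0, eps=2.0):
    """Piece-wise loss for a single landmark coordinate error x.

    Assumed form (illustrative, parameter values are hypothetical):
      |x| < r        -> 0 (errors within annotation noise are ignored)
      r <= |x| < w   -> w * ln(1 + (|x| - r) / eps)
      |x| >= w       -> |x| - c, with c chosen so the pieces join continuously
    """
    a = abs(x)
    if a < r:
        return 0.0
    if a < w:
        return w * math.log(1.0 + (a - r) / eps)
    c = w - w * math.log(1.0 + (w - r) / eps)
    return a - c

for err in (0.1, 1.0, 5.0, 20.0):
    print(err, rectified_wing(err))
```

The logarithmic segment has its steepest slope just above `r`, so small-medium errors receive relatively large gradients, while the linear tail behaves like an L1 loss for large errors.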
|
36
|
Yin HF, Wu XJ, Kittler J, Feng ZH. Learning a representation with the block-diagonal structure for pattern classification. Pattern Anal Appl 2019. [DOI: 10.1007/s10044-019-00858-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
37
|
|
38
|
Xu T, Feng ZH, Wu XJ, Kittler J. Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Object Tracking. IEEE Trans Image Process 2019; 28:5596-5609. [PMID: 31170074 DOI: 10.1109/tip.2019.2919201] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 06/09/2023]
Abstract
With efficient appearance learning models, the discriminative correlation filter (DCF) has proven very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, i.e., spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistency constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by the lasso regularization. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimization framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123, and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.
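Structured spatial sparsity on multi-channel filters is commonly handled with a group-lasso style shrinkage, in which all channels at one spatial location are kept or discarded together. The sketch below shows the standard proximal operator for that penalty; it is an illustration of the general technique, not the solver used in the article (which the abstract describes as based on the augmented Lagrangian method). The data layout `filters[channel][location]` and the threshold `lam` are assumptions.

```python
import math

def group_soft_threshold(filters, lam):
    """Shrink each spatial location's channel vector towards zero.

    filters[c][i] is the coefficient of channel c at spatial location i.
    Locations whose channel-vector norm is at most lam are zeroed out,
    which acts as spatial feature selection shared across channels.
    """
    n_channels = len(filters)
    n_locations = len(filters[0])
    out = [[0.0] * n_locations for _ in range(n_channels)]
    for i in range(n_locations):
        norm = math.sqrt(sum(filters[c][i] ** 2 for c in range(n_channels)))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for c in range(n_channels):
            out[c][i] = scale * filters[c][i]
    return out

# Two channels, two locations: the first location is strong, the second weak.
toy_filters = [[3.0, 0.1],
               [4.0, 0.1]]
selected = group_soft_threshold(toy_filters, lam=1.0)
print(selected)
```

In this toy case the strong location is kept (scaled down) in both channels, while the weak location is removed from both.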
|
39
|
Chen Z, Wu XJ, Kittler J. A sparse regularized nuclear norm based matrix regression for face recognition with contiguous occlusion. Pattern Recognit Lett 2019. [DOI: 10.1016/j.patrec.2019.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 10/26/2022]
|
40
|
Abstract
In pattern recognition, disagreement between two classifiers regarding the predicted class membership of an observation can be indicative of an anomaly and its nuance. Since, in general, classifiers base their decisions on class a posteriori probabilities, the most natural approach to detecting classifier incongruence is to use divergence. However, existing divergences are not particularly suitable to gauge classifier incongruence. In this paper, we postulate the properties that a divergence measure should satisfy and propose a novel divergence measure, referred to as delta divergence. In contrast to existing measures, it focuses on the dominant (most probable) hypotheses and, thus, reduces the effect of the probability mass distributed over the non-dominant hypotheses (clutter). The proposed measure satisfies other important properties, such as symmetry and independence of classifier confidence. The relationship of the proposed divergence to some baseline measures, and its superiority, is shown experimentally.
|
41
|
Crouch DJM, Winney B, Koppen WP, Christmas WJ, Hutnik K, Day T, Meena D, Boumertit A, Hysi P, Nessa A, Spector TD, Kittler J, Bodmer WF. Genetics of the human face: Identification of large-effect single gene variants. Proc Natl Acad Sci U S A 2018; 115:E676-E685. [PMID: 29301965 PMCID: PMC5789906 DOI: 10.1073/pnas.1708207114] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Indexed: 11/18/2022] Open
Abstract
To discover specific variants with relatively large effects on the human face, we have devised an approach to identifying facial features with high heritability. This is based on using twin data to estimate the additive genetic value of each point on a face, as provided by a 3D camera system. In addition, we have used the ethnic difference between East Asian and European faces as a further source of face genetic variation. We use principal components (PCs) analysis to provide a fine definition of the surface features of human faces around the eyes and of the profile, and choose the upper and lower 10% extremes of the most heritable PCs to search for genetic associations. Using this strategy for the analysis of 3D images of 1,832 unique volunteers from the well-characterized People of the British Isles study and 1,567 unique twin images from the TwinsUK cohort, together with genetic data for 500,000 SNPs, we have identified three specific genetic variants with notable effects on facial profiles and eyes.
Affiliation(s)
- Daniel J M Crouch
  - Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
  - Department of Oncology, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Bruce Winney
  - Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
  - Department of Oncology, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Willem P Koppen
  - Centre for Vision, Speech and Signal Processing, Department of Electronic Engineering, University of Surrey, Guildford GU2 7XH, United Kingdom
- William J Christmas
  - Centre for Vision, Speech and Signal Processing, Department of Electronic Engineering, University of Surrey, Guildford GU2 7XH, United Kingdom
- Katarzyna Hutnik
  - Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
  - Department of Oncology, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Tammy Day
  - Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
  - Department of Oncology, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Devendra Meena
  - Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
  - Department of Oncology, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Abdelhamid Boumertit
  - Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
  - Department of Oncology, University of Oxford, Oxford OX3 7DQ, United Kingdom
- Pirro Hysi
  - TwinsUK, St. Thomas' Hospital, King's College London, London SE1 7EH, United Kingdom
- Ayrun Nessa
  - TwinsUK, St. Thomas' Hospital, King's College London, London SE1 7EH, United Kingdom
- Tim D Spector
  - TwinsUK, St. Thomas' Hospital, King's College London, London SE1 7EH, United Kingdom
- Josef Kittler
  - Centre for Vision, Speech and Signal Processing, Department of Electronic Engineering, University of Surrey, Guildford GU2 7XH, United Kingdom
- Walter F Bodmer
  - Cancer and Immunogenetics Laboratory, Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
  - Department of Oncology, University of Oxford, Oxford OX3 7DQ, United Kingdom
|
42
|
Gecer B, Bhattarai B, Kittler J, Kim TK. Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model. Computer Vision – ECCV 2018 2018. [DOI: 10.1007/978-3-030-01252-6_14] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 12/02/2022]
|
43
|
Pardo A, Kittler J. Special section: CIARP 2015. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2016.11.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
44
|
Affiliation(s)
- Norman Poh
- Department of Computing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Chi‐Ho Chan
- CVSSP, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Medha Pandit
- CVSSP, University of Surrey, Guildford GU2 7XH, Surrey, UK
|
45
|
Feng ZH, Hu G, Kittler J, Christmas W, Wu XJ. Cascaded Collaborative Regression for Robust Facial Landmark Detection Trained Using a Mixture of Synthetic and Real Images With Dynamic Weighting. IEEE Trans Image Process 2015; 24:3425-3440. [PMID: 26087493 DOI: 10.1109/tip.2015.2446944] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 06/04/2023]
Abstract
A large amount of training data is usually crucial for successful supervised learning. However, the task of providing training samples is often time-consuming, involving a considerable amount of tedious manual work. In addition, the amount of training data available is often limited. As an alternative, in this paper, we discuss how best to augment the available data for the application of automatic facial landmark detection. We propose the use of a 3D morphable face model to generate synthesized faces for training a regression-based detector. Benefiting from the large synthetic training data, the learned detector is shown to exhibit a better capability to detect the landmarks of a face with pose variations. Furthermore, the synthesized training data set provides accurate and consistent landmarks automatically, in contrast to manually annotated landmarks, especially for occluded facial parts. The synthetic data and real data are from different domains; hence the detector trained using only synthesized faces does not generalize well to real faces. To deal with this problem, we propose a cascaded collaborative regression algorithm, which generates a cascaded shape updater that has the ability to overcome the difficulties caused by pose variations, as well as achieving better accuracy when applied to real faces. The training is based on a mix of synthetic and real image data, with the mixing controlled by a dynamic mixture weighting schedule. Initially, the training relies heavily on the synthetic data, as this can model the gross variations between the various poses. As the training proceeds, progressively more of the natural images are incorporated, as these can model finer detail. To improve the performance of the proposed algorithm further, we designed a dynamic multi-scale local feature extraction method, which captures more informative local features for detector training. An extensive evaluation on both controlled and uncontrolled face data sets demonstrates the merit of the proposed algorithm.
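The dynamic mixture weighting idea in the abstract above (mostly synthetic data in early stages, progressively more real data later) can be illustrated with a minimal sketch. The abstract does not state the actual schedule, so the linear decay, the function names `synthetic_weight` and `mix_batch`, and the default weights 0.9 and 0.1 are all assumptions made purely for illustration.

```python
import random


def synthetic_weight(stage, num_stages, w_start=0.9, w_end=0.1):
    """Share of synthetic samples at a given cascade stage.

    Hypothetical linear schedule: w_start at stage 0, w_end at the last stage.
    The published method may use a different schedule.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if num_stages == 1:
        return w_start
    t = stage / (num_stages - 1)  # progress through the cascade, in [0, 1]
    return w_start + t * (w_end - w_start)


def mix_batch(synthetic, real, stage, num_stages, batch_size, rng):
    """Draw a training batch whose synthetic/real ratio follows the schedule."""
    n_syn = round(synthetic_weight(stage, num_stages) * batch_size)
    n_real = batch_size - n_syn
    # Sampling with replacement keeps the sketch valid for small pools.
    return rng.choices(synthetic, k=n_syn) + rng.choices(real, k=n_real)


if __name__ == "__main__":
    rng = random.Random(0)
    for stage in range(5):
        batch = mix_batch(["syn"] * 4, ["real"] * 4, stage, 5, 10, rng)
        print(stage, batch.count("syn"), batch.count("real"))
```

Under these assumed defaults, the synthetic share falls from 90% at the first stage to roughly 10% at the last, which mirrors the coarse-to-fine behaviour the abstract describes.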
|
46
|
Kittler J, Schrader O, Kästner U, Marthe F. Chromosome number and ploidy level of balm (Melissa officinalis). Mol Cytogenet 2015; 8:61. [PMID: 26300974 PMCID: PMC4545860 DOI: 10.1186/s13039-015-0166-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/14/2015] [Accepted: 07/28/2015] [Indexed: 11/29/2022] Open
Abstract
Background: Lemon balm (Melissa officinalis L.) is of increasing importance, resulting in a rising growing area. Improved knowledge of the genome structure and the number of chromosomes, in connection with the taxonomical structure of balm, is indispensable for developing improved new varieties. Results: A collection of 40 balm accessions (M. officinalis) was characterized by flow cytometry and FISH (18/25S and 5S rDNA) to determine the chromosome number and ploidy level. Three different types were found: diploid genotypes with 2n = 2x = 32 chromosomes, tetraploid genotypes with 2n = 4x = 64 chromosomes, and triploid genotypes with 2n = 3x = 48 chromosomes. A haploid base number of x = 16 chromosomes is likely. The triploid accessions, described here for the first time, are sterile but have been cytologically and morphologically stable for many years. Triploids show better winter hardiness and regeneration after harvesting cuts, as well as bigger leaves and internodes. Conclusions: A basic chromosome number of x = 16 is reported for the first time for the species M. officinalis.
Affiliation(s)
- J Kittler
- Institute for Breeding Research on Horticultural Crops of the Julius Kuehn Institute (JKI), Federal Research Centre for Cultivated Plants, Erwin-Baur-Str. 27, Quedlinburg, D-06484 Germany
- O Schrader
- Institute for Breeding Research on Horticultural Crops of the Julius Kuehn Institute (JKI), Federal Research Centre for Cultivated Plants, Erwin-Baur-Str. 27, Quedlinburg, D-06484 Germany
- U Kästner
- Institute for Breeding Research on Horticultural Crops of the Julius Kuehn Institute (JKI), Federal Research Centre for Cultivated Plants, Erwin-Baur-Str. 27, Quedlinburg, D-06484 Germany
- F Marthe
- Institute for Breeding Research on Horticultural Crops of the Julius Kuehn Institute (JKI), Federal Research Centre for Cultivated Plants, Erwin-Baur-Str. 27, Quedlinburg, D-06484 Germany
|
47
|
|
48
|
Affiliation(s)
- Chris J McBain
- Porter Neuroscience Research Center, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Building 35, Room 3C-903, 35 Convent Drive, MSC 3715, Bethesda, MD, 20892-3715, United States
- Josef Kittler
- Porter Neuroscience Research Center, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Building 35, Room 3C-903, 35 Convent Drive, MSC 3715, Bethesda, MD, 20892-3715, United States
- Bernhard Luscher
- Porter Neuroscience Research Center, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Building 35, Room 3C-903, 35 Convent Drive, MSC 3715, Bethesda, MD, 20892-3715, United States
- Istvan Mody
- Porter Neuroscience Research Center, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Building 35, Room 3C-903, 35 Convent Drive, MSC 3715, Bethesda, MD, 20892-3715, United States
- Beverley A Orser
- Porter Neuroscience Research Center, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Building 35, Room 3C-903, 35 Convent Drive, MSC 3715, Bethesda, MD, 20892-3715, United States
|
49
|
|
50
|
Khan A, Windridge D, Kittler J. Multilevel Chinese takeaway process and label-based processes for rule induction in the context of automated sports video annotation. IEEE Trans Cybern 2014; 44:1910-1923. [PMID: 25222731 DOI: 10.1109/tcyb.2014.2299955] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
We propose four variants of a novel hierarchical hidden Markov model strategy for rule induction in the context of automated sports video annotation, including a multilevel Chinese takeaway process (MLCTP) based on the Chinese restaurant process and a novel Cartesian product label-based hierarchical bottom-up clustering (CLHBC) method that employs prior information contained within label structures. Our results show significant improvement over the flat Markov model: optimal performance is obtained using a hybrid method, which combines the MLCTP-generated hierarchical topological structures with CLHBC-generated event labels. We also show that the proposed methods are generalizable to other rule-based environments, including human driving behavior and human actions.
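For readers unfamiliar with the Chinese restaurant process on which the MLCTP is said to be based, the following is a minimal sketch of the standard single-level process only, not of the MLCTP itself. Each new customer joins an existing table with probability proportional to its current size, or opens a new table with probability proportional to a concentration parameter. The function name and the parameter name `alpha` are illustrative choices, not taken from the paper.

```python
import random


def chinese_restaurant_process(num_customers, alpha, rng):
    """Sample a seating arrangement from a standard Chinese restaurant process.

    Returns (assignments, table_sizes), where assignments[i] is the table index
    of customer i and table_sizes[k] is the number of customers at table k.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    table_sizes = []
    assignments = []
    for _ in range(num_customers):
        # Existing tables are weighted by occupancy; the final slot, weighted
        # by alpha, stands for opening a new table.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments, table_sizes


if __name__ == "__main__":
    assignments, sizes = chinese_restaurant_process(20, alpha=1.0, rng=random.Random(0))
    print(assignments)
    print(sizes)
```

Larger values of `alpha` tend to produce more, smaller tables, which is why the process is often used as a prior over an unknown number of clusters.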
|