1
|
Zhong G, Xiao Y, Liu B, Zhao L, Kong X. Ordinal Regression With Pinball Loss. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11246-11260. [PMID: 37030787 DOI: 10.1109/tnnls.2023.3258464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Ordinal regression (OR) aims to solve multiclass classification problems with ordinal classes. Support vector OR (SVOR) is a typical OR algorithm and has been extensively used in OR problems. In this article, based on the characteristics of OR problems, we propose a novel pinball loss function and present an SVOR method with pinball loss (pin-SVOR). Pin-SVOR is fundamentally different from traditional SVOR with hinge loss. Traditional SVOR employs the hinge loss function, and the classifier is determined by only a few data points near the class boundary, called support vectors, which may lead to a noise sensitive and re-sampling unstable classifier. Distinctively, pin-SVOR employs the pinball loss function. It attaches an extra penalty to correctly classified data that lies inside the class, such that all the training data is involved in deciding the classifier. The data near the middle of each class has a small penalty, and that near the class boundary has a large penalty. Thus, the training data tend to lie near the middle of each class instead of on the class boundary, which leads to scatter minimization in the middle of each class and noise insensitivity. The experimental results show that pin-SVOR has better classification performance than state-of-the-art OR methods.
Collapse
|
2
|
Zhang Y, Gao X, He L, Lu W, He R. Objective Video Quality Assessment Combining Transfer Learning With CNN. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2716-2730. [PMID: 30736007 DOI: 10.1109/tnnls.2018.2890310] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Nowadays, video quality assessment (VQA) is essential to video compression technology applied to video transmission and storage. However, small-scale video quality databases with imbalanced samples and low-level feature representations for distorted videos impede the development of VQA methods. In this paper, we propose a full-reference (FR) VQA metric integrating transfer learning with a convolutional neural network (CNN). First, we imitate the feature-based transfer learning framework to transfer the distorted images as the related domain, which enriches the distorted samples. Second, to extract high-level spatiotemporal features of the distorted videos, a six-layer CNN with the acknowledged learning ability is pretrained and finetuned by the common features of the distorted image blocks (IBs) and video blocks (VBs), respectively. Notably, the labels of the distorted IBs and VBs are predicted by the classic FR metrics. Finally, based on saliency maps and the entropy function, we conduct a pooling stage to obtain the quality scores of the distorted videos by weighting the block-level scores predicted by the trained CNN. In particular, we introduce a preprocessing and a postprocessing to reduce the impact of inaccurate labels predicted by the FR-VQA metric. Due to feature learning in the proposed framework, two kinds of experimental schemes including train-test iterative procedures on one database and tests on one database with training other databases are carried out. The experimental results demonstrate that the proposed method has high expansibility and is on a par with some state-of-the-art VQA metrics on two widely used VQA databases with various compression distortions.
Collapse
|
3
|
Wang Z, Du B, Guo Y. Domain Adaptation With Neural Embedding Matching. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:2387-2397. [PMID: 31536022 DOI: 10.1109/tnnls.2019.2935608] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
Domain adaptation aims to exploit the supervision knowledge in a source domain for learning prediction models in a target domain. In this article, we propose a novel representation learning-based domain adaptation method, i.e., neural embedding matching (NEM) method, to transfer information from the source domain to the target domain where labeled data is scarce. The proposed approach induces an intermediate common representation space for both domains with a neural network model while matching the embedding of data from the two domains in this common representation space. The embedding matching is based on the fundamental assumptions that a cross-domain pair of instances will be close to each other in the embedding space if they belong to the same class category, and the local geometry property of the data can be maintained in the embedding space. The assumptions are encoded via objectives of metric learning and graph embedding techniques to regularize and learn the semisupervised neural embedding model. We also provide a generalization bound analysis for the proposed domain adaptation method. Meanwhile, a progressive learning strategy is proposed and it improves the generalization ability of the neural network gradually. Experiments are conducted on a number of benchmark data sets and the results demonstrate that the proposed method outperforms several state-of-the-art domain adaptation methods and the progressive learning strategy is promising.
Collapse
|
4
|
Zhang Y, Cheung YM, Tan KC. A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2020; 31:39-52. [PMID: 30908240 DOI: 10.1109/tnnls.2019.2899381] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Ordinal data are common in many data mining and machine learning tasks. Compared to nominal data, the possible values (also called categories interchangeably) of an ordinal attribute are naturally ordered. Nevertheless, since the data values are not quantitative, the distance between two categories of an ordinal attribute is generally not well defined, which surely has a serious impact on the result of the quantitative analysis if an inappropriate distance metric is utilized. From the practical perspective, ordinal-and-nominal-attribute categorical data, i.e., categorical data associated with a mixture of nominal and ordinal attributes, is common, but the distance metric for such data has yet to be well explored in the literature. In this paper, within the framework of clustering analysis, we therefore first propose an entropy-based distance metric for ordinal attributes, which exploits the underlying order information among categories of an ordinal attribute for the distance measurement. Then, we generalize this distance metric and propose a unified one accordingly, which is applicable to ordinal-and-nominal-attribute categorical data. Compared with the existing metrics proposed for categorical data, the proposed metric is simple to use and nonparametric. More importantly, it reasonably exploits the underlying order information of ordinal attributes and statistical information of nominal attributes for distance measurement. Extensive experiments show that the proposed metric outperforms the existing counterparts on both the real and benchmark data sets.
Collapse
|
5
|
Zhang X, Zhuang Y, Wang W, Pedrycz W. Online Feature Transformation Learning for Cross-Domain Object Category Recognition. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2857-2871. [PMID: 28613184 DOI: 10.1109/tnnls.2017.2705113] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning a feature transformation matrix expressed in the original feature space and propose an online passive aggressive feature transformation algorithm. Then these original features are mapped to kernel space and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale application. The classifier is trained with k nearest neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examined the effect of setting different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods on cross-domain and multiclass object recognition application.
Collapse
|
6
|
Li S, Song S, Huang G. Prediction Reweighting for Domain Adaptation. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1682-1695. [PMID: 28113917 DOI: 10.1109/tnnls.2016.2538282] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
There are plenty of classification methods that perform well when training and testing data are drawn from the same distribution. However, in real applications, this condition may be violated, which causes degradation of classification accuracy. Domain adaptation is an effective approach to address this problem. In this paper, we propose a general domain adaptation framework from the perspective of prediction reweighting, from which a novel approach is derived. Different from the major domain adaptation methods, our idea is to reweight predictions of the training classifier on testing data according to their signed distance to the domain separator, which is a classifier that distinguishes training data (from source domain) and testing data (from target domain). We then propagate the labels of target instances with larger weights to ones with smaller weights by introducing a manifold regularization method. It can be proved that our reweighting scheme effectively brings the source and target domains closer to each other in an appropriate sense, such that classification in target domain becomes easier. The proposed method can be implemented efficiently by a simple two-stage algorithm, and the target classifier has a closed-form solution. The effectiveness of our approach is verified by the experiments on artificial datasets and two standard benchmarks, a visual object recognition task and a cross-domain sentiment analysis of text. Experimental results demonstrate that our method is competitive with the state-of-the-art domain adaptation algorithms.
Collapse
|
7
|
Yang L, Jing L, Yu J, Ng MK. Learning Transferred Weights From Co-Occurrence Data for Heterogeneous Transfer Learning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:2187-2200. [PMID: 26353384 DOI: 10.1109/tnnls.2015.2472457] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
One of the main research problems in heterogeneous transfer learning is to determine whether a given source domain is effective in transferring knowledge to a target domain, and then to determine how much of the knowledge should be transferred from a source domain to a target domain. The main objective of this paper is to solve this problem by evaluating the relatedness among given domains through transferred weights. We propose a novel method to learn such transferred weights with the aid of co-occurrence data, which contain the same set of instances but in different feature spaces. Because instances with the same category should have similar features, our method is to compute their principal components in each feature space such that co-occurrence data can be rerepresented by these principal components. The principal component coefficients from different feature spaces for the same instance in the co-occurrence data have the same order of significance for describing the category information. By using these principal component coefficients, the Markov Chain Monte Carlo method is employed to construct a directed cyclic network where each node is a domain and each edge weight is the conditional dependence from one domain to another domain. Here, the edge weight of the network can be employed as the transferred weight from a source domain to a target domain. The weight values can be taken as a prior for setting parameters in the existing heterogeneous transfer learning methods to control the amount of knowledge transferred from a source domain to a target domain. The experimental results on synthetic and real-world data sets are reported to illustrate the effectiveness of the proposed method that can capture strong or weak relations among feature spaces, and enhance the learning performance of heterogeneous transfer learning.
Collapse
|
8
|
Zhou Y, Hospedales TM, Fenton N. When and Where to Transfer for Bayes Net Parameter Learning. EXPERT SYSTEMS WITH APPLICATIONS 2016; 55:361-373. [PMID: 27375345 PMCID: PMC4924610 DOI: 10.1016/j.eswa.2016.02.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Learning Bayesian networks from scarce data is a major challenge in real-world applications where data are hard to acquire. Transfer learning techniques attempt to address this by leveraging data from different but related problems. For example, it may be possible to exploit medical diagnosis data from a different country. A challenge with this approach is heterogeneous relatedness to the target, both within and across source networks. In this paper we introduce the Bayesian network parameter transfer learning (BNPTL) algorithm to reason about both network and fragment (sub-graph) relatedness. BNPTL addresses (i) how to find the most relevant source network and network fragments to transfer, and (ii) how to fuse source and target parameters in a robust way. In addition to improving target task performance, explicit reasoning allows us to diagnose network and fragment relatedness across BNs, even if latent variables are present, or if their state space is heterogeneous. This is important in some applications where relatedness itself is an output of interest. Experimental results demonstrate the superiority of BNPTL at various scarcities and source relevance levels compared to single task learning and other state-of-the-art parameter transfer methods. Moreover, we demonstrate successful application to real-world medical case studies.
Collapse
Affiliation(s)
- Yun Zhou
- Risk and Information Management (RIM) Research Group, Queen Mary University of London
- Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology
| | - Timothy M. Hospedales
- Risk and Information Management (RIM) Research Group, Queen Mary University of London
| | - Norman Fenton
- Risk and Information Management (RIM) Research Group, Queen Mary University of London
| |
Collapse
|
9
|
Xiao Y, Liu B, Hao Z. A Maximum Margin Approach for Semisupervised Ordinal Regression Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:1003-1019. [PMID: 26151945 DOI: 10.1109/tnnls.2015.2434960] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Ordinal regression (OR) is generally defined as the task where the input samples are ranked on an ordinal scale. OR has found a wide variety of applications, and a great deal of work has been done on it. However, most of the existing work focuses on supervised/semisupervised OR classification, and the semisupervised OR clustering problems have not been explicitly addressed. In real-world OR applications, labeling a large number of training samples is usually time-consuming and costly, and instead, a set of unlabeled samples can be utilized to set up the OR model. Moreover, although the sample labels are unavailable, we can sometimes get the relative ranking information of the unlabeled samples. This sample ranking information can be utilized to refine the OR model. Hence, how to build an OR model on the unlabeled samples and incorporate the sample ranking information into the process of improving the clustering accuracy remains a key challenge for OR applications. In this paper, we consider the semisupervised OR clustering problems with sample-ranking constraints, which give the relative ranking information of the unlabeled samples, and put forward a maximum margin approach for semisupervised OR clustering ( [Formula: see text]SORC). On one hand, [Formula: see text]SORC seeks a set of parallel hyperplanes to partition the unlabeled samples into clusters. On the other hand, a loss function is put forward to incorporate the sample ranking information into the clustering process. As a result, the optimization function of [Formula: see text]SORC is formulated to maximize the margins of the closest neighboring clusters and meanwhile minimize the loss associated with the sample-ranking constraints. Extensive experiments on OR data sets show that the proposed [Formula: see text]SORC method outperforms the traditional semisupervised clustering methods considered.
Collapse
|
10
|
Deng WY, Bai Z, Huang GB, Zheng QH. A Fast SVD-Hidden-nodes based Extreme Learning Machine for Large-Scale Data Analytics. Neural Netw 2015; 77:14-28. [PMID: 26907860 DOI: 10.1016/j.neunet.2015.09.003] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2015] [Revised: 08/03/2015] [Accepted: 09/07/2015] [Indexed: 10/22/2022]
Abstract
Big dimensional data is a growing trend that is emerging in many real world contexts, extending from web mining, gene expression analysis, protein-protein interaction to high-frequency financial data. Nowadays, there is a growing consensus that the increasing dimensionality poses impeding effects on the performances of classifiers, which is termed as the "peaking phenomenon" in the field of machine intelligence. To address the issue, dimensionality reduction is commonly employed as a preprocessing step on the Big dimensional data before building the classifiers. In this paper, we propose an Extreme Learning Machine (ELM) approach for large-scale data analytic. In contrast to existing approaches, we embed hidden nodes that are designed using singular value decomposition (SVD) into the classical ELM. These SVD nodes in the hidden layer are shown to capture the underlying characteristics of the Big dimensional data well, exhibiting excellent generalization performances. The drawback of using SVD on the entire dataset, however, is the high computational complexity involved. To address this, a fast divide and conquer approximation scheme is introduced to maintain computational tractability on high volume data. The resultant algorithm proposed is labeled here as Fast Singular Value Decomposition-Hidden-nodes based Extreme Learning Machine or FSVD-H-ELM in short. In FSVD-H-ELM, instead of identifying the SVD hidden nodes directly from the entire dataset, SVD hidden nodes are derived from multiple random subsets of data sampled from the original dataset. Comprehensive experiments and comparisons are conducted to assess the FSVD-H-ELM against other state-of-the-art algorithms. The results obtained demonstrated the superior generalization performance and efficiency of the FSVD-H-ELM.
Collapse
Affiliation(s)
- Wan-Yu Deng
- School of Computer, Xian University of Posts & Telecommunications, Shaanxi, China; School of Computer Engineering, Nanyang Technological University, Singapore.
| | - Zuo Bai
- School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore
| | - Guang-Bin Huang
- School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore
| | - Qing-Hua Zheng
- Department of Computer Science and Technology, Xi'an Jiaotong University, China
| |
Collapse
|
11
|
|
12
|
Fernández-Navarro F, Riccardi A, Carloni S. Ordinal neural networks without iterative tuning. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2014; 25:2075-2085. [PMID: 25330430 DOI: 10.1109/tnnls.2014.2304976] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Ordinal regression (OR) is an important branch of supervised learning in between the multiclass classification and regression. In this paper, the traditional classification scheme of neural network is adapted to learn ordinal ranks. The model proposed imposes monotonicity constraints on the weights connecting the hidden layer with the output layer. To do so, the weights are transcribed using padding variables. This reformulation leads to the so-called inequality constrained least squares (ICLS) problem. Its numerical solution can be obtained by several iterative methods, for example, trust region or line search algorithms. In this proposal, the optimum is determined analytically according to the closed-form solution of the ICLS problem estimated from the Karush-Kuhn-Tucker conditions. Furthermore, following the guidelines of the extreme learning machine framework, the weights connecting the input and the hidden layers are randomly generated, so the final model estimates all its parameters without iterative tuning. The model proposed achieves competitive performance compared with the state-of-the-art neural networks methods for OR.
Collapse
|