1. Peng J, Zhu X, Wang Y, An L, Shen D. Structured sparsity regularized multiple kernel learning for Alzheimer's disease diagnosis. Pattern Recognition 2019; 88:370-382. [PMID: 30872866; PMCID: PMC6410562; DOI: 10.1016/j.patcog.2018.11.027]
Abstract
Multimodal data fusion has shown great advantages in uncovering information that could be overlooked by using a single modality. In this paper, we consider the integration of high-dimensional multimodal imaging and genetic data for Alzheimer's disease (AD) diagnosis. With a focus on exploiting both phenotype and genotype information, a novel structured sparsity regularizer, defined by the ℓ1,p-norm (p > 1), is designed for multiple kernel learning. Specifically, to facilitate structured feature selection and fusion from heterogeneous modalities and to capture feature-wise importance, we represent each feature with a distinct base kernel and group the kernels by modality. An optimally combined kernel representation of the multimodal features is then learned in a data-driven manner. In contrast to the Group Lasso (i.e., the ℓ2,1-norm penalty), which performs sparse group selection, the proposed regularizer on the kernel weights sparsely selects a concise feature set within each homogeneous group while fusing the heterogeneous feature groups through a dense norm. We evaluated our method on subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. Its effectiveness is demonstrated by clearly improved diagnostic prediction as well as the discovered brain regions and SNPs relevant to AD.
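To make the grouped penalty concrete, here is a minimal numpy sketch of the ℓ1,p idea: an ℓ1 sum inside each modality group (sparsity within a group) and a dense ℓp norm (p > 1) across groups, with each feature contributing one base kernel. The grouping, toy dimensions, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l1p_penalty(weights, groups, p=1.5):
    """Structured sparsity penalty: an l1-norm inside each modality
    group (sparse feature selection) and a dense lp-norm (p > 1)
    across groups (balanced fusion of heterogeneous modalities)."""
    group_norms = np.array([np.abs(weights[g]).sum() for g in groups])
    return (group_norms ** p).sum() ** (1.0 / p)

def combined_kernel(kernels, weights):
    """Data-driven fusion: each feature contributes one base kernel,
    and the learned non-negative weights mix them."""
    return sum(wj * Kj for wj, Kj in zip(weights, kernels))

# toy usage: 5 imaging features + 3 SNP features, one kernel each
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
kernels = [np.outer(X[:, j], X[:, j]) for j in range(8)]  # linear base kernels
groups = [np.arange(0, 5), np.arange(5, 8)]               # modality grouping
w = np.abs(rng.normal(size=8))
print(l1p_penalty(w, groups, p=1.5), combined_kernel(kernels, w).shape)
```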
Affiliation(s)
- Jialin Peng
- College of Computer Science and Technology, Huaqiao University, Xiamen, China
- Xiamen Key Laboratory of CVPR, Huaqiao University, Xiamen, China
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Xiaofeng Zhu
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Ye Wang
- College of Computer Science and Technology, Huaqiao University, Xiamen, China
- Le An
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Dinggang Shen
- Department of Radiology and BRIC, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Brain and Cognitive Engineering, Korea University, Seoul, Korea
2. Weak Fault Detection of Tapered Rolling Bearing Based on Penalty Regularization Approach. Algorithms 2018. [DOI: 10.3390/a11110184]
Abstract
To address the problem of estimating a fault component from a noisy observation, a novel detection approach based on augmented Huber non-convex penalty regularization (AHNPR) is proposed. The core objectives of the proposed method are (1) to estimate the non-zero singular values (i.e., the fault component) accurately and (2) to maintain the convexity of the proposed objective cost function (OCF) by restricting the parameters of the non-convex regularization. Specifically, the AHNPR model is expressed as the L1-norm minus a generalized Huber function, which avoids the underestimation weakness of L1-norm regularization. Furthermore, the convexity of the proposed OCF is proved via the non-diagonal characteristic of the matrix B^TB, and the non-zero singular values of the OCF are solved by the forward-backward splitting (FBS) algorithm. Finally, the proposed method is validated on a simulated signal and on vibration signals from a tapered bearing. The results demonstrate that the proposed approach can identify weak fault information in the raw vibration signal under severe background noise, that the non-convex penalty regularization induces sparsity of the singular values more effectively than a typical convex penalty (e.g., the L1-norm fused lasso optimization (LFLO) method), and that the underestimation of sparse coefficients is alleviated.
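The estimation step above alternates a gradient step on the data-fit term with a proximal step on the penalty. The sketch below shows that forward-backward splitting skeleton; since the exact prox of the L1-minus-generalized-Huber penalty is more involved, the plain soft-threshold stands in for it here, and all names and parameters are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    # prox of tau*||.||_1 -- a convex stand-in for the paper's
    # L1-minus-generalized-Huber penalty, whose prox is more involved
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def forward_backward(y, A, lam, step, n_iter=200):
    """Minimize 0.5*||y - A x||^2 + lam*phi(x) by alternating a
    gradient (forward) step on the smooth data term with a proximal
    (backward) step on the penalty; step should be below 1/||A||^2."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                         # forward step
        x = soft_threshold(x - step * grad, step * lam)  # backward step
    return x
```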
3. Xu YL, Li XX, Chen DR, Li HX. Learning Rates of Regularized Regression With Multiple Gaussian Kernels for Multi-Task Learning. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:5408-5418. [PMID: 29994740; DOI: 10.1109/tnnls.2018.2802469]
Abstract
This paper considers a least-squares regularized regression algorithm for multi-task learning in a union of reproducing kernel Hilbert spaces (RKHSs) with Gaussian kernels. It is assumed that the optimal prediction functions of the target task and of the related tasks lie in RKHSs with the same, but unknown, Gaussian kernel width. The samples of the related tasks are used to select the Gaussian kernel width, and the sample of the target task is used to learn the prediction function in the RKHS with the selected width. With an error decomposition result, a fast learning rate is obtained for the target task. The key step is to estimate the sample errors of the related tasks in the union of RKHSs with Gaussian kernels. The utility of this algorithm is illustrated on one simulated data set and four real data sets. The experimental results show that the algorithm can yield significant improvements in prediction error when few samples are available for the target task but more samples are available for the related tasks.
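A hedged sketch of the overall recipe: pick the Gaussian width that minimizes error averaged over the related tasks, then reuse it for the small-sample target task. The split-based validation protocol and kernel ridge regression below are plausible stand-ins, not the paper's exact estimator, and all names are illustrative.

```python
import numpy as np

def gauss_kernel(X1, X2, width):
    # squared distances via broadcasting, then the Gaussian kernel
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def krr_error(X, y, width, lam=1e-2):
    # fit kernel ridge regression on one half, score on the other
    # (a hypothetical validation protocol for illustration)
    n = len(y) // 2
    K = gauss_kernel(X[:n], X[:n], width)
    alpha = np.linalg.solve(K + lam * np.eye(n), y[:n])
    pred = gauss_kernel(X[n:], X[:n], width) @ alpha
    return ((pred - y[n:]) ** 2).mean()

def select_width(related_tasks, widths):
    """Pick the Gaussian width that works best on average over the
    related tasks, to be reused for the target task."""
    scores = [np.mean([krr_error(X, y, w) for X, y in related_tasks])
              for w in widths]
    return widths[int(np.argmin(scores))]
```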
4. A joint matrix minimization approach for seismic wavefield recovery. Sci Rep 2018; 8:2188. [PMID: 29391463; PMCID: PMC5795022; DOI: 10.1038/s41598-018-20556-1]
Abstract
Reconstruction of the seismic wavefield from sub-sampled data is important and necessary in seismic image processing, partly because limitations of the observations usually yield incomplete data. To make the best of the observed seismic signals, we propose a joint matrix minimization model to recover the seismic wavefield. Employing a matrix rather than a vector as the weight variable expresses all the sub-sampled traces simultaneously, so a given set of sub-samples is recovered through a collective representation rather than individual ones. The matrix model takes the interrelations among the multiple observations into account to facilitate recovery, for example, the similarity within the same seismic trace and the distinctions between different ones. Hence an ℓ2,p (0 < p ≤ 1)-regularized joint matrix minimization problem is formulated, which poses computational challenges, especially when p lies in (0, 1). For solving the involved matrix optimization problem, a unified algorithm is developed and its convergence is demonstrated for a range of parameters. Numerical experiments on synthetic and field data examples exhibit the efficient performance of the joint technique. Both reconstruction accuracy and computational cost indicate that the new strategy achieves good performance in seismic wavefield recovery and has potential for practical applications.
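The joint matrix penalty couples corresponding coefficients across traces via row norms. Below is a small numpy sketch of the ℓ2,p value and of the row-reweighting diagonal that an IRLS-style solver would use; the paper's unified algorithm is not reproduced here, and the eps-smoothing and names are assumptions.

```python
import numpy as np

def l2p_penalty(W, p=0.5, eps=1e-8):
    """||W||_{2,p}^p = sum_i ||w_i||_2^p over rows: rows (one
    coefficient shared across all traces) are kept or dropped jointly."""
    row_norms = np.sqrt((W ** 2).sum(axis=1) + eps)
    return (row_norms ** p).sum()

def irls_row_weights(W, p=0.5, eps=1e-8):
    # reweighting diagonal a majorize-minimize / IRLS step would use:
    # d_i = (p/2) * ||w_i||_2^(p-2); eps guards the nonsmooth point
    row_norms = np.sqrt((W ** 2).sum(axis=1) + eps)
    return 0.5 * p * row_norms ** (p - 2.0)
```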
5. Nie L, Zhang L, Meng L, Song X, Chang X, Li X. Modeling Disease Progression via Multisource Multitask Learners: A Case Study With Alzheimer's Disease. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:1508-1519. [PMID: 26929064; DOI: 10.1109/tnnls.2016.2520964]
Abstract
Understanding the progression of chronic diseases can empower sufferers to take proactive care. Various machine learning approaches have been proposed to predict disease status at future time points, but few of them jointly consider the dual heterogeneities of chronic disease progression: the prediction task at each time point has features from multiple sources, and the multiple tasks are related to each other in chronological order. To tackle this problem, we propose a novel and unified scheme to co-regularize the prior knowledge of source consistency and temporal smoothness. We theoretically prove that our proposed model is a linear model. Before training the model, we adopt a matrix factorization approach to address the missing-data problem. Extensive evaluations on a real-world Alzheimer's disease data set demonstrate the effectiveness and efficiency of our model. It is worth mentioning that our model is generally applicable to a wide range of chronic diseases.
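The abstract mentions a matrix factorization step to handle missing data before training. A minimal alternating-least-squares sketch of such an imputation step follows; the rank, ridge term, and masking protocol are assumptions for illustration, not the authors' preprocessing code.

```python
import numpy as np

def als_impute(M, mask, rank=5, lam=0.1, n_iter=50, seed=0):
    """Fill missing entries of M (mask == 1 where observed) with a
    low-rank factorization M ~ U @ V.T, fit by alternating ridge
    regressions restricted to the observed entries."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    U = rng.normal(scale=0.1, size=(n, rank))
    V = rng.normal(scale=0.1, size=(m, rank))
    I = lam * np.eye(rank)
    for _ in range(n_iter):
        for i in range(n):                       # update row factors
            o = mask[i] > 0
            U[i] = np.linalg.solve(V[o].T @ V[o] + I, V[o].T @ M[i, o])
        for j in range(m):                       # update column factors
            o = mask[:, j] > 0
            V[j] = np.linalg.solve(U[o].T @ U[o] + I, U[o].T @ M[o, j])
    return U @ V.T                               # completed matrix
```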
6. Pan S, Wu J, Zhu X, Long G, Zhang C. Task Sensitive Feature Exploration and Learning for Multitask Graph Classification. IEEE Transactions on Cybernetics 2017; 47:744-758. [PMID: 26978839; DOI: 10.1109/tcyb.2016.2526058]
Abstract
Multitask learning (MTL) is commonly used for jointly optimizing multiple learning tasks. To date, existing MTL methods have been designed for tasks whose instances are represented by feature vectors, and cannot be applied to structured data such as graphs. More importantly, when carrying out MTL, existing methods mainly explore the overall commonality or disparity between tasks, but cannot explicitly capture task relationships in the feature space, so they are unable to answer important questions such as: what exactly is shared between tasks, and what is unique to one particular task? In this paper, we formulate a new multitask graph learning problem and propose a task-sensitive feature exploration and learning algorithm for multitask graph classification. Because graphs do not have readily available features, we advocate a task-sensitive feature exploration and learning paradigm that jointly discovers discriminative subgraph features across different tasks. In addition, a feature learning process categorizes each subgraph feature as (1) a common feature, (2) a task-auxiliary feature, or (3) a task-specific feature, indicating whether the feature is shared by all tasks, by a subset of tasks, or by only one specific task, respectively. The feature learning and the multiple-task learning are iteratively optimized to form a multitask graph classification model with a global optimization goal. Experiments on real-world functional brain analysis and chemical compound categorization demonstrate the algorithm's performance, and confirm that our method can explicitly capture task correlations and uniqueness in the feature space, answering what is shared between tasks and what is unique to a specific task.
7. Cao P, Liu X, Zhang J, Li W, Zhao D, Huang M, Zaiane O. A ℓ2,1 norm regularized multi-kernel learning for false positive reduction in lung nodule CAD. Computer Methods and Programs in Biomedicine 2017; 140:211-231. [PMID: 28254078; DOI: 10.1016/j.cmpb.2016.12.007]
Abstract
OBJECTIVE: The aim of this paper is to describe a novel algorithm for false positive reduction in lung nodule computer-aided detection (CAD). METHODS: We describe a new CT lung CAD method that aims to detect solid nodules. Specifically, we propose a multi-kernel classifier with an ℓ2,1-norm regularizer for heterogeneous feature fusion and selection at the feature-subset level, and design two efficient strategies to optimize the kernel weights in the resulting non-smooth ℓ2,1-regularized multiple kernel learning problem. The first optimization algorithm adapts a proximal gradient method to the ℓ2,1-norm on the kernel weights and uses an accelerated scheme based on FISTA; the second employs an iterative scheme based on an approximate gradient descent method. RESULTS: The results demonstrate that the FISTA-style accelerated proximal descent method is efficient for the ℓ2,1-norm formulation of multiple kernel learning, with a theoretical guarantee on the convergence rate. Moreover, the experimental results demonstrate the effectiveness of the proposed methods in terms of geometric mean (G-mean) and area under the ROC curve (AUC), significantly outperforming the competing methods. CONCLUSIONS: The proposed approach exhibits remarkable advantages in both the heterogeneous feature-subset fusion and classification phases. Compared with feature-level and decision-level fusion strategies, the proposed ℓ2,1-norm multi-kernel learning algorithm accurately fuses the complementary and heterogeneous feature sets and automatically prunes irrelevant and redundant feature subsets to form a more discriminative feature set, leading to promising classification performance. Moreover, the proposed algorithm consistently outperforms comparable classification approaches in the literature.
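A hedged sketch of the first optimization strategy: FISTA-style accelerated proximal gradient, where the prox of the ℓ2,1 norm is group-wise soft-thresholding on the kernel weights. The grouping, step size, and names are illustrative; this is not the authors' code.

```python
import numpy as np

def prox_l21(w, groups, tau):
    """Proximal operator of tau * sum_g ||w_g||_2: group-wise
    soft-thresholding, which zeroes whole feature subsets at once."""
    out = w.copy()
    for g in groups:
        n = np.linalg.norm(w[g])
        out[g] = 0.0 if n <= tau else (1.0 - tau / n) * w[g]
    return out

def fista(grad, prox, x0, step, n_iter=100):
    """Accelerated proximal gradient: the momentum sequence t_k
    gives the O(1/k^2) rate cited for FISTA-style methods."""
    x, z, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = prox(z - step * grad(z))
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# usage sketch (A, y, groups, lam, step assumed given):
# w = fista(lambda w: A.T @ (A @ w - y),
#           lambda w: prox_l21(w, groups, step * lam),
#           np.zeros(A.shape[1]), step)
```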
Affiliation(s)
- Peng Cao
- Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang, China.
- Xiaoli Liu
- Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang, China
- Jian Zhang
- School of Computer & Software, Nanjing University of Information Science & Technology, Nanjing, China
- Wei Li
- Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang, China
- Dazhe Zhao
- Computer Science and Engineering, Northeastern University, Shenyang, China; Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang, China
- Min Huang
- Information Science and Engineering, Northeastern University, Shenyang, China
- Osmar Zaiane
- Computing Science, University of Alberta, Edmonton, Alberta, Canada
8. Shahroudy A, Ng TT, Yang Q, Wang G. Multimodal Multipart Learning for Action Recognition in Depth Videos. IEEE Transactions on Pattern Analysis and Machine Intelligence 2016; 38:2123-2129. [PMID: 26660700; DOI: 10.1109/tpami.2015.2505295]
Abstract
The articulated and complex nature of human actions makes action recognition difficult. One approach to handling this complexity is to divide it into the kinetics of body parts and to analyze the actions based on these partial descriptors. We propose a joint sparse regression based learning method that utilizes structured sparsity to model each action as a combination of multimodal features from a sparse set of body parts. To represent the dynamics and appearance of parts, we employ a heterogeneous set of depth- and skeleton-based features. The proper structure of the multimodal multipart features is formulated into the learning framework via the proposed hierarchical mixed norm, which regularizes the structured features of each part and applies sparsity between them, in favor of group feature selection. Our experimental results demonstrate the effectiveness of the proposed learning method, which outperforms other methods on all three tested datasets and saturates one of them by achieving perfect accuracy.
9. Cao F, Cai M, Tan Y, Zhao J. Image Super-Resolution via Adaptive Regularization and Sparse Representation. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:1550-1561. [PMID: 26766382; DOI: 10.1109/tnnls.2015.2512563]
Abstract
Previous studies have shown that image patches can be well represented as a sparse linear combination of elements from an appropriately selected over-complete dictionary. Recently, single-image super-resolution (SISR) via sparse representation using blurred and downsampled low-resolution images has attracted increasing interest, where the aim is to obtain the coefficients for sparse representation by solving an l0- or l1-norm optimization problem. The l0 optimization is a nonconvex and NP-hard problem, while the l1 optimization usually requires many more measurements and presents new challenges even for images of ordinary size, so we propose a new approach for SISR recovery based on nonconvex regularized optimization. The proposed approach is potentially a powerful method for recovering SISR via sparse representation, and it can yield a sparser solution than the l1 regularization method. We also consider the best choice of lp regularization for p in (0, 1), proposing a scheme that adaptively selects the norm value for each image patch. In addition, we provide a method for adaptively estimating the best value of the regularization parameter λ, and discuss an alternating iteration method for selecting p and λ. Our experiments demonstrate that the proposed nonconvex regularization method can outperform the convex optimization method and generate higher-quality images.
10. Lai Z, Wong WK, Xu Y, Yang J, Zhang D. Approximate Orthogonal Sparse Embedding for Dimensionality Reduction. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:723-735. [PMID: 25955995; DOI: 10.1109/tnnls.2015.2422994]
Abstract
Locally linear embedding (LLE) is one of the most well-known manifold learning methods. As the representative linear extension of LLE, orthogonal neighborhood preserving projection (ONPP) has attracted widespread attention in the field of dimensionality reduction. In this paper, a unified sparse learning framework is proposed by introducing sparsity (i.e., L1-norm learning), which further extends the LLE-based methods to sparse cases. Theoretical connections between ONPP and the proposed sparse linear embedding are discovered. The optimal sparse embeddings derived from the proposed framework can be computed by iterating a modified elastic net and singular value decomposition. We also show that the proposed model can be viewed as a general model for sparse linear and nonlinear (kernel) subspace learning. Based on this general model, sparse kernel embedding is also proposed for nonlinear sparse feature extraction. Extensive experiments on five databases demonstrate that the proposed sparse learning framework performs better than existing subspace learning algorithms, particularly in the case of small sample sizes.
11. Rakotomamonjy A, Flamary R, Gasso G. DC Proximal Newton for Nonconvex Optimization Problems. IEEE Transactions on Neural Networks and Learning Systems 2016; 27:636-647. [PMID: 25910256; DOI: 10.1109/tnnls.2015.2418224]
Abstract
We introduce a novel algorithm for solving learning problems in which both the loss function and the regularizer are nonconvex but belong to the class of difference-of-convex (DC) functions. Our contribution is a new general-purpose proximal Newton algorithm able to deal with such a situation. The algorithm obtains a descent direction from an approximation of the loss function and then performs a line search to ensure sufficient descent. A theoretical analysis shows that the limit points of the iterates are stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than the current state of the art on a problem with a convex loss function and a nonconvex regularizer. We also illustrate the benefit of our algorithm on a high-dimensional transductive learning problem where both the loss function and the regularizer are nonconvex.
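To make the DC structure concrete, the sketch below decomposes the capped-l1 penalty lam*min(|x|, theta) as a difference of two convex terms, linearizes the concave part at each outer iterate, and solves each convex subproblem with proximal-gradient steps. The paper itself uses a proximal Newton direction with line search; the penalty choice, first-order inner solver, and all parameters here are assumptions for illustration.

```python
import numpy as np

def soft(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def dc_prox_grad(y, A, lam, theta, step, outer=10, inner=50):
    """DC algorithm for the capped-l1 penalty
    lam*min(|x|, theta) = lam*|x| - lam*max(|x| - theta, 0):
    linearize the concave part at the current iterate, then solve
    the convex subproblem by proximal-gradient (a first-order
    stand-in for the paper's proximal Newton step)."""
    x = np.zeros(A.shape[1])
    for _ in range(outer):
        # subgradient of the concave part, fixed during the inner loop
        s = lam * np.sign(x) * (np.abs(x) > theta)
        for _ in range(inner):
            grad = A.T @ (A @ x - y) - s
            x = soft(x - step * grad, step * lam)
    return x
```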
12. Zhou Q, Zhao Q. Flexible Clustered Multi-Task Learning by Learning Representative Tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2016; 38:266-278. [PMID: 26761733; DOI: 10.1109/tpami.2015.2452911]
Abstract
Multi-task learning (MTL) methods have shown promising performance by learning multiple relevant tasks simultaneously, exploiting shared information across related tasks. Among various MTL methods, clustered multi-task learning (CMTL) assumes that all tasks can be clustered into groups and attempts to learn the underlying cluster structure from the training data. In this paper, we present a new approach for CMTL, called flexible clustered multi-task learning (FCMTL), in which the cluster structure is learned by identifying representative tasks. The new approach allows an arbitrary task to be described by multiple representative tasks, effectively soft-assigning a task to multiple clusters with different weights. Unlike existing counterparts, the proposed approach is more flexible in that (a) it does not require clusters to be disjoint, (b) tasks within one particular cluster do not have to share information to the same extent, and (c) the number of clusters is automatically inferred from the data. Computationally, the proposed approach is formulated as a row-sparsity pursuit problem. We validate FCMTL on both synthetic and real-world data sets, and empirical results demonstrate that it outperforms many existing MTL methods.
13. Li C, Georgiopoulos M, Anagnostopoulos GC. Multitask Classification Hypothesis Space With Improved Generalization Bounds. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:1468-1479. [PMID: 25167558; DOI: 10.1109/tnnls.2014.2347054]
Abstract
This paper presents a pair of hypothesis spaces (HSs) of vector-valued functions intended for multitask classification. While both are parameterized on elements of reproducing kernel Hilbert spaces and impose a feature mapping common to all tasks, one assumes the mapping is fixed, while the more general one learns the mapping via multiple kernel learning. For these new HSs, empirical Rademacher complexity-based generalization bounds are derived and shown to be tighter than the bound of a particular HS that appeared recently in the literature, leading to improved performance; in fact, the latter HS is shown to be a special case of ours. Based on an equivalence to Group Lasso-type HSs, the proposed HSs are utilized in corresponding support vector machine-based formulations. Finally, experimental results on multitask learning problems underline the quality of the derived bounds and validate the paper's analysis.
14. Li C, Georgiopoulos M, Anagnostopoulos GC. Pareto-path multitask multiple kernel learning. IEEE Transactions on Neural Networks and Learning Systems 2015; 26:51-61. [PMID: 25532155; DOI: 10.1109/tnnls.2014.2309939]
Abstract
A traditional and intuitively appealing multitask multiple kernel learning (MT-MKL) method optimizes the sum (thus, the average) of objective functions with a (partially) shared kernel function, which allows information sharing among tasks. We point out that the solution obtained corresponds to a single point on the Pareto front (PF) of a multiobjective optimization problem that considers the concurrent optimization of all task objectives involved in the multitask learning (MTL) problem. Motivated by this observation, and arguing that the former approach is heuristic, we propose a novel support vector machine MT-MKL framework that considers an implicitly defined set of conic combinations of task objectives. We show that solving our framework produces solutions along a path on the aforementioned PF, and that it subsumes the optimization of the average of the objective functions as a special case. Using the algorithms we derived, we demonstrate through a series of experimental results that the framework can achieve better classification performance compared with other similar MTL approaches.
15. Kandemir M, Vetek A, Gönen M, Klami A, Kaski S. Multi-task and multi-view learning of user state. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.02.057]
16. Feature selection in machine learning: an exact penalty approach using a Difference of Convex function Algorithm. Mach Learn 2014. [DOI: 10.1007/s10994-014-5455-y]
17. Mixed-norm regularization for brain decoding. Computational and Mathematical Methods in Medicine 2014; 2014:317056. [PMID: 24860614; PMCID: PMC4016929; DOI: 10.1155/2014/317056]
Abstract
This work investigates the use of mixed-norm regularization for sensor selection in event-related potential (ERP) based brain-computer interfaces (BCI). The classification problem is cast in a discriminative optimization framework where sensor selection is induced through the use of mixed norms. This framework is extended to the multitask learning situation, where several similar classification tasks related to different subjects are learned simultaneously. In this case, multitask learning helps mitigate data scarcity, yielding more robust classifiers; for this purpose, we introduce a regularizer that induces both sensor selection and classifier similarity. The different regularization approaches are compared on three ERP datasets, demonstrating the benefit of mixed-norm regularization in terms of sensor selection. The multitask approaches are evaluated when only a small number of learning examples is available, yielding significant performance improvements, especially for subjects who perform poorly.
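Sensor selection with a mixed norm boils down to penalizing the l2 norms of sensor-grouped weight blocks, so whole sensors are switched off together. A toy numpy illustration of that bookkeeping follows; the dimensions and threshold are assumptions, and after actual mixed-norm training the discarded rows would be exactly zero rather than merely small.

```python
import numpy as np

def sensor_norms(w, n_sensors, n_times):
    """Weights reshaped to (sensors, time samples): the l2-norm of
    each row measures a sensor's contribution, and an l1 penalty on
    these row norms (the l2,1 mixed norm) switches sensors off."""
    return np.linalg.norm(w.reshape(n_sensors, n_times), axis=1)

w = np.random.default_rng(1).normal(size=32 * 60)  # 32 sensors x 60 samples
norms = sensor_norms(w, 32, 60)
selected = np.flatnonzero(norms > 1e-6)            # surviving sensors
print(norms.sum(), len(selected))                  # mixed-norm value, count
```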
18. Mao Q, Tsang IWH. Efficient multitemplate learning for structured prediction. IEEE Transactions on Neural Networks and Learning Systems 2013; 24:248-261. [PMID: 24808279; DOI: 10.1109/tnnls.2012.2228228]
Abstract
Conditional random fields (CRFs) and structural support vector machines (structural SVMs) are two state-of-the-art methods for structured prediction that capture the interdependencies among output variables. The success of these methods is attributed to their discriminative models being able to account for overlapping features on all input observations. These features are usually generated by applying a given set of templates to labeled data, but improper templates may degrade performance. To alleviate this issue, we propose a novel multiple-template learning paradigm that learns the structured prediction and the importance of each template simultaneously, so that hundreds of arbitrary templates can be added to the learning model without caution. This paradigm can be formulated as a special multiple kernel learning problem with an exponential number of constraints. We then introduce an efficient cutting-plane algorithm to solve this problem in the primal and present its convergence. We evaluate the proposed learning paradigm on two widely studied structured prediction tasks, i.e., sequence labeling and dependency parsing. Extensive experimental results show that the proposed method outperforms CRFs and structural SVMs by exploiting the importance of each template. Complexity analysis and empirical results also show that the proposed method is more efficient than online multi-kernel learning on very sparse and high-dimensional data. We further extend this paradigm to structured prediction using generalized p-block norm regularization with p > 1, and experiments show competitive performance when p ∈ [1, 2).
19.
20. Zhong LW, Kwok JT. Efficient sparse modeling with automatic feature grouping. IEEE Transactions on Neural Networks and Learning Systems 2012; 23:1436-1447. [PMID: 24807927; DOI: 10.1109/tnnls.2012.2200262]
Abstract
For high-dimensional data, it is often desirable to group similar features together during the learning process. This can reduce the estimation variance and improve the stability of feature selection, leading to better generalization; it can also help in understanding and interpreting the data. The octagonal shrinkage and clustering algorithm for regression (OSCAR) is a recent sparse-modeling approach that uses an l1-regularizer and a pairwise l∞-regularizer on the feature coefficients to encourage such feature grouping. Computationally, however, its optimization procedure is very expensive. In this paper, we propose an efficient solver based on the accelerated gradient method. We show that its key proximal step can be solved by a highly efficient simple iterative group-merging algorithm. Given d input features, this reduces the empirical time complexity from between O(d^2) and O(d^5) for the existing solvers to just O(d). Experimental results on a number of toy and real-world datasets demonstrate that OSCAR is a competitive sparse-modeling approach, but with the added ability of automatic feature grouping.
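The pairwise l∞ term looks quadratic in d, but sorting turns the penalty value into a weighted l1-norm, which is the structure fast solvers exploit. A small sketch of that evaluation (names are illustrative; the paper's group-merging proximal step itself is not reproduced here):

```python
import numpy as np

def oscar_penalty(w, lam1, lam2):
    """OSCAR: lam1*||w||_1 + lam2*sum_{i<j} max(|w_i|, |w_j|).
    With |w| sorted in decreasing order, the k-th largest entry is
    the max in its pairs with all later entries, so the pairwise sum
    collapses to a weighted l1-norm computable in O(d log d)."""
    a = np.sort(np.abs(w))[::-1]                   # decreasing order
    d = len(a)
    coeff = lam1 + lam2 * (d - 1 - np.arange(d))   # k-th entry wins d-1-k pairs
    return float(coeff @ a)
```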
21. Xu Z, Chang X, Xu F, Zhang H. L1/2 regularization: a thresholding representation theory and a fast solver. IEEE Transactions on Neural Networks and Learning Systems 2012; 23:1013-1027. [PMID: 24807129; DOI: 10.1109/tnnls.2012.2197412]
Abstract
The special importance of L1/2 regularization has been recognized in recent studies on sparse modeling (particularly compressed sensing). The L1/2 regularization, however, leads to a nonconvex, nonsmooth, and non-Lipschitz optimization problem that is difficult to solve quickly and efficiently. In this paper, by developing a thresholding representation theory for L1/2 regularization, we propose an iterative half thresholding algorithm for the fast solution of L1/2 regularization, corresponding to the well-known iterative soft thresholding algorithm for L1 regularization and the iterative hard thresholding algorithm for L0 regularization. We prove the existence of the resolvent of the gradient of ||x||_{1/2}^{1/2}, calculate its analytic expression, and establish an alternative feature theorem on solutions of L1/2 regularization, from which a thresholding representation of the solutions is derived and an optimal rule for setting the regularization parameter is formulated. The developed theory extends the well-known Moreau proximity forward-backward splitting theory to the L1/2 regularization case. We verify the convergence of the iterative half thresholding algorithm and provide a series of experiments to assess its performance. The experiments show that the algorithm is effective and efficient, and can be accepted as a fast solver for L1/2 regularization. With the new algorithm, we conduct a phase-diagram study that further demonstrates the superiority of L1/2 regularization over L1 regularization.
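A numpy transcription of the half thresholding operator and the resulting iteration is sketched below, following the closed-form cosine expression reported for the operator; the step size, iteration count, and threshold constant are assumptions that should be checked against the paper before serious use.

```python
import numpy as np

def half_threshold(t, lam):
    """Thresholding operator for L1/2 regularization (the analogue
    of soft thresholding for L1): entries below ~0.945*lam^(2/3)
    are set to zero, the rest are shrunk via a cosine formula."""
    t = np.asarray(t, dtype=float)
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    out = np.zeros_like(t)
    big = np.abs(t) > thresh
    phi = np.arccos((lam / 8.0) * (np.abs(t[big]) / 3.0) ** (-1.5))
    out[big] = (2.0 / 3.0) * t[big] * (
        1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

def iterative_half(y, A, lam, mu, n_iter=200):
    # x_{k+1} = H_{lam*mu}(x_k + mu*A^T(y - A x_k)); mu < 1/||A||^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = half_threshold(x + mu * A.T @ (y - A @ x), lam * mu)
    return x
```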
22. Fan W, Bouguila N, Ziou D. Variational learning for finite Dirichlet mixture models and applications. IEEE Transactions on Neural Networks and Learning Systems 2012; 23:762-774. [PMID: 24806125; DOI: 10.1109/tnnls.2012.2190298]
Abstract
In this paper, we focus on the variational learning of finite Dirichlet mixture models. Compared to other algorithms commonly used for mixture models (such as expectation-maximization), our approach has several advantages: first, the problem of over-fitting is prevented; furthermore, the complexity of the mixture model (i.e., the number of components) can be determined automatically and simultaneously with the parameter estimation as part of the Bayesian inference procedure; finally, since the whole inference process is analytically tractable with closed-form solutions, it may scale well to large applications. Experiments on both synthetic and real data, drawn from challenging real-life applications, namely image database categorization and anomaly intrusion detection, verify the effectiveness of the proposed approach.
23. Li H, Chen N, Li L. Error analysis for matrix elastic-net regularization algorithms. IEEE Transactions on Neural Networks and Learning Systems 2012; 23:737-748. [PMID: 24806123; DOI: 10.1109/tnnls.2012.2188906]
Abstract
Elastic-net regularization is a successful approach in statistical modeling that can avoid the large variations which occur when estimating complex models. In this paper, elastic-net regularization is extended to a more general setting, matrix recovery (matrix completion). Based on a combination of nuclear-norm minimization and Frobenius-norm minimization, we consider the matrix elastic-net (MEN) regularization algorithm, an analogue of the elastic-net regularization scheme in compressive sensing. Some properties of the estimator are characterized through the singular value shrinkage operator. We estimate the error bounds of the MEN regularization algorithm in the framework of statistical learning theory, and compute the learning rate via estimates of Hilbert-Schmidt operators. In addition, an adaptive scheme for selecting the regularization parameter is presented. Numerical experiments demonstrate the superiority of the MEN regularization algorithm.
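The singular value shrinkage operator mentioned above has a simple closed form: soft-threshold the singular values (the nuclear-norm part) and then rescale them (the Frobenius-norm part), mirroring the vector elastic-net. A minimal sketch, with parameter names as assumptions:

```python
import numpy as np

def men_prox(Y, tau, beta):
    """Singular value shrinkage for the matrix elastic-net: minimizes
    0.5*||X - Y||_F^2 + tau*||X||_* + 0.5*beta*||X||_F^2. By unitary
    invariance the problem separates over singular values, giving
    sigma -> max(sigma - tau, 0) / (1 + beta)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0) / (1.0 + beta)
    return (U * s_shrunk) @ Vt  # scale columns of U, then recompose
```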