1
Zhang Z, Ren X, Xie J, Luo Y. A Novel Swarm Exploring Varying Parameter Recurrent Neural Network for Solving Non-Convex Nonlinear Programming. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:12642-12652. PMID: 37040247. DOI: 10.1109/tnnls.2023.3263975.
Abstract
To solve non-convex nonlinear programming problems efficiently and accurately, a swarm exploring varying parameter recurrent neural network (SE-VPRNN) method is proposed in this article. First, local optimal solutions are located accurately by the proposed varying parameter recurrent neural network. After each network converges to a local optimal solution, information is exchanged through a particle swarm optimization (PSO) framework to update the velocities and positions. Each neural network then searches again from its updated position, and the process repeats until all networks converge to the same local optimal solution. To improve the global search ability, wavelet mutation is applied to increase the diversity of the particles. Computer simulations show that the proposed method solves non-convex nonlinear programming problems effectively; compared with three existing algorithms, it offers advantages in accuracy and convergence time.
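As a rough illustration of the swarm layer described in this abstract, the sketch below implements a standard PSO velocity/position exchange with a wavelet-style mutation of a few particles; the Morlet-shaped mutation factor, all parameter values, and the omission of the RNN-based local refinement are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def wavelet_mutation(x, bounds, a=2.0):
    """Perturb one coordinate with a Morlet-style wavelet factor (illustrative form)."""
    phi = np.random.uniform(-2.5 * a, 2.5 * a)
    sigma = np.exp(-(phi / a) ** 2 / 2.0) * np.cos(5.0 * phi / a) / np.sqrt(a)
    j = np.random.randint(x.size)
    lo, hi = bounds[j]
    x = x.copy()
    x[j] += sigma * ((hi - x[j]) if sigma > 0 else (x[j] - lo))
    return np.clip(x, [b[0] for b in bounds], [b[1] for b in bounds])

def pso_step(positions, velocities, pbest, gbest,
             w=0.7, c1=1.5, c2=1.5, p_mut=0.1, bounds=None):
    """One PSO exchange: update velocities/positions, then mutate a few particles."""
    r1, r2 = np.random.rand(*positions.shape), np.random.rand(*positions.shape)
    velocities = w * velocities + c1 * r1 * (pbest - positions) + c2 * r2 * (gbest - positions)
    positions = positions + velocities
    for i in range(positions.shape[0]):
        if bounds is not None and np.random.rand() < p_mut:
            positions[i] = wavelet_mutation(positions[i], bounds)
    return positions, velocities
```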
2
Li Z, Zhao H, Guo Y, Yang Z, Xie S. Accelerated Log-Regularized Convolutional Transform Learning and Its Convergence Guarantee. IEEE Transactions on Cybernetics 2022; 52:10785-10799. PMID: 33872171. DOI: 10.1109/tcyb.2021.3067352.
Abstract
Convolutional transform learning (CTL), which learns filters by minimizing a data-fidelity loss in an unsupervised way, is becoming pervasive because it keeps the best of both worlds: the benefit of unsupervised learning and the success of convolutional neural networks. There has been growing interest in developing efficient CTL algorithms. However, designing a CTL algorithm that is convergent and accelerated while producing accurate representations with proper sparsity remains an open problem. This article presents a new CTL framework with a log regularizer that not only yields accurate representations but also enforces strong sparsity. To address the resulting nonconvex composite optimization efficiently, we employ the proximal difference-of-convex algorithm (PDCA), which decomposes the nonconvex regularizer into the difference of two convex parts and then optimizes the convex subproblems. Furthermore, we introduce an extrapolation technique to accelerate the algorithm, leading to a fast and efficient CTL method, and we provide a rigorous convergence analysis for the proposed algorithm under the accelerated PDCA. Experimental results demonstrate that the proposed algorithm converges more stably to desirable solutions with lower approximation error and stronger sparsity, and thus learns filters efficiently, while converging faster than existing CTL algorithms.
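The following sketch illustrates the kind of extrapolated proximal DC iteration the abstract describes, applied to a generic least-squares data term with the log penalty λ Σ log(1 + |w_i|/θ); the DC split log(1 + |w|/θ) = |w|/θ − (|w|/θ − log(1 + |w|/θ)) and the extrapolation schedule are standard choices assumed here, not the paper's exact CTL formulation.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pdca_extrapolated(A, b, lam=0.1, theta=1.0, eta=None, n_iter=200):
    """Extrapolated proximal DC scheme (sketch) for
       min 0.5*||A w - b||^2 + lam * sum(log(1 + |w|/theta)).
    DC split: log(1+|w|/theta) = |w|/theta - h(w), with h convex."""
    m, n = A.shape
    if eta is None:
        eta = 1.0 / np.linalg.norm(A, 2) ** 2          # step size <= 1/L
    w = w_prev = np.zeros(n)
    for k in range(n_iter):
        beta = k / (k + 3.0)                            # extrapolation weight
        y = w + beta * (w - w_prev)
        grad_f = A.T @ (A @ y - b)
        grad_h = w / (theta * (theta + np.abs(w)))      # gradient of the subtracted convex part
        w_prev, w = w, soft_threshold(y - eta * (grad_f - lam * grad_h), eta * lam / theta)
    return w
```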
3
Peng H, Wu J, Zhang Z, Chen S, Zhang HT. Deep Network Quantization via Error Compensation. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:4960-4970. PMID: 33852390. DOI: 10.1109/tnnls.2021.3064293.
Abstract
For portable devices with limited resources, it is often difficult to deploy deep networks because of the prohibitive computational overhead. Numerous approaches have been proposed to quantize weights and/or activations to speed up inference. Loss-aware quantization directly formulates the impact of weight quantization on the model's final loss. However, we find that, under certain circumstances, such a method may fail to converge and end up oscillating. To tackle this issue, we introduce a novel loss-aware quantization algorithm that efficiently compresses deep networks to low bit-width weights. We obtain a more accurate estimate of the gradients by leveraging a Taylor expansion to compensate for the quantization error, which leads to better convergence behavior. Our theoretical analysis indicates that the gradient-mismatch issue is fixed by the newly introduced quantization-error compensation term. Experimental results for both linear models and convolutional networks verify the effectiveness of the proposed method.
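A schematic of the error-compensation idea on a toy quadratic loss: the gradient computed at the quantized weights is corrected by a diagonal second-order (Taylor-expansion) term driven by the quantization error. The uniform quantizer, the diagonal Hessian, and all constants are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def quantize(w, n_bits=2):
    """Uniform symmetric quantizer (illustrative; not the paper's exact scheme)."""
    scale = np.max(np.abs(w)) / (2 ** (n_bits - 1) - 1 + 1e-12)
    return np.round(w / scale) * scale

def compensated_gradient(grad_q, w, w_q, h_diag):
    """Taylor compensation of the gradient computed at the quantized weights:
    grad(w) ~= grad(w_q) + H (w - w_q), with a diagonal Hessian approximation H."""
    return grad_q + h_diag * (w - w_q)

# One toy training loop on L(w) = 0.5 * ||X w - y||^2 with quantized forward passes.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 10)), rng.normal(size=100)
w = rng.normal(size=10)
h_diag = np.sum(X ** 2, axis=0)                 # diagonal of the Hessian X^T X
for _ in range(100):
    w_q = quantize(w)                           # forward pass uses quantized weights
    grad_q = X.T @ (X @ w_q - y)                # gradient evaluated at the quantized point
    w -= 0.001 * compensated_gradient(grad_q, w, w_q, h_diag)
```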
4
Li Z, Wan C, Tan B, Yang Z, Xie S. A fast DC-based dictionary learning algorithm with the SCAD penalty. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.12.003.
5
A novel dictionary learning method for sparse representation with nonconvex regularizations. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.07.085.
6
Le Thi HA, Le HM, Phan DN, Tran B. Stochastic DCA for minimizing a large sum of DC functions with application to multi-class logistic regression. Neural Networks 2020; 132:220-231. PMID: 32919312. DOI: 10.1016/j.neunet.2020.08.024.
Abstract
We consider the problem of minimizing a large sum of DC (difference-of-convex) functions, which appears in several areas, especially stochastic optimization and machine learning. Two DCA (DC algorithm)-based methods are proposed: stochastic DCA and inexact stochastic DCA. We prove that both algorithms converge to a critical point with probability one. Furthermore, we apply the stochastic DCA to an important problem in multi-task learning, namely group-variable selection in multi-class logistic regression; the resulting algorithm is very inexpensive because all computations are explicit. Numerical experiments on several benchmark and synthetic datasets illustrate the efficiency of our algorithms and their superiority over existing methods with respect to classification accuracy, sparsity of the solution, and running time.
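A minimal sketch of the stochastic DCA idea on a toy DC objective F(x) = (1/n) Σ_i [g_i(x) − h_i(x)]: a table of subgradients of the subtracted convex parts h_i is refreshed only on a mini-batch, and the convex subproblem is then solved (here in closed form because of the deliberately simple choice of g_i). The specific g_i, h_i, and batch size are assumptions for illustration, not the paper's multi-class logistic regression model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
a = rng.normal(size=(n, d))         # g_i(x) = 0.5 * ||x - a_i||^2   (convex)
b = rng.normal(size=(n, d))         # h_i(x) = log(1 + exp(<b_i, x>)) (convex, smooth)

def grad_h(i, x):
    """Gradient of h_i at x (used as the stored subgradient)."""
    return b[i] / (1.0 + np.exp(-b[i] @ x))

x = np.zeros(d)
v = np.array([grad_h(i, x) for i in range(n)])   # table of stored subgradients
for it in range(200):
    batch = rng.choice(n, size=32, replace=False)
    for i in batch:                  # refresh subgradients only for the mini-batch
        v[i] = grad_h(i, x)
    v_bar = v.mean(axis=0)
    # Convex subproblem min_x (1/n) sum_i 0.5*||x - a_i||^2 - <v_bar, x>
    # has the closed-form solution below for this toy choice of g_i.
    x = a.mean(axis=0) + v_bar
```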
Affiliation(s)
- Hoai An Le Thi
- Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Viet Nam; Université de Lorraine, LGIPM, F-57000 Metz, France.
- Hoai Minh Le
- Université de Lorraine, LGIPM, F-57000 Metz, France.
- Duy Nhat Phan
- Université de Lorraine, LGIPM, F-57000 Metz, France.
- Bach Tran
- Université de Lorraine, LGIPM, F-57000 Metz, France.
7
Yu J, Tang Q, Li Q, Guo H, He X. Hybrid reconstruction method for multispectral bioluminescence tomography with log-sum regularization. Journal of the Optical Society of America A: Optics, Image Science, and Vision 2020; 37:1060-1066. PMID: 32543609. DOI: 10.1364/josaa.386961.
Abstract
Bioluminescence tomography (BLT) has important applications in the in vivo visualization of pathological processes for preclinical studies. However, BLT reconstruction is severely ill-posed. To recover the bioluminescence source stably and efficiently, we use a log-sum regularization term in the objective function and solve the resulting problem with a hybrid optimization algorithm for nonconvex regularized problems (HONOR). The hybrid scheme of HONOR merges second-order and first-order information by choosing either a quasi-Newton (QN) or a gradient-descent step at each iteration; the QN step uses the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm to acquire second-order information. Simulations and in vivo experiments based on multispectral measurements demonstrate the strong performance of the proposed hybrid method in the sparse reconstruction of BLT.
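HONOR itself interleaves L-BFGS and gradient-descent steps; the sketch below instead shows the effect of the log-sum penalty Σ log(|x_i| + ε) handled by the classical reweighted-ℓ1 majorization on a generic linear inverse problem, a related but simpler treatment of the same nonconvex regularizer. All symbols and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def logsum_reweighted_l1(A, y, lam=0.05, eps=1e-2, outer=10, inner=100):
    """Sparse recovery with the log-sum penalty sum(log(|x| + eps)) handled by
    iteratively reweighted l1 (majorize-minimize); each inner loop is ISTA."""
    m, n = A.shape
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(outer):
        w = 1.0 / (np.abs(x) + eps)              # weights from the log-sum majorizer
        for _ in range(inner):
            grad = A.T @ (A @ x - y)
            x = soft_threshold(x - step * grad, step * lam * w)
    return x
```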
8
9
Wang R, Xiu N, Zhang C. Greedy Projected Gradient-Newton Method for Sparse Logistic Regression. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:527-538. PMID: 30990444. DOI: 10.1109/tnnls.2019.2905261.
Abstract
Sparse logistic regression (SLR), which is widely used for classification and feature selection in fields such as neural networks, deep learning, and bioinformatics, is the classical logistic regression model with sparsity constraints. In this paper, we analyze the existence and uniqueness of the solution to the SLR, and we propose a greedy projected gradient-Newton (GPGN) method, a combination of the projected gradient method and the Newton method, for solving it. The following characteristics show that the GPGN method achieves not only elegant theoretical results but also remarkable numerical performance: 1) the full iterative sequence generated by the GPGN method converges to a global/local minimizer of the SLR under weaker conditions; 2) the GPGN method enjoys finite identification of an optimal support set and local quadratic convergence; and 3) the GPGN method achieves higher accuracy and higher speed than a number of state-of-the-art solvers in numerical experiments.
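A minimal sketch of the projected gradient-Newton pattern for sparsity-constrained logistic regression (labels in {0, 1}): a gradient step, hard thresholding onto the s largest entries, and a Newton refinement restricted to the active support. The step size, ridge term, and iteration count are assumptions; this is not the authors' exact GPGN implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gpgn_sketch(X, y, s, n_iter=50, step=0.1, ridge=1e-6):
    """Projected gradient step followed by a Newton refinement on the active
    support, for min logistic_loss(w) subject to ||w||_0 <= s."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n
        u = w - step * grad
        support = np.argsort(-np.abs(u))[:s]      # keep the s largest entries
        w = np.zeros(d)
        w[support] = u[support]
        # Newton refinement restricted to the support
        Xs = X[:, support]
        p = sigmoid(Xs @ w[support])
        g_s = Xs.T @ (p - y) / n
        H_s = Xs.T @ (Xs * (p * (1 - p))[:, None]) / n + ridge * np.eye(s)
        w[support] -= np.linalg.solve(H_s, g_s)
    return w
```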
10
Bartoš M, Rajmic P, Šorel M, Mangová M, Keunen O, Jiřík R. Spatially regularized estimation of the tissue homogeneity model parameters in DCE-MRI using proximal minimization. Magnetic Resonance in Medicine 2019; 82:2257-2272. PMID: 31317577. DOI: 10.1002/mrm.27874.
Abstract
PURPOSE: The Tofts and extended Tofts models are the pharmacokinetic models commonly used in dynamic contrast-enhanced MRI (DCE-MRI) perfusion analysis, but they do not provide two important biological markers, namely the plasma flow and the permeability-surface area product. Estimates of these markers are possible with advanced pharmacokinetic models describing the vascular distribution phase, such as the tissue homogeneity model. The disadvantage of the advanced models, however, lies in biased and uncertain estimates, especially when the estimates are computed voxelwise. The goal of this work is to improve the reliability of the estimates by including information from neighboring voxels.
THEORY AND METHODS: Information from neighboring voxels is incorporated into the estimation process through spatial regularization in the form of total variation, applied to five maps of perfusion parameters estimated with the tissue homogeneity model. Since the total variation is not differentiable, two proximal techniques of convex optimization are used to solve the problem numerically.
RESULTS: The proposed algorithm reduces noise in the estimated perfusion-parameter maps while improving the accuracy of the estimates, as demonstrated with a numerical phantom. In addition, experiments on real data show improved spatial consistency and readability of the perfusion maps without considerably lowering the quality of fit.
CONCLUSION: The reliability of DCE-MRI perfusion analysis with the tissue homogeneity model can be improved by employing spatial regularization. The proposed use of modern optimization techniques implies only slightly higher computational costs than the standard approach without spatial regularization.
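The sketch below illustrates only the spatial-regularization ingredient: denoising a single 2-D perfusion-parameter map with a smoothed (Huber) anisotropic total-variation term by plain gradient descent. The paper couples total variation with the full nonlinear tissue-homogeneity fit via proximal splitting; the smoothing, step size, and weights here are illustrative assumptions.

```python
import numpy as np

def huber_tv_grad(P, delta=1e-3):
    """Gradient of a smoothed (Huber) anisotropic total-variation term on a 2-D map."""
    dx = np.diff(P, axis=0, append=P[-1:, :])
    dy = np.diff(P, axis=1, append=P[:, -1:])
    px = dx / np.maximum(np.abs(dx), delta)     # Huber derivative of |dx|
    py = dy / np.maximum(np.abs(dy), delta)
    g = np.zeros_like(P)
    g[:-1, :] -= px[:-1, :]; g[1:, :] += px[:-1, :]
    g[:, :-1] -= py[:, :-1]; g[:, 1:] += py[:, :-1]
    return g

def tv_regularize_map(P_obs, lam=0.05, step=0.2, n_iter=300):
    """Minimize 0.5*||P - P_obs||^2 + lam * TV_smooth(P) by gradient descent."""
    P = P_obs.copy()
    for _ in range(n_iter):
        P -= step * ((P - P_obs) + lam * huber_tv_grad(P))
    return P
```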
Affiliation(s)
- Michal Bartoš
- The Czech Academy of Sciences, Institute of Information Theory and Automation, Prague, Czech Republic
- Pavel Rajmic
- SPLab, Department of Telecommunications, FEEC, Brno University of Technology, Brno, Czech Republic
- Michal Šorel
- The Czech Academy of Sciences, Institute of Information Theory and Automation, Prague, Czech Republic
- Marie Mangová
- SPLab, Department of Telecommunications, FEEC, Brno University of Technology, Brno, Czech Republic
- Olivier Keunen
- Norlux Neuro-Oncology Laboratory, Department of Oncology, Luxembourg Institute of Health, Luxembourg, Luxembourg
- Radovan Jiřík
- The Czech Academy of Sciences, Institute of Scientific Instruments, Brno, Czech Republic
11
Abstract
The stochastic gradient matching pursuit algorithm requires the sparsity of the signal as prior information. However, this prior is unknown in practical applications, which restricts the use of the algorithm to some extent. An improved method is proposed to overcome this problem. First, a pre-evaluation strategy estimates the sparsity of the signal, and the estimate is used as the initial sparsity. Second, if the candidate atom matrix has fewer columns than rows, the least-squares solution of the signal is computed; otherwise, the least-squares solution is set to zero. Finally, if the current residual is greater than the previous residual, the estimated sparsity is adjusted by a fixed step size and stage index; otherwise, no adjustment is needed. Simulation results show that the proposed method outperforms other methods in reconstruction percentage when the sparsity level is large.
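A simplified sparsity-adaptive matching-pursuit loop along the lines described above: the sparsity level is pre-estimated from the correlations, the least-squares fit is computed only when the candidate matrix is overdetermined, and the sparsity is enlarged by a fixed step whenever the residual grows. The pre-evaluation rule and constants are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def adaptive_sparsity_mp(A, y, step=2, max_iter=50, tol=1e-6):
    """Simplified sparsity-adaptive matching pursuit (sketch)."""
    m, n = A.shape
    # Crude pre-evaluation of sparsity: count correlations above half the maximum.
    corr = np.abs(A.T @ y)
    s = max(1, int(np.sum(corr > 0.5 * corr.max())) // 2)
    support = np.array([], dtype=int)
    x = np.zeros(n)
    res, prev_res = y.copy(), np.inf
    for _ in range(max_iter):
        cand = np.union1d(support, np.argsort(-np.abs(A.T @ res))[:s])
        if cand.size < m:                              # least squares only if overdetermined
            coef = np.linalg.lstsq(A[:, cand], y, rcond=None)[0]
        else:
            coef = np.zeros(cand.size)
        support = cand[np.argsort(-np.abs(coef))[:s]]  # keep the s strongest atoms
        x = np.zeros(n)
        x[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
        res = y - A @ x
        r_norm = np.linalg.norm(res)
        if r_norm < tol:
            break
        if r_norm > prev_res:                          # residual grew: enlarge the stage
            s += step
        prev_res = r_norm
    return x
```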
12
Li Z, Ding S, Chen W, Yang Z, Xie S. Proximal Alternating Minimization for Analysis Dictionary Learning and Convergence Analysis. IEEE Transactions on Emerging Topics in Computational Intelligence 2018. DOI: 10.1109/tetci.2018.2806890.
13
Tsang IW, Yu CP. Progressive Stochastic Learning for Noisy Labels. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:5136-5148. PMID: 29994430. DOI: 10.1109/tnnls.2018.2792062.
Abstract
Large-scale learning problems require a plethora of labels, which can be collected efficiently from crowdsourcing services at low cost. However, labels annotated by crowdsourced workers are often noisy, which inevitably degrades the performance of large-scale optimization methods, including the prevalent stochastic gradient descent (SGD): noisy labels adversely affect the updates of the primal variable in conventional SGD. To address this challenge, we propose a robust SGD mechanism called progressive stochastic learning (POSTAL), which integrates the learning regime of curriculum learning (CL) with the update process of vanilla SGD. Our inspiration comes from the progressive learning process of CL, namely learning from "easy" tasks to "complex" tasks. Through this process, POSTAL aims to yield robust updates of the primal variable on an ordered label sequence, from "reliable" labels to "noisy" labels. To realize the POSTAL mechanism, we design a cluster of "screening losses," which sort all labels from the reliable region to the noisy region, so that POSTAL performs robust updates of the primal variable on reliable labels first and then incrementally on noisy labels until convergence. In theory, we derive the convergence rate of POSTAL realized by screening losses and provide a robustness analysis of representative screening losses. Experimental results on UCI (University of California Irvine) simulated and Amazon Mechanical Turk crowdsourcing data sets show that POSTAL with screening losses is more effective and robust than several existing baselines.
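A schematic of the curriculum-style update order: in each epoch only samples whose current loss falls below a threshold contribute to the SGD update, and the threshold grows so that reliable (small-loss) labels are used first and noisier ones later. The logistic loss, the threshold schedule, and labels in {−1, +1} are illustrative assumptions; the paper's screening losses are defined differently.

```python
import numpy as np

def curriculum_robust_sgd(X, y, epochs=20, lr=0.1, tau0=0.5, growth=1.2):
    """Curriculum-style robust SGD sketch for a linear model with labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    tau = tau0
    for epoch in range(epochs):
        margins = y * (X @ w)
        losses = np.log1p(np.exp(-margins))     # logistic loss per sample
        order = np.argsort(losses)              # reliable (small-loss) samples first
        for i in order:
            if losses[i] > tau:
                continue                        # screened out in this epoch
            g = -y[i] * X[i] / (1.0 + np.exp(y[i] * (X[i] @ w)))
            w -= lr * g
        tau *= growth                           # gradually admit noisier samples
    return w
```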
14
Malek A, Hosseinipour-Mahani N. Solving Multiextremal Problems by Using Recurrent Neural Networks. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:1562-1574. PMID: 28328515. DOI: 10.1109/tnnls.2017.2676046.
Abstract
In this paper, a neural network model for solving a class of multiextremal smooth nonconvex constrained optimization problems is proposed. The neural network is designed so that its equilibrium points coincide with the local and global optimal solutions of the corresponding optimization problem. Based on suitable underestimators of the Lagrangian of the problem, geometric criteria are given for an equilibrium point to be a global minimizer of a multiextremal constrained optimization problem, with or without bounds on the variables. Both necessary and sufficient global optimality conditions are presented for determining a global optimal solution. By studying the resulting dynamical system, it is shown that, under the given assumptions, its steady states are stable and the trajectories of the proposed model converge to the local and global optimal solutions of the problem. Numerical results and related graphs illustrate the global convergence and performance of the solver for multiextremal constrained optimization problems.
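A toy illustration of solving an optimization problem by integrating neural-network-like dynamics: Euler integration of a penalty-based gradient flow whose stable equilibria sit at (local) minimizers of the constrained problem. The dynamics, penalty weight, and the one-dimensional example are assumptions for illustration; the paper's network model and optimality criteria are more elaborate.

```python
import numpy as np

def neural_dynamics_solve(grad_f, g, grad_g, x0, mu=10.0, dt=1e-3, n_steps=20000):
    """Euler integration of dx/dt = -(grad_f(x) + mu * max(g(x), 0) * grad_g(x)),
    a penalty-based gradient flow for min f(x) subject to g(x) <= 0."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        x -= dt * (grad_f(x) + mu * max(g(x), 0.0) * grad_g(x))
    return x

# Toy multiextremal example: f(x) = x^4 - 3x^2 + x subject to x - 2 <= 0.
x_star = neural_dynamics_solve(
    grad_f=lambda x: 4 * x**3 - 6 * x + 1,
    g=lambda x: float(x - 2.0),
    grad_g=lambda x: np.ones_like(x),
    x0=np.array([1.5]),
)  # converges to the local minimizer near x ~ 1.2 from this starting point
```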
15
Li Z, Ding S, Li Y, Yang Z, Xie S, Chen W. Manifold optimization-based analysis dictionary learning with an ℓ1/2-norm regularizer. Neural Networks 2017; 98:212-222. PMID: 29272726. DOI: 10.1016/j.neunet.2017.11.015.
Abstract
Analysis dictionary learning has recently attracted increasing attention. An open problem in analysis dictionary learning is how to obtain strongly sparsity-promoting solutions efficiently while avoiding trivial dictionary solutions. In this paper, to obtain strongly sparsity-promoting solutions, we employ the ℓ1/2 norm as a regularizer; recent work on ℓ1/2-norm regularization theory in compressive sensing shows that it yields sparser solutions than the ℓ1 norm. We transform the complex nonconvex optimization into a number of one-dimensional minimization problems whose closed-form solutions can be obtained efficiently. To avoid trivial solutions, we apply manifold optimization to update the dictionary directly on the manifold defined by the orthonormality constraint, so that the dictionary avoids trivial solutions while capturing its intrinsic properties. Experiments with synthetic and real-world data verify that the proposed algorithm not only obtains strongly sparsity-promoting solutions efficiently, but also learns more accurate dictionaries, in terms of dictionary recovery and image processing, than state-of-the-art algorithms.
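The one-dimensional subproblem mentioned above is min_x ½(x − v)² + λ|x|^{1/2}. The sketch below solves it by checking x = 0 against the positive stationary points, which after the substitution x = u² are the roots of the cubic u³ − |v|u + λ/2 = 0; the paper evaluates the same subproblem through a closed-form (half-thresholding) expression, so the numeric root-finding here is only for clarity.

```python
import numpy as np

def prox_l_half(v, lam):
    """Solve the scalar problem min_x 0.5*(x - v)**2 + lam*|x|**0.5."""
    s, a = np.sign(v), abs(v)
    roots = np.roots([1.0, 0.0, -a, lam / 2.0])       # u^3 - |v| u + lam/2 = 0
    cands = [0.0] + [float(u.real) ** 2
                     for u in roots if abs(u.imag) < 1e-10 and u.real > 0]
    obj = lambda x: 0.5 * (x - a) ** 2 + lam * np.sqrt(x)
    return s * min(cands, key=obj)

# Half-thresholding keeps large entries but zeroes out small ones entirely.
print([round(prox_l_half(v, 0.5), 3) for v in (-2.0, -0.5, 0.3, 1.5)])
```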
Affiliation(s)
- Zhenni Li
- School of Automation, Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou, 510006, China.
- Shuxue Ding
- School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu City, Fukushima, 965-8580, Japan.
- Yujie Li
- Artificial Intelligence Center, AIST, Tsukuba, Ibaraki 305-8560, Japan.
- Zuyuan Yang
- School of Automation, Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou, 510006, China.
- Shengli Xie
- School of Automation, Guangdong Key Laboratory of IoT Information Technology, Guangdong University of Technology, Guangzhou, 510006, China.
- Wuhui Chen
- School of Data and Computer Science, Sun Yat-sen University, China.
16
Rakotomamonjy A, Koco S, Ralaivola L. Greedy Methods, Randomization Approaches, and Multiarm Bandit Algorithms for Efficient Sparsity-Constrained Optimization. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:2789-2802. PMID: 28113680. DOI: 10.1109/tnnls.2016.2600243.
Abstract
Several sparsity-constrained algorithms, such as orthogonal matching pursuit (OMP) and the Frank-Wolfe (FW) algorithm, work by iteratively selecting a new atom to add to the current set of nonzero variables. This selection step is usually performed by computing the gradient and then looking for the component with the maximal absolute entry, which can be computationally expensive, especially for large-scale and high-dimensional data. In this paper, we aim to accelerate these sparsity-constrained optimization algorithms by exploiting the key observation that, for these algorithms to work, one only needs the coordinate of the gradient's top entry. Hence, we introduce algorithms based on greedy methods and randomization approaches that cheaply estimate the gradient and its top entry. Another contribution is to cast the problem of finding the best gradient entry as a best-arm identification problem in a multiarmed bandit setting; owing to this insight, we provide a bandit-based algorithm that directly estimates the top entry very efficiently. We also give theoretical results stating that the resulting inexact FW or OMP algorithms act, with high probability, like their exact counterparts. Several experiments show that the greedy deterministic and bandit approaches we propose can achieve an order-of-magnitude acceleration while remaining as effective as the exact gradient when used in algorithms such as OMP, FW, or CoSaMP.
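A minimal illustration of the cheap-selection idea: estimate the index of the gradient's top entry (the next OMP atom) from a random subset of the rows instead of the full correlation. The subsampling fraction and the toy OMP loop are assumptions for illustration; the paper's bandit algorithm refines such estimates with best-arm-identification guarantees.

```python
import numpy as np

def top_gradient_entry_subsampled(A, residual, frac=0.1, rng=None):
    """Estimate argmax_j |A[:, j]^T residual| from a random subset of rows."""
    rng = rng if rng is not None else np.random.default_rng()
    m = A.shape[0]
    rows = rng.choice(m, size=max(1, int(frac * m)), replace=False)
    corr = np.abs(A[rows].T @ residual[rows]) * (m / rows.size)  # rescaled estimate
    return int(np.argmax(corr))

# Usage inside a simple OMP loop (sketch):
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 200))
x_true = np.zeros(200); x_true[[3, 50, 120]] = [1.0, -2.0, 0.5]
y = A @ x_true
support, r = [], y.copy()
for _ in range(3):
    support.append(top_gradient_entry_subsampled(A, r, frac=0.2, rng=rng))
    coef = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    r = y - A[:, support] @ coef
```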