1. Kang Y, Liu J, Wu F, Wang K, Qiang J, Hu D, Zhang Y. Deep convolutional dictionary learning network for sparse view CT reconstruction with a group sparse prior. Computer Methods and Programs in Biomedicine 2024;244:108010. [PMID: 38199137] [DOI: 10.1016/j.cmpb.2024.108010]
Abstract
Purpose: Numerous deep learning techniques have been applied to sparse-view computed tomography (CT) imaging. Nevertheless, most are built intuitively on opaque, state-of-the-art convolutional neural networks (CNNs) and therefore lack interpretability. Moreover, CNNs tend to focus on local receptive fields and neglect nonlocal self-similarity prior information. Obtaining diagnostically valuable images from sparsely sampled projections is a challenging, ill-posed task. Methods: To address this issue, we propose a novel and interpretable model named DCDL-GS for sparse-view CT imaging. The model combines a convolutional dictionary learning network with a nonlocal group sparse prior. To enhance reconstruction quality, we embed the neural network in a statistical iterative reconstruction framework and perform a fixed number of iterations. Inspired by group sparsity priors, we adopt a novel group thresholding operation that improves feature representation and constraint ability and admits a theoretical interpretation. Furthermore, DCDL-GS incorporates filtered backprojection (FBP) reconstruction, fast sliding-window nonlocal self-similarity operations, and a lightweight, interpretable convolutional dictionary learning network to enhance the model's applicability. Results: Visual results on the LDCT-P and UIH datasets demonstrate the efficiency of DCDL-GS in preserving edges and recovering features. Compared with the most advanced techniques, the quantitative results on the test dataset improve by 0.6-0.8 dB in peak signal-to-noise ratio (PSNR), 0.005-0.01 in structural similarity index measure (SSIM), and 1-1.3 in regulated Fréchet inception distance (rFID). The quantitative results also confirm the effectiveness of the proposed deep convolutional iterative reconstruction module and nonlocal group sparse prior. Conclusion: We build a consolidated and enhanced mathematical model by integrating projection data and image priors into a deep iterative model. The model is more practical and interpretable than existing approaches, and the experiments show that it performs well against its competitors.
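The group thresholding operation is only named in the abstract; as a point of reference, a minimal sketch of the standard group (l2,1) soft-thresholding proximal operator that such operations generalize, assuming the matched nonlocal patches are stacked row-wise (an illustration, not the paper's exact operator):

```python
import numpy as np

def group_soft_threshold(groups, tau):
    """Proximal operator of tau * sum_g ||groups[g]||_2: each group is
    shrunk by its l2 norm, so weak groups are zeroed jointly.

    groups : (n_groups, group_size) stacked nonlocal patch groups (assumed layout).
    tau    : threshold controlling the strength of the group-sparsity prior.
    """
    norms = np.linalg.norm(groups, axis=1, keepdims=True)        # per-group l2 norm
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0)  # group shrinkage factor
    return scale * groups
```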
Affiliation(s)
- Yanqin Kang
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China
- Jin Liu
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China
- Fan Wu
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China
- Kun Wang
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China
- Jun Qiang
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China
- Dianlin Hu
- Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China
- Yikun Zhang
- Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China
2. Cui ZX, Jia S, Cheng J, Zhu Q, Liu Y, Zhao K, Ke Z, Huang W, Wang H, Zhu Y, Ying L, Liang D. Equilibrated Zeroth-Order Unrolled Deep Network for Parallel MR Imaging. IEEE Transactions on Medical Imaging 2023;42:3540-3554. [PMID: 37428656] [DOI: 10.1109/tmi.2023.3293826]
Abstract
In recent years, model-driven deep learning has evolved iterative algorithms into cascade networks by replacing the regularizer's first-order information, such as the (sub)gradient or proximal operator, with a network module. This approach offers greater explainability and predictability than typical data-driven networks. In theory, however, there is no assurance that a functional regularizer exists whose first-order information matches the substituted network module, which implies that the unrolled network output may not align with any regularization model. Furthermore, few established theories guarantee the global convergence and robustness (regularity) of unrolled networks under practical assumptions. To address this gap, we propose a safeguarded methodology for network unrolling. Specifically, for parallel MR imaging, we unroll a zeroth-order algorithm in which the network module serves as the regularizer itself, so that the network output is covered by a regularization model. Additionally, inspired by deep equilibrium models, we run the unrolled network to a fixed point before backpropagation and then demonstrate that it tightly approximates the actual MR image. We also prove that the proposed network is robust to noise in the measurement data. Finally, numerical experiments indicate that the proposed network consistently outperforms state-of-the-art MRI reconstruction methods, including traditional regularization and unrolled deep learning techniques.
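A minimal sketch of the equilibrium idea, with `step` standing in for one unrolled iteration (data consistency plus learned regularizer); plain fixed-point iteration is used here for clarity, while accelerated solvers are common in practice:

```python
import torch

def solve_equilibrium(step, x0, max_iter=50, tol=1e-4):
    """Iterate x <- step(x) until (approximate) convergence, as in deep
    equilibrium models; gradients can then be taken at the fixed point
    instead of through the whole unrolled trajectory."""
    x = x0
    for _ in range(max_iter):
        x_next = step(x)
        if torch.norm(x_next - x) / (torch.norm(x) + 1e-8) < tol:
            return x_next     # relative change small enough: fixed point reached
        x = x_next
    return x
```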
3. Liu R, Liu X, Zeng S, Zhang J, Zhang Y. Hierarchical Optimization-Derived Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:14693-14708. [PMID: 37708018] [DOI: 10.1109/tpami.2023.3315333]
Abstract
In recent years, a variety of so-called Optimization-Derived Learning (ODL) approaches, which use optimization techniques to formulate the propagation of deep models, have been proposed to address diverse learning and vision tasks. Although they achieve relatively satisfying practical performance, fundamental issues remain. In particular, current ODL methods tend to treat model construction and learning as two separate phases and thus fail to capture their underlying coupling and dependency. In this work, we first establish a new framework, named Hierarchical ODL (HODL), to simultaneously investigate the intrinsic behaviors of optimization-derived model construction and the corresponding learning process. We then rigorously prove the joint convergence of these two subtasks from the perspectives of both approximation quality and stationary analysis. To the best of our knowledge, this is the first theoretical guarantee for these two coupled ODL components: optimization and learning. We further demonstrate the flexibility of our framework by applying HODL to challenging learning tasks that existing ODL methods have not properly addressed. Finally, we conduct extensive experiments on both synthetic data and real applications in vision and other learning tasks to verify the theoretical properties and practical performance of HODL in various application scenarios.
4. Liu R, Liu X, Zeng S, Zhang J, Zhang Y. Value-Function-Based Sequential Minimization for Bi-Level Optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:15930-15948. [PMID: 37552592] [DOI: 10.1109/tpami.2023.3303227]
Abstract
Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to modern learning tasks. However, most existing strategies are theoretically designed under restrictive assumptions (e.g., convexity of the lower-level subproblem) and are computationally impractical for high-dimensional tasks. Moreover, almost no gradient-based methods can solve BLO in challenging scenarios such as BLO with functional constraints or pessimistic BLO. In this work, by reformulating BLO into approximated single-level problems, we provide a new algorithm, named Bi-level Value-Function-based Sequential Minimization (BVFSM), to address these issues. Specifically, BVFSM constructs a series of value-function-based approximations and thus avoids the repeated calculations of recurrent gradients and Hessian inverses required by existing approaches, which are time-consuming, especially for high-dimensional tasks. We also extend BVFSM to BLO with additional functional constraints. More importantly, BVFSM can be used for the challenging pessimistic BLO, which had never been properly solved before. In theory, we prove the asymptotic convergence of BVFSM on these types of BLO, discarding the restrictive lower-level convexity assumption. To the best of our knowledge, this is the first gradient-based algorithm that can solve different kinds of BLO (e.g., optimistic, pessimistic, and constrained) with solid convergence guarantees. Extensive experiments verify the theoretical investigations and demonstrate our superiority on various real-world applications.
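To make the value-function idea concrete, here is a loose single-level surrogate in that spirit; the inner step count, regularization weight, and penalty coefficient are illustrative placeholders, not the paper's scheme:

```python
import torch

def value_function_surrogate(x, F_upper, f_lower, y_dim,
                             inner_steps=25, lr=0.1, mu=1e-3, rho=10.0):
    """Approximate v(x) = min_y f(x, y) by a few gradient steps on a
    regularized lower-level problem, then penalize violation of
    f(x, y) <= v(x) to obtain a single-level objective in (x, y).
    `x` is treated as fixed (detached) inside the inner loop."""
    y = torch.zeros(y_dim, requires_grad=True)
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(inner_steps):            # inner solve of the regularized lower level
        opt.zero_grad()
        (f_lower(x.detach(), y) + mu * y.pow(2).sum()).backward()
        opt.step()
    v = f_lower(x.detach(), y).detach()     # surrogate value function v(x)
    y = y.detach().requires_grad_(True)
    # penalized single-level objective, to be minimized over (x, y)
    return F_upper(x, y) + rho * torch.relu(f_lower(x, y) - v)
```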
5. Liu R, Liu Z, Mu P, Fan X, Luo Z. Optimization-Inspired Learning With Architecture Augmentations and Control Mechanisms for Low-Level Vision. IEEE Transactions on Image Processing 2023;32:6075-6089. [PMID: 37922167] [DOI: 10.1109/tip.2023.3328486]
Abstract
In recent years, there has been growing interest in combining learnable modules with numerical optimization to solve low-level vision tasks. However, most existing approaches focus on designing specialized schemes to generate image/feature propagation, and there is no unified framework for constructing propagative modules, providing theoretical analysis tools, and designing effective learning mechanisms. To mitigate these issues, this paper proposes a unified optimization-inspired learning framework that aggregates Generative, Discriminative, and Corrective (GDC for short) principles and generalizes across diverse optimization models. Specifically, by introducing a general energy minimization model and formulating its descent direction from different viewpoints (i.e., in a generative manner, based on a discriminative metric, and with optimality-based correction), we construct three propagative modules that can be flexibly combined to solve the optimization models. We design two control mechanisms that provide non-trivial theoretical guarantees for both fully and partially defined optimization formulations. With these guarantees, we can introduce diverse architecture augmentation strategies, such as normalization and search, to ensure stable, convergent propagation and to seamlessly integrate suitable modules. Extensive experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
6. Xiao J, Fu X, Liu A, Wu F, Zha ZJ. Image De-Raining Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:12978-12995. [PMID: 35709118] [DOI: 10.1109/tpami.2022.3183612]
Abstract
Existing deep learning-based de-raining approaches have resorted to convolutional architectures. However, the intrinsic limitations of convolution, including local receptive fields and independence from input content, hinder the model's ability to capture long-range and complicated rain artifacts. To overcome these limitations, we propose an effective and efficient transformer-based architecture for image de-raining. First, we introduce general priors of vision tasks, i.e., locality and hierarchy, into the network architecture so that our model achieves excellent de-raining performance without costly pre-training. Second, since the geometric appearance of rain artifacts is complicated and varies significantly in space, de-raining models must extract both local and non-local features; we therefore design complementary window-based and spatial transformers to enhance locality while capturing long-range dependencies. Besides, to compensate for the positional blindness of self-attention, we establish a separate representative space for modeling positional relationships and design a new relative-position-enhanced multi-head self-attention. In this way, our model captures dependencies from both content and position, achieving better image content recovery while removing rain artifacts. Experiments substantiate that our approach attains more appealing results than state-of-the-art methods, both quantitatively and qualitatively.
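For intuition, attention with a learned relative position bias (the general mechanism such position-enhanced self-attention refines) can be sketched as follows; the tensor layout is an assumption for illustration, not the paper's exact design:

```python
import torch
import torch.nn.functional as F

def attention_with_relative_bias(q, k, v, bias_table, rel_index):
    """Scaled dot-product attention plus a learned relative position bias,
    a common way for window-based transformers to be position-aware.

    q, k, v    : (heads, N, d) tensors for one window of N tokens.
    bias_table : (num_relative_positions, heads) learnable table.
    rel_index  : (N, N) long tensor mapping each token pair to its entry
                 in the table.
    """
    d = q.shape[-1]
    attn = q @ k.transpose(-2, -1) / d ** 0.5   # (heads, N, N) content scores
    bias = bias_table[rel_index]                # (N, N, heads) positional scores
    attn = attn + bias.permute(2, 0, 1)         # combine content and position
    return F.softmax(attn, dim=-1) @ v
```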
7. Liu J, Zhang T, Kang Y, Wang Y, Zhang Y, Hu D, Chen Y. Deep residual constrained reconstruction via learned convolutional sparse coding for low-dose CT imaging. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104868]
8. Gao Z, Wu Y, Fan X, Harandi M, Jia Y. Learning to Optimize on Riemannian Manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:5935-5952. [PMID: 36260581] [DOI: 10.1109/tpami.2022.3215702]
Abstract
Many learning tasks are modeled as optimization problems with nonlinear constraints, such as principal component analysis and fitting a Gaussian mixture model. A popular way to solve such problems is to resort to Riemannian optimization algorithms, which nevertheless rely heavily on human involvement and expert knowledge about Riemannian manifolds. In this paper, we propose a Riemannian meta-optimization method to automatically learn a Riemannian optimizer. We parameterize the optimizer with a novel recurrent network and utilize Riemannian operations to ensure that our method is faithful to the geometry of manifolds. The proposed method explores the distribution of the underlying data by minimizing the objective of updated parameters and is hence capable of learning task-specific optimizations. We introduce a Riemannian implicit differentiation training scheme to achieve training that is efficient in terms of numerical stability and computational cost. Unlike conventional meta-optimization training schemes that must differentiate through the whole optimization trajectory, our training scheme depends only on the final two optimization steps, which avoids the exploding gradient problem and significantly reduces the computational load and memory footprint. We present experimental results across various constrained problems, including principal component analysis on Grassmann manifolds; face recognition, person re-identification, and texture image classification on Stiefel manifolds; clustering and similarity learning on symmetric positive definite manifolds; and few-shot learning on hyperbolic manifolds.
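The geometric operations such a learned optimizer must respect are standard; a textbook Riemannian gradient step on the Stiefel manifold (tangent-space projection plus QR retraction), independent of the paper's recurrent parameterization:

```python
import numpy as np

def stiefel_step(X, G, lr=0.1):
    """One Riemannian gradient step on the Stiefel manifold {X : X^T X = I}.

    X : (n, p) point with orthonormal columns.
    G : (n, p) Euclidean gradient at X.
    """
    sym = 0.5 * (X.T @ G + G.T @ X)
    xi = G - X @ sym                  # project gradient onto the tangent space at X
    Q, R = np.linalg.qr(X - lr * xi)  # retract back onto the manifold via QR
    return Q * np.sign(np.diag(R))    # sign fix for uniqueness (assumes nonzero diagonal)
```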
9. Liu R, Ma L, Ma T, Fan X, Luo Z. Learning With Nested Scene Modeling and Cooperative Architecture Search for Low-Light Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023;45:5953-5969. [PMID: 36215366] [DOI: 10.1109/tpami.2022.3212995]
Abstract
Images captured in low-light scenes often suffer from severe degradations, including low visibility, color casts, and intensive noise. These factors not only degrade image quality but also hurt the performance of downstream Low-Light Vision (LLV) applications. A variety of deep networks have been proposed to enhance the visual quality of low-light images; however, they mostly rely on significant architecture engineering, often suffer from a high computational burden, and, more importantly, still lack an efficient paradigm for uniformly handling the various tasks in LLV scenarios. To partially address these issues, we establish Retinex-inspired Unrolling with Architecture Search (RUAS), a general learning framework that addresses the low-light enhancement task and is flexible enough to handle other challenging downstream vision tasks. Specifically, we first establish a nested optimization formulation, together with an unrolling strategy, to explore the underlying principles of a series of LLV tasks. Furthermore, we design a differentiable strategy to cooperatively search scene-specific and task-specific architectures for RUAS. Last but not least, we demonstrate how to apply RUAS to both low- and high-level LLV applications (e.g., enhancement, detection, and segmentation). Extensive experiments verify the flexibility, effectiveness, and efficiency of RUAS.
10. Liu R, Li Z, Fan X, Zhao C, Huang H, Luo Z. Learning Deformable Image Registration From Optimization: Perspective, Modules, Bilevel Training and Beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022;44:7688-7704. [PMID: 34582346] [DOI: 10.1109/tpami.2021.3115825]
Abstract
Conventional deformable registration methods aim to solve an optimization model carefully designed for each image pair, and their computational costs are exceptionally high. In contrast, recent deep learning-based approaches provide fast deformation estimation, but these heuristic network architectures are fully data-driven and thus lack the explicit geometric constraints, e.g., topology preservation, that are indispensable for generating plausible deformations. Moreover, such learning-based approaches typically pose hyperparameter learning as a black-box problem and require considerable computational and human effort over many training runs. To tackle these problems, we propose a new learning-based framework that optimizes a diffeomorphic model via multi-scale propagation. Specifically, we introduce a generic optimization model to formulate diffeomorphic registration and develop a series of learnable architectures that perform propagative updating in a coarse-to-fine feature space. Further, we propose a new bilevel self-tuned training strategy that allows efficient search of task-specific hyperparameters, increasing flexibility across various types of data while reducing computational and human burdens. We conduct two groups of image registration experiments on 3D volume datasets: image-to-atlas registration on brain MRI data and image-to-image registration on liver CT data. Extensive results demonstrate state-of-the-art performance with a diffeomorphic guarantee and extreme efficiency. We also apply our framework to challenging multi-modal image registration and investigate how our registration supports downstream tasks for medical image analysis, including multi-modal fusion and image segmentation.
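A common route to the diffeomorphic guarantee mentioned above is to integrate a stationary velocity field by scaling and squaring; a 2D sketch of that standard construction (the paper's multi-scale learnable modules are not reproduced here):

```python
import torch
import torch.nn.functional as F

def warp(field, disp):
    """Resample `field` at locations shifted by displacement `disp`,
    both shaped (1, 2, H, W) with channel 0 = x, channel 1 = y (pixels)."""
    _, _, H, W = disp.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), 0).float()      # (2, H, W) identity pixel grid
    coords = base + disp[0]                      # where each pixel maps to
    gx = 2 * coords[0] / (W - 1) - 1             # normalize to [-1, 1] for grid_sample
    gy = 2 * coords[1] / (H - 1) - 1
    grid = torch.stack((gx, gy), -1)[None]       # (1, H, W, 2)
    return F.grid_sample(field, grid, align_corners=True)

def exp_velocity(v, steps=7):
    """Scaling and squaring: integrate velocity v into a diffeomorphic
    displacement via repeated self-composition phi <- phi o phi."""
    disp = v / 2 ** steps                        # scale down
    for _ in range(steps):
        disp = disp + warp(disp, disp)           # square (compose with itself)
    return disp
```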
11. Image restoration algorithm incorporating methods to remove noise and blurring from positron emission tomography imaging for Alzheimer's disease diagnosis. Phys Med 2022;103:181-189. [DOI: 10.1016/j.ejmp.2022.10.016]
12. Mai TTN, Lam EY, Lee C. Deep Unrolled Low-Rank Tensor Completion for High Dynamic Range Imaging. IEEE Transactions on Image Processing 2022;31:5774-5787. [PMID: 36048976] [DOI: 10.1109/tip.2022.3201708]
Abstract
The major challenge in high dynamic range (HDR) imaging of dynamic scenes is suppressing the ghosting artifacts caused by large object motions or poor exposures. Whereas recent deep learning-based approaches have shown significant synthesis performance, their behaviors are difficult to interpret and analyze, and their performance depends on the diversity of training data. In contrast, traditional model-based approaches yield inferior synthesis performance despite their theoretical thoroughness. In this paper, we propose an algorithm unrolling approach to ghost-free HDR image synthesis that unrolls an iterative low-rank tensor completion algorithm into deep neural networks, taking advantage of the merits of both learning- and model-based approaches while overcoming their weaknesses. First, we formulate ghost-free HDR image synthesis as a low-rank tensor completion problem by assuming a low-rank structure for the tensor constructed from the low dynamic range (LDR) images and linear dependency among them. We also define two regularization functions that compensate for modeling inaccuracy by extracting hidden model information. Then, we solve the problem efficiently with an iterative optimization algorithm by reformulating it into a series of subproblems. Finally, we unroll the iterative algorithm into a series of blocks corresponding to each iteration, in which the optimization variables are updated by rigorous closed-form solutions and the regularizers are updated by learned deep neural networks. Experimental results on different datasets show that the proposed algorithm provides better HDR image synthesis performance, with superior robustness compared with state-of-the-art algorithms, while using significantly fewer training samples.
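The closed-form low-rank update at the heart of such iterative completion schemes is singular value thresholding, the proximal operator of the nuclear norm; a generic sketch (the paper's learned regularizer blocks are separate components):

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink singular values by tau, the
    proximal operator of tau * ||M||_* used in low-rank completion steps.
    M : matricized tensor (e.g., stacked LDR images); tau : shrinkage level."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```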
13. Liu R, Ma L, Zhang Y, Fan X, Luo Z. Underexposed Image Correction via Hybrid Priors Navigated Deep Propagation. IEEE Transactions on Neural Networks and Learning Systems 2022;33:3425-3436. [PMID: 33513118] [DOI: 10.1109/tnnls.2021.3052903]
Abstract
Enhancing the visual quality of underexposed images is a widely studied task that plays an important role in various areas of multimedia and computer vision. Most existing methods often fail to generate high-quality results with appropriate luminance and abundant details. To address these issues, we develop a novel framework for underexposed image correction that integrates both knowledge from physical principles and implicit distributions learned from data. More concretely, we propose a new perspective that formulates the task as an energy-inspired model with advanced hybrid priors, and we design a propagation procedure navigated by these priors to simultaneously propagate the reflectance and illumination toward the desired results. We conduct extensive experiments to verify the necessity of integrating both underlying principles (i.e., knowledge) and distributions (i.e., data) in navigated deep propagation. Abundant experimental results on underexposed image correction demonstrate that our method performs favorably against state-of-the-art methods on both subjective and objective assessments. In addition, we execute a face detection task to further verify the naturalness and practical value of the corrected images, and we apply our method to single-image haze removal, where the results further demonstrate our superiority.
14. Liu J, Kang Y, Xia Z, Qiang J, Zhang J, Zhang Y, Chen Y. MRCON-Net: Multiscale reweighted convolutional coding neural network for low-dose CT imaging. Computer Methods and Programs in Biomedicine 2022;221:106851. [PMID: 35576686] [DOI: 10.1016/j.cmpb.2022.106851]
Abstract
Background and Objective: Low-dose computed tomography (LDCT) has become increasingly important for alleviating X-ray radiation damage. However, reducing the administered radiation dose may lead to degraded CT images with amplified mottle noise and nonstationary streak artifacts. Previous studies have confirmed that deep learning (DL) is promising for improving LDCT imaging; however, most DL-based frameworks are built intuitively, lack interpretability, and suffer from a loss of image detail, which has become a general challenge. Methods: A multiscale reweighted convolutional coding neural network (MRCON-Net) is developed to address these problems. MRCON-Net is compact and more explainable than other networks. First, inspired by the learning-based reweighted iterative soft thresholding algorithm (ISTA), we extend traditional convolutional sparse coding (CSC) to a reweighted convolutional learning form. Second, we use dilated convolution to extract multiscale image features, allowing a single model to capture correlations between features at different scales. Finally, a channel attention (CA) mechanism learns appropriate weights that automatically adjust the elements of the feature code to correct the obtained solution. Results: Visual results on the American Association of Physicists in Medicine (AAPM) Challenge and United Imaging Healthcare (UIH) clinical datasets confirm that the proposed model significantly reduces serious artifact noise while retaining the desired structures. Quantitatively, the model achieves an average structural similarity index measure (SSIM) of 0.9491 and peak signal-to-noise ratio (PSNR) of 40.66 on the AAPM Challenge dataset, and an SSIM of 0.915 and a PSNR of 42.44 on the UIH clinical dataset. Conclusion: Compared with recent state-of-the-art methods, the proposed model achieves subtle structure-enhanced LDCT imaging, and ablation studies validate the performance contribution of each of its components.
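The classical iteration that reweighted layers of this kind unroll can be summarized in a few lines; a dense-matrix sketch of reweighted ISTA (the convolutional form replaces `D` with convolutions, and the reweighting rule here is the standard one, not necessarily the paper's):

```python
import numpy as np

def soft(x, t):
    """Elementwise soft threshold: the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def reweighted_ista(D, y, lam=0.1, n_iter=100):
    """Reweighted ISTA for min_z 0.5||Dz - y||^2 + lam ||w * z||_1."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    w = np.ones_like(z)                     # per-coefficient weights
    for _ in range(n_iter):
        z = soft(z - D.T @ (D @ z - y) / L, lam * w / L)  # gradient + shrinkage
        w = 1.0 / (np.abs(z) + 1e-2)        # reweight: small coefficients penalized more
    return z
```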
Affiliation(s)
- Jin Liu
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China
- Yanqin Kang
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China; Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China
- Zhenyu Xia
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China
- Jun Qiang
- College of Computer and Information, Anhui Polytechnic University, Wuhu, China
- JunFeng Zhang
- School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou, China
- Yikun Zhang
- Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China; School of Cyber Science and Engineering, Southeast University, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China
- Yang Chen
- Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing, China; School of Cyber Science and Engineering, Southeast University, Nanjing, China; School of Computer Science and Engineering, Southeast University, Nanjing, China
15. Zhang M, Young GS, Tie Y, Gu X, Xu X. A New Framework of Designing Iterative Techniques for Image Deblurring. Pattern Recognition 2022;124:108463. [PMID: 34949896] [PMCID: PMC8691531] [DOI: 10.1016/j.patcog.2021.108463]
Abstract
In this work, we present a framework for designing iterative image deblurring techniques for the inverse problem setting. The framework is based on two observations about existing methods; we use the Landweber method as the basis to develop and present it, but note that the framework applies to other iterative techniques as well. First, we observed that the iterative steps of the Landweber method contain a constant term that is a low-pass filtered version of the already blurry observation, and we proposed a modification that uses the observed image directly. Second, we observed that the Landweber method uses an estimate of the true image as the starting point but never updates this estimate over the iterations, and we proposed a modification that updates the estimate as the iterative process progresses. We integrated the two modifications into one framework for iteratively deblurring images. Finally, we tested the new method and compared its performance with several existing techniques, including the Landweber method, the Van Cittert method, GMRES (generalized minimal residual method), and LSQR (least squares), demonstrating its superior performance in image deblurring.
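For reference, the textbook Landweber baseline the two modifications depart from, in matrix form for the blur model g = Hx (the modified constant term and estimate updates described above are not implemented here):

```python
import numpy as np

def landweber(H, g, beta=1.0, n_iter=200, x0=None):
    """Classical Landweber iteration x_{k+1} = x_k + beta * H^T (g - H x_k).
    Expanding gives (I - beta H^T H) x_k + beta H^T g, i.e., the constant
    term beta * H^T g is a low-pass filtered version of the observation g."""
    if x0 is None:
        x0 = np.zeros(H.shape[1])
    beta = min(beta, 1.9 / np.linalg.norm(H, 2) ** 2)  # step size ensuring convergence
    x = x0
    for _ in range(n_iter):
        x = x + beta * H.T @ (g - H @ x)
    return x
```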
Affiliation(s)
- Min Zhang
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Geoffrey S Young
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Yanmei Tie
- Department of Neurosurgery, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Xianfeng Gu
- Department of Computer Science, Stony Brook University, Stony Brook, NY, USA
- Xiaoyin Xu
- Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
16. Liu R, Ma L, Yuan X, Zeng S, Zhang J. Task-Oriented Convex Bilevel Optimization With Latent Feasibility. IEEE Transactions on Image Processing 2022;31:1190-1203. [PMID: 35015638] [DOI: 10.1109/tip.2022.3140607]
Abstract
This paper proposes a convex bilevel optimization paradigm to formulate and optimize popular learning and vision problems in real-world scenarios. Unlike conventional approaches, which directly design iteration schemes based on a given problem formulation, we introduce a task-oriented energy as a latent constraint that integrates richer task information. By explicitly re-characterizing the feasibility, we establish an efficient and flexible algorithmic framework for tackling convex models with both a shrunken solution space and powerful auxiliary information (based on domain knowledge and the data distribution of the task). In theory, we present a convergence analysis of our numerical strategy based on this latent feasibility re-characterization and analyze the stability of the theoretical convergence under computational error perturbations. Extensive numerical experiments verify our theoretical findings and evaluate the practical performance of our method on different applications.
17. Liu R, Mu P, Zhang J. Investigating Customization Strategies and Convergence Behaviors of Task-Specific ADMM. IEEE Transactions on Image Processing 2021;30:8278-8292. [PMID: 34559653] [DOI: 10.1109/tip.2021.3113796]
Abstract
The Alternating Direction Method of Multipliers (ADMM) has been a popular algorithmic framework for separable optimization problems with linear constraints. Because numerical ADMM exploits neither the particular structure of the problem at hand nor the input data information, leveraging task-specific modules (e.g., neural networks and other data-driven architectures) to extend ADMM is a significant but challenging task. This work focuses on designing a flexible algorithmic framework that incorporates various task-specific modules, with no additional constraints on them, to improve the performance of ADMM in real-world applications. Specifically, we propose Guidance from Optimality (GO), a new customization strategy, to embed task-specific modules into ADMM (GO-ADMM). By introducing an optimality-based criterion to guide the propagation, GO-ADMM establishes an updating scheme agnostic to the choice of additional modules. Existing task-specific methods simply plug their modules into the numerical iterations in a straightforward manner, and even with restrictive constraints on the plug-in modules they obtain only relatively weak convergence properties for the resulting ADMM iterations. Fortunately, without any restrictions on the embedded modules, we prove the convergence of GO-ADMM in terms of objective values and constraint violations and derive the worst-case convergence rate measured by iteration complexity. Extensive experiments verify the theoretical results and demonstrate the efficiency of GO-ADMM.
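As a concrete anchor for the numerical baseline, vanilla scaled-form ADMM for the lasso-type problem min_x 0.5||Ax - b||^2 + lam||z||_1 s.t. x = z; GO-ADMM's learned guidance modules are the paper's contribution and are not sketched:

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=100):
    """Plain ADMM: alternate a ridge solve in x, an l1 shrinkage in z,
    and a scaled dual update, with no learned components."""
    n = A.shape[1]
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)                                  # scaled dual variable
    AtA, Atb = A.T @ A, A.T @ b
    P = np.linalg.inv(AtA + rho * np.eye(n))         # cached x-update system
    for _ in range(n_iter):
        x = P @ (Atb + rho * (z - u))                # x-update: ridge solve
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)  # z: shrinkage
        u = u + x - z                                # dual update
    return x
```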
18. Liu R, Chen Q, Yao Y, Fan X, Luo Z. Location-Aware and Regularization-Adaptive Correlation Filters for Robust Visual Tracking. IEEE Transactions on Neural Networks and Learning Systems 2021;32:2430-2442. [PMID: 32749966] [DOI: 10.1109/tnnls.2020.3005447]
Abstract
The correlation filter (CF) has recently been widely used for visual tracking, and the estimation of the search window and the filter-learning strategy are the key components of CF trackers. Nevertheless, prevalent CF models address these issues separately and heuristically: they directly set the location estimated in the previous frame as the search center for the current one, and they usually rely on simple, fixed regularization for filter learning, so their performance is compromised by the search window size and by optimization heuristics. To break these limits, this article proposes a location-aware and regularization-adaptive CF (LRCF) for robust visual tracking. LRCF establishes a novel bilevel optimization model that addresses the location-estimation and filter-training problems simultaneously. We prove that our bilevel formulation obtains a globally converged CF and the corresponding object location in a collaborative manner. Moreover, based on the LRCF framework, we design two trackers, named LRCF-S and LRCF-SA, and a series of comparisons to demonstrate the flexibility and effectiveness of the framework. Extensive experiments on different challenging benchmark datasets show that our LRCF trackers perform favorably against state-of-the-art methods in practice.
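For context, the classical single-channel correlation filter is a per-frequency ridge regression in the Fourier domain; a minimal sketch of that baseline (LRCF's bilevel, regularization-adaptive learning is not reproduced):

```python
import numpy as np

def train_correlation_filter(x, y, lam=1e-2):
    """Closed-form Fourier-domain ridge regression for a correlation filter.

    x : (H, W) training patch;  y : (H, W) desired (e.g., Gaussian) response.
    """
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    return np.conj(X) * Y / (np.conj(X) * X + lam)   # per-frequency ridge solution

def detect(W, z):
    """Correlate the learned filter with a search patch z; the response
    peak gives the estimated target location."""
    resp = np.real(np.fft.ifft2(W * np.fft.fft2(z)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```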
19. Brightening the Low-Light Images via a Dual Guided Network. Artif Intell 2021. [DOI: 10.1007/978-3-030-93046-2_21]
20. Li Z, Xin F, Liu R, Luo Z. Optimizing Loss Function for Uni-modal and Multi-modal Medical Registration. Artif Intell 2021. [DOI: 10.1007/978-3-030-93046-2_23]
21. Li B, Gou Y, Liu JZ, Zhu H, Zhou JT, Peng X. Zero-Shot Image Dehazing. IEEE Transactions on Image Processing 2020;PP:8457-8466. [PMID: 32809939] [DOI: 10.1109/tip.2020.3016134]
Abstract
In this paper, we study two less-touched challenging problems in single-image dehazing neural networks, namely, how to remove haze from a given image in an unsupervised and zero-shot manner. To this end, we propose Zero-shot Image Dehazing (ZID), a novel method based on the idea of layer disentanglement that views a hazy image as the entanglement of several "simpler" layers, i.e., a haze-free image layer, a transmission map layer, and an atmospheric light layer. The major advantages of ZID are two-fold. First, it is an unsupervised method that does not use any clean images, including hazy-clean pairs, as ground truth. Second, ZID is a "zero-shot" method that uses only the observed single hazy image to perform learning and inference; in other words, it does not follow the conventional paradigm of training a deep model on a large-scale dataset. These two advantages enable our method to avoid labor-intensive data collection and the domain-shift issue of applying synthetic hazy images to real-world scenes. Extensive comparisons show the promising performance of our method against 15 approaches in qualitative and quantitative evaluations. The source code can be found at www.pengxi.me.
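The three layers ZID disentangles follow the standard atmospheric scattering model; a sketch of the model itself, i.e., composition and its inversion for known t and A (the unsupervised disentanglement network is the paper's contribution and is not shown):

```python
import numpy as np

def compose_haze(J, t, A):
    """Atmospheric scattering model I = J*t + A*(1 - t): a hazy image as
    the entanglement of a haze-free layer J, transmission map t, and
    atmospheric light A.
    J : (H, W, 3) clean image in [0, 1];  t : (H, W, 1);  A : (3,)."""
    return J * t + A * (1.0 - t)

def recover(I, t, A, t_min=0.1):
    """Invert the model: J = (I - A) / t + A, with t clipped for stability."""
    return (I - A) / np.maximum(t, t_min) + A
```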
22. Liu R, Jiang Z, Fan X, Luo Z. Knowledge-Driven Deep Unrolling for Robust Image Layer Separation. IEEE Transactions on Neural Networks and Learning Systems 2020;31:1653-1666. [PMID: 31329566] [DOI: 10.1109/tnnls.2019.2921597]
Abstract
Single-image layer separation aims to decompose an observed image into two independent components according to different application demands, and many vision and multimedia applications can be (re)formulated as such a separation problem. Because these separations are fundamentally ill-posed, existing methods tend to design elaborate model priors for the separated components; however, it is difficult to optimize cost functions with complicated model regularizations, effectiveness is greatly compromised by the fixed iteration mechanism, adaptability cannot be guaranteed due to poor data fitting, and, most taxing for a universal framework, one type of visual cue cannot be shared across different tasks. To partly overcome these weaknesses, we develop a generic optimization unrolling technique that incorporates deep architectures into the iterations for adaptive image layer separation. First, we propose a general energy model with implicit priors based on maximum a posteriori estimation and employ the widely used alternating direction method of multipliers to determine our elementary iteration mechanism. By unrolling with one general residual architecture prior and one task-specific prior, we obtain a straightforward, flexible, and data-dependent image separation framework. We apply our method to four different tasks: single-image rain streak removal, high-dynamic-range tone mapping, low-light image enhancement, and single-image reflection removal. Extensive experiments demonstrate that the proposed method is applicable to multiple tasks and outperforms the state of the art by a large margin, both qualitatively and quantitatively.
23. Gu S, Guo S, Zuo W, Chen Y, Timofte R, Van Gool L, Zhang L. Learned Dynamic Guidance for Depth Image Reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence 2019;42:2437-2452. [PMID: 31870979] [DOI: 10.1109/tpami.2019.2961672]
Abstract
The depth images acquired by consumer depth sensors (e.g., Kinect and ToF) are usually of low resolution and insufficient quality. One natural solution is to incorporate a high-resolution RGB camera and exploit the statistical correlation between its data and the depth. In recent years, both optimization-based and learning-based approaches have been proposed for guided depth reconstruction. In this paper, we introduce a weighted analysis sparse representation (WASR) model for guided depth image enhancement, which can be considered a generalized formulation of a wide range of previous optimization-based models. We unfold the optimization of the WASR model and conduct guided depth reconstruction with dynamically changing stage-wise operations. This guidance strategy lets us dynamically adjust the operations that update the depth image, improving both reconstruction quality and speed. To learn the stage-wise operations in a task-driven manner, we propose two parameterizations and their corresponding methods: dynamic guidance with Gaussian RBF nonlinearity parameterization (DG-RBF) and dynamic guidance with CNN nonlinearity parameterization (DG-CNN). The network structures of both methods are designed with the objective function of our WASR model in mind, and the optimal network parameters are learned from paired training data; such optimization-inspired architectures allow our models to leverage previous expertise while benefiting from training data. Their effectiveness is validated on guided depth image super-resolution and realistic depth image reconstruction tasks using standard benchmarks, where DG-RBF and DG-CNN achieve the best quantitative results (RMSE) and better visual quality than the state-of-the-art approaches at the time of writing. The code is available at https://github.com/ShuhangGu/GuidedDepthSR.