1. Paul VS, Nelson PA. Efficient design of complex-valued neural networks with application to the classification of transient acoustic signals. J Acoust Soc Am 2024; 156:1099-1110. PMID: 39140882. DOI: 10.1121/10.0028230.
Abstract
An earlier paper by the current authors [Paul and Nelson, JASA Express Lett. 3(9), 094802 (2023)] showed how the singular value decomposition (SVD) of the matrix of real weights in a neural network can be used to prune the network during training. The paper presented here shows that a similar approach can be used to reduce the training time and increase the implementation efficiency of complex-valued neural networks. Such networks have potential advantages over their real-valued counterparts, especially when the complex representation of the data is important, which is often the case in acoustic signal processing. In comparing the performance of networks having both real and complex elements, it is demonstrated that there are some advantages to the use of complex networks in the cases considered. The paper includes a derivation of the backpropagation algorithm, in matrix form, for training a complex-valued multilayer perceptron with an arbitrary number of layers. The matrix-based analysis enables the application of the SVD to the complex weight matrices in the network. The SVD-based pruning technique is applied to the problem of classifying transient acoustic signals. It is shown how training times can be reduced, and implementation efficiency increased, while ensuring that such signals are classified with remarkable accuracy.
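To make the pruning idea concrete, here is a minimal NumPy sketch (an illustration of the general technique, not the authors' implementation): the SVD of a complex weight matrix is truncated at a relative singular-value threshold, and the retained factors replace the full matrix with a cheaper low-rank product. The threshold rel_tol and the factorized form are assumptions for illustration.

```python
import numpy as np

def svd_prune(W, rel_tol=0.1):
    """Truncate the SVD of a complex weight matrix W, keeping only
    singular values above rel_tol times the largest one. The factors
    (U_r, s_r, Vh_r) approximate W with a reduced inner dimension,
    i.e., fewer effective weights to store and multiply."""
    U, s, Vh = np.linalg.svd(W, full_matrices=False)
    r = max(1, int(np.sum(s > rel_tol * s[0])))   # retained rank
    return U[:, :r], s[:r], Vh[:r, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)) + 1j * rng.standard_normal((64, 32))
U_r, s_r, Vh_r = svd_prune(W, rel_tol=0.5)
W_low = (U_r * s_r) @ Vh_r                        # low-rank surrogate for W
print(len(s_r), np.linalg.norm(W - W_low) / np.linalg.norm(W))
```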
Affiliation(s)
- Vlad S Paul: Institute of Sound and Vibration Research, University of Southampton, Southampton SO17 1BJ, United Kingdom
- Philip A Nelson: Institute of Sound and Vibration Research, University of Southampton, Southampton SO17 1BJ, United Kingdom

2. Li X, Huang JC, Zhang GZ, Li HE, Cao CS, Lv D, Hu HS. A Nonstochastic Optimization Algorithm for Neural-Network Quantum States. J Chem Theory Comput 2023; 19:8156-8165. PMID: 37962975. DOI: 10.1021/acs.jctc.3c00831.
Abstract
Neural-network quantum states (NQS) employ artificial neural networks to encode many-body wave functions in second quantization, optimized through variational Monte Carlo (VMC). They have recently been applied to describe electronic wave functions of molecules accurately, but they face efficiency challenges compared with traditional quantum chemistry methods. Here, we introduce a general nonstochastic optimization algorithm for NQS in chemical systems, which deterministically generates a selected set of important configurations simultaneously with the energy evaluation of the NQS. This method bypasses the need for Markov-chain Monte Carlo within the VMC framework, thereby accelerating the entire optimization process. Furthermore, this newly developed nonstochastic optimization algorithm offers comparable or superior accuracy relative to its stochastic counterpart and ensures more stable convergence. The application of this model to test molecules exhibiting strong electron correlations provides further insight into the performance of NQS in chemical systems and opens avenues for future enhancements.
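In symbols, replacing the Monte Carlo estimate by a deterministic sum over a selected configuration set S gives (a sketch of the general selected-configuration idea; the authors' precise selection criterion is not reproduced here):

```latex
E(\theta) \approx
\frac{\sum_{x \in S} |\psi_\theta(x)|^2 \, E_{\mathrm{loc}}(x)}
     {\sum_{x \in S} |\psi_\theta(x)|^2},
\qquad
E_{\mathrm{loc}}(x) = \sum_{x'} H_{x x'} \,
\frac{\psi_\theta(x')}{\psi_\theta(x)},
```

where S is grown deterministically, e.g., by keeping configurations connected to S through the Hamiltonian whose amplitudes |ψ_θ(x')|² exceed a cutoff.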
Affiliation(s)
- Xiang Li: Department of Chemistry and Engineering Research Center of Advanced Rare-Earth Materials of Ministry of Education, Tsinghua University, Beijing 100084, China
- Jia-Cheng Huang: Department of Chemistry and Engineering Research Center of Advanced Rare-Earth Materials of Ministry of Education, Tsinghua University, Beijing 100084, China
- Guang-Ze Zhang: Department of Chemistry and Engineering Research Center of Advanced Rare-Earth Materials of Ministry of Education, Tsinghua University, Beijing 100084, China
- Hao-En Li: Department of Chemistry and Engineering Research Center of Advanced Rare-Earth Materials of Ministry of Education, Tsinghua University, Beijing 100084, China
- Chang-Su Cao: Department of Chemistry and Engineering Research Center of Advanced Rare-Earth Materials of Ministry of Education, Tsinghua University, Beijing 100084, China; ByteDance Research, Zhonghang Plaza, No. 43, North Third Ring West Road, Haidian District, Beijing 100089, China
- Dingshun Lv: ByteDance Research, Zhonghang Plaza, No. 43, North Third Ring West Road, Haidian District, Beijing 100089, China
- Han-Shi Hu: Department of Chemistry and Engineering Research Center of Advanced Rare-Earth Materials of Ministry of Education, Tsinghua University, Beijing 100084, China

3. Zhang Y, Huang H, Shen G. Adaptive CL-BFGS Algorithms for Complex-Valued Neural Networks. IEEE Trans Neural Netw Learn Syst 2023; 34:6313-6327. PMID: 34995196. DOI: 10.1109/tnnls.2021.3135553.
Abstract
The complex-valued limited-memory BFGS (CL-BFGS) algorithm is efficient for training complex-valued neural networks (CVNNs). As an important parameter, the memory size, i.e., the number of saved vector pairs, essentially affects the performance of the algorithm. However, determining a suitable memory size for the CL-BFGS algorithm remains challenging. To deal with this issue, an adaptive method is proposed in which the memory size is allowed to vary during the iteration process. At each iteration, with the help of a multistep quasi-Newton method, an appropriate memory size is chosen from a variable set {1, 2, ..., M} so as to approximate the complex Hessian matrix as closely as possible. To reduce the computational complexity and ensure the desired performance, the upper bound M is adjusted according to the moving average of the memory sizes found in previous iterations. The proposed adaptive CL-BFGS (ACL-BFGS) algorithm can be efficiently applied to the training of CVNNs. Moreover, it is suggested to take multiple memory sizes to construct the search direction, which further improves the performance of the ACL-BFGS algorithm. Experimental results on benchmark problems, including pattern classification, complex function approximation, and nonlinear channel equalization, illustrate the advantages of the developed algorithms over some previous ones.
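The memory size m enters through the standard limited-memory two-loop recursion, which builds the quasi-Newton direction from the m most recent (s, y) pairs. Below is a minimal NumPy sketch of that recursion for complex vectors (the adaptive selection of m and the multistep construction from the paper are not reproduced); taking scalars as Re⟨·,·⟩, i.e., identifying C^n with R^(2n), is one common convention and an assumption here.

```python
import numpy as np

def two_loop_direction(grad, s_list, y_list, m):
    """L-BFGS two-loop recursion restricted to the m most recent
    (s, y) pairs. Scalar products use Re<a, b> with conjugation,
    i.e., C^n is treated as R^(2n)."""
    s_list, y_list = s_list[-m:], y_list[-m:]
    inner = lambda a, b: np.real(np.vdot(a, b))   # conjugated real inner product
    q = grad.astype(complex)
    stack = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        rho = 1.0 / inner(y, s)
        alpha = rho * inner(s, q)
        q -= alpha * y
        stack.append((rho, alpha, s, y))
    if s_list:                                    # initial Hessian scaling
        s, y = s_list[-1], y_list[-1]
        q *= inner(y, s) / inner(y, y)
    for rho, alpha, s, y in reversed(stack):
        beta = rho * inner(y, q)
        q += (alpha - beta) * s
    return -q                                     # search direction
```

A larger m gives a better curvature model at a higher cost per iteration; the paper's contribution is choosing m (and combining several values of m) adaptively rather than fixing it.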

4. Li Z, Lange K, Fessler JA. Poisson Phase Retrieval in Very Low-Count Regimes. IEEE Trans Comput Imaging 2022; 8:838-850. PMID: 37065711. PMCID: PMC10099278. DOI: 10.1109/tci.2022.3209936.
Abstract
This paper discusses phase retrieval algorithms for maximum likelihood (ML) estimation from measurements following independent Poisson distributions in very low-count regimes, e.g., 0.25 photon per pixel. To maximize the log-likelihood of the Poisson ML model, we propose a modified Wirtinger flow (WF) algorithm using a step size based on the observed Fisher information. This approach eliminates all parameter tuning except the number of iterations. We also propose a novel curvature for majorize-minimize (MM) algorithms with a quadratic majorizer. We show theoretically that our proposed curvature is sharper than the curvature derived from the supremum of the second derivative of the Poisson ML cost function. We compare the proposed algorithms (WF, MM) with existing optimization methods, including WF using other step-size schemes, quasi-Newton methods such as LBFGS, and alternating direction method of multipliers (ADMM) algorithms, under a variety of experimental settings. Simulation experiments with a random Gaussian matrix, a canonical DFT matrix, a masked DFT matrix, and an empirical transmission matrix demonstrate the following. 1) As expected, algorithms based on the Poisson ML model consistently produce higher-quality reconstructions than algorithms derived from Gaussian noise ML models when applied to low-count data; furthermore, incorporating regularizers, such as corner-rounded anisotropic total variation (TV), that exploit the assumed properties of the latent image can further improve the reconstruction quality. 2) For unregularized cases, our proposed WF algorithm with the Fisher information step size converges faster (in terms of cost function and PSNR vs. time) than other WF methods, e.g., WF with an empirical step size, backtracking line search, or the optimal step size for the Gaussian noise model; it also converges faster than the LBFGS quasi-Newton method. 3) In regularized cases, our proposed WF algorithm converges faster than WF with backtracking line search, LBFGS, MM, and ADMM.
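For orientation, here is a minimal NumPy sketch of Wirtinger flow on the Poisson negative log-likelihood; it uses a plain backtracking line search rather than the paper's observed-Fisher-information step size (which is what removes the tuning), so it corresponds to one of the compared baselines rather than the proposed method.

```python
import numpy as np

def poisson_wf(A, y, x0, iters=200, t0=1.0, beta=0.5, eps=1e-12):
    """Wirtinger flow for the Poisson ML cost
    f(x) = sum_i [ |a_i^H x|^2 - y_i * log(|a_i^H x|^2) ],
    with rows a_i^H stacked in A. Backtracking on the step size."""
    def cost(x):
        u = np.abs(A @ x) ** 2 + eps
        return np.sum(u - y * np.log(u))
    x = x0.astype(complex)
    for _ in range(iters):
        v = A @ x
        u = np.abs(v) ** 2 + eps
        g = A.conj().T @ ((1.0 - y / u) * v)      # Wirtinger gradient
        f0, g2, t = cost(x), np.sum(np.abs(g) ** 2), t0
        while cost(x - t * g) > f0 - 0.5 * t * g2 and t > 1e-10:
            t *= beta                              # sufficient-decrease backtracking
        x = x - t * g
    return x
```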
Affiliation(s)
- Zongyu Li: Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122
- Kenneth Lange: Departments of Computational Medicine, Human Genetics, and Statistics, University of California, Los Angeles, CA 90095
- Jeffrey A Fessler: Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109-2122

5. Karkalousos D, Noteboom S, Hulst HE, Vos FM, Caan MWA. Assessment of data consistency through cascades of independently recurrent inference machines for fast and robust accelerated MRI reconstruction. Phys Med Biol 2022; 67. DOI: 10.1088/1361-6560/ac6cc2.
Abstract
Objective. Machine learning methods can learn how to reconstruct magnetic resonance images (MRI) and thereby accelerate acquisition, which is of paramount importance to the clinical workflow. Physics-informed networks incorporate the forward model of accelerated MRI reconstruction into the learning process. With increasing network complexity, robustness is not ensured when reconstructing data unseen during training. We aim to embed data consistency (DC) in deep networks while balancing the degree of network complexity. In doing so, we assess whether explicit or implicit enforcement of DC in varying network architectures is preferable for optimizing performance. Approach. We propose a scheme called Cascades of Independently Recurrent Inference Machines (CIRIM) to assess DC through unrolled optimization, enforcing DC both implicitly by gradient descent and explicitly by a designed term. An extensive comparison of the CIRIM to compressed sensing and to other machine learning methods is performed: the End-to-End Variational Network (E2EVN), CascadeNet, KIKINet, LPDNet, RIM, IRIM, and UNet. Models were trained and evaluated on T1-weighted and FLAIR contrast brain data and on T2-weighted knee data. Both 1D and 2D undersampling patterns were evaluated. Robustness was tested by reconstructing 7.5× prospectively undersampled 3D FLAIR MRI data of multiple sclerosis (MS) patients with white matter lesions. Main results. The CIRIM performed best when implicitly enforcing DC, while the E2EVN required an explicit DC formulation. Through its cascades, the CIRIM scored higher on structural similarity and PSNR than the other methods, in particular under heterogeneous imaging conditions. In reconstructing MS patient data, prospectively acquired with a sampling pattern unseen during model training, the CIRIM maintained lesion contrast while efficiently denoising the images. Significance. The CIRIM showed highly promising generalization capabilities while maintaining a favorable trade-off between reconstructed image quality and fast reconstruction times, which is crucial in the clinical workflow.
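As a hedged illustration of the implicit-DC ingredient (a generic single-coil Cartesian sketch, not the CIRIM architecture itself): each cascade can nudge the current image estimate toward agreement with the measured k-space via a gradient step on the data-fidelity term, alternating with a learned regularizer.

```python
import numpy as np

def dc_gradient_step(x, y, mask, mu=0.5):
    """One data-consistency gradient step for single-coil Cartesian MRI.
    x    : current image estimate (2D complex array)
    y    : measured, zero-filled k-space
    mask : boolean sampling mask in k-space
    Performs x <- x - mu * F^H (M F x - y), i.e., a gradient step on
    0.5 * ||M F x - y||^2 with F an orthonormal 2D FFT."""
    Fx = np.fft.fft2(x, norm="ortho")
    residual = mask * Fx - y                        # k-space disagreement
    return x - mu * np.fft.ifft2(residual, norm="ortho")
```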

6. Zhang SQ, Gao W, Zhou ZH. Towards understanding theoretical advantages of complex-reaction networks. Neural Netw 2022; 151:80-93. DOI: 10.1016/j.neunet.2022.03.024.

7. Information geometry of hyperbolic-valued Boltzmann machines. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.12.048.

8. Dong Z, Huang H. A training algorithm with selectable search direction for complex-valued feedforward neural networks. Neural Netw 2021; 137:75-84. PMID: 33556803. DOI: 10.1016/j.neunet.2021.01.014.
Abstract
This paper presents an efficient training algorithm for complex-valued feedforward neural networks that utilizes a tree structure. The basic idea is that, by introducing a set of direction factors, distinctive search directions are available for selection at each iteration, so that the objective function is reduced as much as possible. Compared with some well-known training algorithms, one advantage of the proposed algorithm is that the determination of the search direction is highly flexible, and thus a more accurate solution is obtained with faster convergence. Experimental simulations on pattern recognition, channel equalization, and complex function approximation verify the effectiveness and applicability of the proposed algorithm.
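A generic sketch of the selection step (the paper's tree structure and direction factors are not reproduced; the candidate directions below are illustrative stand-ins): at each iteration, the objective is evaluated at trial points along several candidate directions, and the direction giving the largest decrease is kept.

```python
import numpy as np

def select_direction(loss, w, grad, prev_dir=None, lr=0.1):
    """Among a small set of candidate complex search directions,
    return the one whose trial point w + lr * d gives the lowest
    loss. Candidates: steepest descent, a momentum-like direction,
    and a sign-based direction (illustrative choices)."""
    candidates = [-grad]
    if prev_dir is not None:
        candidates.append(0.9 * prev_dir - grad)    # momentum-like
    candidates.append(-np.sign(grad.real) - 1j * np.sign(grad.imag))
    trials = [loss(w + lr * d) for d in candidates]
    return candidates[int(np.argmin(trials))]
```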
Affiliation(s)
- Zhongying Dong: School of Electronics and Information Engineering, Soochow University, Suzhou 215006, PR China
- He Huang: School of Electronics and Information Engineering, Soochow University, Suzhou 215006, PR China

9. Kobayashi M. Synthesis of complex- and hyperbolic-valued Hopfield neural networks. Neurocomputing 2021. DOI: 10.1016/j.neucom.2020.10.002.

10. Zhang H, Zhang Y, Zhu S, Xu D. Deterministic convergence of complex mini-batch gradient learning algorithm for fully complex-valued neural networks. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.04.114.

11. Scardapane S, Van Vaerenbergh S, Hussain A, Uncini A. Complex-Valued Neural Networks With Nonparametric Activation Functions. IEEE Trans Emerg Top Comput Intell 2020. DOI: 10.1109/tetci.2018.2872600.

12. Adaptive complex-valued stepsize based fast learning of complex-valued neural networks. Neural Netw 2020; 124:233-242. DOI: 10.1016/j.neunet.2020.01.011.

13. Kobayashi M. Singularities of Three-Layered Complex-Valued Neural Networks With Split Activation Function. IEEE Trans Neural Netw Learn Syst 2018; 29:1900-1907. PMID: 28422693. DOI: 10.1109/tnnls.2017.2688322.
Abstract
There are three important concepts related to learning processes in neural networks: reducibility, nonminimality, and singularity. Although the definitions of these three concepts differ, they are equivalent in real-valued neural networks. This is also true of complex-valued neural networks (CVNNs) with hidden neurons not employing biases. The situation of CVNNs with hidden neurons employing biases, however, is very complicated. Exceptional reducibility was found, and it was shown that reducibility and nonminimality are not the same. Irreducibility consists of minimality and exceptional reducibility. The relationship between minimality and singularity has not yet been established. In this paper, we describe our surprising finding that minimality and singularity are independent. We also provide several examples based on exceptional reducibility.
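For reference, the split activation function named in the title applies a real activation f separately to the real and imaginary parts of its complex argument:

```latex
\varphi(z) = f(\operatorname{Re} z) + i \, f(\operatorname{Im} z),
\qquad z \in \mathbb{C},
```

with, e.g., f = tanh; this is the standard split construction for CVNNs.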

14. A novel conjugate gradient method with generalized Armijo search for efficient training of feedforward neural networks. Neurocomputing 2018. DOI: 10.1016/j.neucom.2017.08.037.