1
Liu Y, Lee A, Qian K, Zhang P, Xiao Z, He H, Ren Z, Cheung SK, Liu R, Li Y, Zhang X, Ma Z, Zhao J, Zhao W, Yu G, Wang X, Liu J, Wang Z, Wang KL, Shao Q. Cryogenic in-memory computing using magnetic topological insulators. Nature Materials 2025; 24:559-564. [PMID: 39870991] [DOI: 10.1038/s41563-024-02088-4]
Abstract
Machine learning algorithms have proven effective for essential quantum computation tasks such as quantum error correction and quantum control. Efficient hardware implementation of these algorithms at cryogenic temperatures is essential. Here we utilize magnetic topological insulators as memristors (termed magnetic topological memristors) and introduce a cryogenic in-memory computing scheme based on the coexistence of a chiral edge state and a topological surface state. The memristive switching and reading of the giant anomalous Hall effect exhibit high energy efficiency, high stability and low stochasticity. We achieve high accuracy in a proof-of-concept classification task using four magnetic topological memristors. Furthermore, our algorithm-level and circuit-level simulations of large-scale neural networks demonstrate software-level accuracy and lower energy consumption for image recognition and quantum state preparation compared with existing magnetic memristor and complementary metal-oxide-semiconductor technologies. Our results not only showcase a new application of chiral edge states but may also inspire novel computing schemes based on topological quantum physics.
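The in-memory computing principle behind such memristor schemes can be illustrated numerically. The sketch below is not the paper's implementation: it only shows how a crossbar of stored conductances performs a matrix-vector multiply physically (Ohm's law for the per-device multiply, Kirchhoff's current law for the column sums); all sizes and values are arbitrary assumptions.

```python
import numpy as np

# Illustrative sketch: each memristor in a crossbar stores a weight as a
# conductance G[i, j]; applying input voltages v along the rows yields
# column currents i = G.T @ v, i.e. the dot products are computed in place.
rng = np.random.default_rng(0)

n_rows, n_cols = 4, 3                        # a tiny 4x3 crossbar
G = rng.uniform(0.1, 1.0, (n_rows, n_cols))  # conductances (siemens), nonnegative
v = rng.uniform(-0.2, 0.2, n_rows)           # input voltages

i_out = G.T @ v  # column currents: the analog matrix-vector product

# Signed weights are commonly encoded as a difference of two conductances:
G_pos = rng.uniform(0.1, 1.0, (n_rows, n_cols))
G_neg = rng.uniform(0.1, 1.0, (n_rows, n_cols))
i_signed = (G_pos - G_neg).T @ v
```

Because conductances are physically nonnegative, the differential pair (`G_pos - G_neg`) is the usual trick for representing signed neural-network weights.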
Affiliation(s)
- Yuting Liu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- School of Integrated Circuit, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Albert Lee
- Device Research Laboratory, Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA, USA
- Kun Qian
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- IAS Center for Quantum Technologies, The Hong Kong University of Science and Technology, Hong Kong, China
- Peng Zhang
- Device Research Laboratory, Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA, USA
- Zhihua Xiao
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- ACCESS - AI Chip Center for Emerging Smart Systems, InnoHK Centers, Hong Kong, China
- Haoran He
- Device Research Laboratory, Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA, USA
- Zheyu Ren
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- IAS Center for Quantum Technologies, The Hong Kong University of Science and Technology, Hong Kong, China
- Shun Kong Cheung
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Ruizi Liu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- IAS Center for Quantum Technologies, The Hong Kong University of Science and Technology, Hong Kong, China
- Yaoyin Li
- School of Integrated Circuit, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Xu Zhang
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Zichao Ma
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- Jianyuan Zhao
- School of Integrated Circuit, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Weiwei Zhao
- School of Integrated Circuit, Harbin Institute of Technology (Shenzhen), Shenzhen, China
- Guoqiang Yu
- Beijing National Laboratory for Condensed Matter Physics, Institute of Physics, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Xin Wang
- Department of Physics, City University of Hong Kong, Hong Kong, China
- Junwei Liu
- IAS Center for Quantum Technologies, The Hong Kong University of Science and Technology, Hong Kong, China
- Department of Physics, The Hong Kong University of Science and Technology, Hong Kong, China
- Zhongrui Wang
- School of Microelectronics, Southern University of Science and Technology, Shenzhen, China
- Kang L Wang
- Device Research Laboratory, Department of Electrical and Computer Engineering, University of California, Los Angeles, Los Angeles, CA, USA
- Qiming Shao
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
- IAS Center for Quantum Technologies, The Hong Kong University of Science and Technology, Hong Kong, China
- ACCESS - AI Chip Center for Emerging Smart Systems, InnoHK Centers, Hong Kong, China
- Department of Physics, The Hong Kong University of Science and Technology, Hong Kong, China
- Guangdong-Hong Kong-Macao Joint Laboratory for Intelligent Micro-Nano Optoelectronic Technology, The Hong Kong University of Science and Technology, Hong Kong, China
2
Li Y, Zhou Z, Sun C, Chen X, Yan R. Variational Attention-Based Interpretable Transformer Network for Rotary Machine Fault Diagnosis. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:6180-6193. [PMID: 36094988] [DOI: 10.1109/tnnls.2022.3202234]
Abstract
Deep learning provides a promising approach for rotary machine fault diagnosis (RMFD), where vibration signals are commonly used as the input of a deep network model to reveal the internal state of machinery. However, most existing methods fail to mine association relationships within signals. Unlike conventional deep neural networks, transformer networks can capture association relationships through the global self-attention mechanism to enhance feature representations of vibration signals. Despite this, transformer networks cannot explicitly establish the causal association between signal patterns and fault types, resulting in poor interpretability. To tackle these problems, an interpretable deep learning model named the variational attention-based transformer network (VATN) is proposed for RMFD. VATN builds on the transformer encoder to mine the association relationships within signals. To embed the prior knowledge that a fault type can be recognized from several key features of the vibration signal, a sparse constraint is designed for the attention weights. Variational inference is employed to force the attention weights to be sampled from Dirichlet distributions, and Laplace approximation is applied to realize the reparameterization. Finally, two experimental studies on bevel gear and bearing datasets demonstrate the effectiveness of VATN over other comparison methods, and heat maps of the attention weights illustrate the causal association between fault types and signal patterns.
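The core idea of Dirichlet-distributed attention can be sketched in a few lines. This is a hedged illustration, not the authors' VATN code: it replaces deterministic softmax attention with a Dirichlet sample whose concentration is derived from the logits, where a small concentration multiplier (the `sparsity` knob below is an assumption) pushes samples toward sparse, near-one-hot weights.

```python
import numpy as np

# Illustrative sketch: stochastic attention weights drawn from a Dirichlet
# distribution instead of a deterministic softmax. Smaller concentrations
# yield sparser (more peaked) attention samples.
rng = np.random.default_rng(42)

logits = np.array([2.0, 0.5, 0.1, -1.0])          # similarity scores, 4 positions
softmax_w = np.exp(logits) / np.exp(logits).sum() # the usual deterministic weights

sparsity = 0.5                    # assumed knob: <1 sharpens Dirichlet samples
alpha = np.exp(logits) * sparsity # Dirichlet concentration from the logits
attn_w = rng.dirichlet(alpha)     # one stochastic attention sample

values = rng.normal(size=(4, 8))  # toy value vectors
context = attn_w @ values         # attended representation
```

Like softmax weights, each Dirichlet sample is nonnegative and sums to one, so it drops into the standard attention computation unchanged.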
3
Lu W, Zhang Z, Qin F, Zhang W, Lu Y, Liu Y, Zheng Y. Analysis on the inherent noise tolerance of feedforward network and one noise-resilient structure. Neural Netw 2023; 165:786-798. [PMID: 37418861] [DOI: 10.1016/j.neunet.2023.06.011]
Abstract
In the past few decades, feedforward neural networks have attracted much attention for hardware implementation. However, when a neural network is realized in analog circuits, the circuit-based model is sensitive to hardware nonidealities. Nonidealities such as random offset voltage drifts and thermal noise may cause variation in hidden neurons and further affect neural behaviors. This paper considers time-varying noise with a zero-mean Gaussian distribution at the inputs of hidden neurons. First, we derive lower and upper bounds on the mean square error loss to estimate the inherent noise tolerance of a noise-free trained feedforward network. The lower bound is then extended to any non-Gaussian noise case based on the Gaussian mixture model concept, and the upper bound is generalized to any non-zero-mean noise case. As the noise can degrade neural performance, a new network architecture is designed to suppress the noise effect. This noise-resilient design does not require any training process. We also discuss its limitation and give a closed-form expression for the noise tolerance when the limitation is exceeded.
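The setting analyzed above can be reproduced numerically. The minimal sketch below (arbitrary weights and sizes, not the paper's network) injects zero-mean Gaussian noise at the hidden pre-activations and measures the mean squared deviation from the noise-free output; this deviation grows with the noise level, which is exactly the quantity the paper's lower and upper bounds characterize.

```python
import numpy as np

# Sketch: a fixed ("trained") feedforward network whose hidden inputs are
# perturbed by zero-mean Gaussian noise of standard deviation sigma.
rng = np.random.default_rng(0)

W1 = rng.normal(size=(16, 8))   # input -> hidden weights (arbitrary)
W2 = rng.normal(size=(8, 1))    # hidden -> output weights
x = rng.normal(size=(1000, 16)) # a batch of test inputs

def forward(x, sigma):
    # Noise enters at the input of the hidden neurons, as in the paper's model.
    h = x @ W1 + sigma * rng.normal(size=(x.shape[0], 8))
    return np.tanh(h) @ W2

y_clean = forward(x, 0.0)  # noise-free reference output

# Mean squared deviation from the clean output at two noise levels:
mse_small = np.mean((forward(x, 0.1) - y_clean) ** 2)
mse_large = np.mean((forward(x, 0.5) - y_clean) ** 2)
```

Empirically `mse_small < mse_large`: larger hidden-neuron noise degrades the output more, and the paper's bounds bracket this degradation analytically.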
Affiliation(s)
- Wenhao Lu
- School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, 639798, Singapore
- Zhengyuan Zhang
- School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, 639798, Singapore
- Feng Qin
- School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, 639798, Singapore; State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University, No. 99 Yanxiang Road, Yanta District, Xi'an, 710054 Shaanxi, China; International Joint Laboratory for Micro/Nano Manufacturing and Measurement Technologies, Xi'an Jiaotong University, Xi'an, 710049 Shaanxi, China
- Wenwen Zhang
- School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, 639798, Singapore
- Yuncheng Lu
- School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, 639798, Singapore
- Yue Liu
- School of Mechanical Engineering, Shanghai Dianji University, Shanghai, 201306, China
- Yuanjin Zheng
- School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, 639798, Singapore
4
Buczynski M, Chlebus M. GARCHNet: Value-at-Risk Forecasting with GARCH Models Based on Neural Networks. Computational Economics 2023:1-31. [PMID: 37362594] [PMCID: PMC10201522] [DOI: 10.1007/s10614-023-10390-7]
Abstract
This paper proposes a new GARCH specification that adapts the architecture of a long short-term memory (LSTM) neural network. Classical GARCH models generally give good results in financial modeling where high volatility is observed, and they are often valued for Value-at-Risk forecasting. However, the lack of a nonlinear structure in most approaches means that the conditional variance is not adequately represented by the model. In contrast, recently developed deep learning methods can describe almost any nonlinear relationship. We propose GARCHNet, a nonlinear approach to conditional variance that combines an LSTM neural network with the maximum likelihood estimation of GARCH. The variance distributions considered in the paper are normal, t and skewed t, but the approach allows extension to other distributions. To evaluate the model, we conducted an empirical study on the logarithmic returns of the WIG 20 (Warsaw Stock Exchange Index), S&P 500 (Standard & Poor's 500) and FTSE 100 (Financial Times Stock Exchange) indices over four periods from 2005 to 2021 with different levels of observed volatility. Our results confirm the validity of the solution, and we indicate directions for its further development.
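For context, the classical baseline that GARCHNet generalizes can be sketched directly: a GARCH(1,1) conditional-variance recursion fitted by Gaussian maximum likelihood. GARCHNet keeps the same likelihood objective but lets an LSTM produce the conditional variance instead of this linear recursion. The parameter values and the simulated return series below are purely illustrative, not estimates from the paper.

```python
import numpy as np

# GARCH(1,1) sketch: sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}
rng = np.random.default_rng(1)
r = rng.normal(scale=0.01, size=500)      # toy log-returns

omega, alpha, beta = 1e-6, 0.08, 0.9      # assumed (illustrative) parameters

sigma2 = np.empty_like(r)
sigma2[0] = r.var()                        # initialize with the sample variance
for t in range(1, len(r)):
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]

# Gaussian negative log-likelihood (up to constants): the training objective
# that GARCHNet also minimizes, with an LSTM producing sigma2 instead.
nll = 0.5 * np.sum(np.log(sigma2) + r ** 2 / sigma2)

# One-step-ahead 99% Value-at-Risk under the normal assumption
var_99 = -2.326 * np.sqrt(sigma2[-1])
```

With `omega > 0` and `alpha, beta >= 0` the recursion keeps the conditional variance strictly positive, which is what makes the log-likelihood well defined.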
Affiliation(s)
- Mateusz Buczynski
- Faculty of Economic Sciences, University of Warsaw, Dluga 44/50, Warsaw, Poland
- Interdisciplinary Doctoral School, University of Warsaw, Dobra 56/66, Warsaw, Poland
- Marcin Chlebus
- Interdisciplinary Doctoral School, University of Warsaw, Dobra 56/66, Warsaw, Poland
5
Effect of germ orientation during Vis-NIR hyperspectral imaging for the detection of fungal contamination in maize kernel using PLS-DA, ANN and 1D-CNN modelling. Food Control 2022. [DOI: 10.1016/j.foodcont.2022.109077]
6
Chen X, Zeng Y, Kang S, Jin R. INN: An Interpretable Neural Network for AI Incubation in Manufacturing. ACM Transactions on Intelligent Systems and Technology 2022. [DOI: 10.1145/3519313]
Abstract
Both artificial intelligence (AI) and domain knowledge from human experts play an important role in manufacturing decision-making. While smart manufacturing emphasizes fully automated, data-driven decision-making, the AI incubation process involves human experts to enhance AI systems by integrating domain knowledge for modeling, data collection and annotation, and feature extraction. Such an AI incubation process will not only enhance domain knowledge discovery but also improve the interpretability and trustworthiness of AI methods. In this paper, we focus on knowledge transfer from human experts to a supervised learning problem by learning domain knowledge as interpretable features and rules, which can be used to construct rule-based systems that support manufacturing decision-making, such as process modeling and quality inspection. Although many advanced statistical and machine learning methods have shown promising modeling accuracy and efficiency, rule-based systems are still highly preferred and widely adopted because of their interpretability. However, most existing rule-based systems are constructed from deterministic, human-crafted rules whose parameters, e.g., the thresholds of decision rules, are suboptimal. On the other hand, machine learning methods such as tree models or neural networks can learn a decision-rule-based structure without much interpretation or agreement with domain knowledge. Therefore, neither traditional machine learning models nor human experts' domain knowledge can be directly improved by learning from data. In this research, we propose an interpretable neural network (INN) model with a center-adjustable sigmoid activation function to efficiently optimize rule-based systems. Using the rule-based system from domain knowledge to regulate the INN architecture not only improves prediction accuracy with optimized parameters, but also ensures interpretability by adopting the interpretable rule-based systems from domain knowledge. The proposed INN is effective for supervised learning problems when rule-based systems are available. The merits of the INN model are demonstrated via a simulation study and a real case study in the quality modeling of a semiconductor manufacturing process. The source code of this paper is hosted here: https://github.com/XiaoyuChenUofL/Interpretable-Neural-Network.
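The key ingredient described above, a center-adjustable sigmoid, is simple to sketch. This hedged illustration (function name, slope parameter `k`, and values are assumptions, not the paper's code) shows how a rule threshold "x > c" becomes a differentiable unit whose center c can then be tuned by gradient descent.

```python
import numpy as np

def center_sigmoid(x, c, k):
    """Sigmoid re-centered at c; k controls how sharp the rule boundary is."""
    return 1.0 / (1.0 + np.exp(-k * (x - c)))

# Smooth version of the rule "feature > 2.0", fairly sharp (k = 10):
c, k = 2.0, 10.0
x = np.array([0.0, 1.9, 2.0, 2.1, 4.0])
y = center_sigmoid(x, c, k)
```

Far below the center the output is near 0, far above it is near 1, and it is exactly 0.5 at the center; as k grows the unit approaches the hard decision rule, so a rule-based system can be embedded in a trainable network without losing its interpretation.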
Affiliation(s)
- Xiaoyu Chen
- Department of Industrial Engineering, University of Louisville, USA
- Yingyan Zeng
- Grado Department of Industrial and Systems Engineering, Virginia Tech, USA
- Sungku Kang
- Civil and Environmental Engineering, Northeastern University, USA
- Ran Jin
- Grado Department of Industrial and Systems Engineering, Virginia Tech, USA
7
Luo X, Liu Z, Jin L, Zhou Y, Zhou M. Symmetric Nonnegative Matrix Factorization-Based Community Detection Models and Their Convergence Analysis. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:1203-1215. [PMID: 33513110] [DOI: 10.1109/tnnls.2020.3041360]
Abstract
Community detection is a popular yet thorny issue in social network analysis. A symmetric nonnegative matrix factorization (SNMF) model based on a nonnegative multiplicative update (NMU) scheme is frequently adopted to address it. Current research mainly focuses on integrating additional information into such a model without considering the effects of the learning scheme. This study aims to implement highly accurate community detectors by exploiting the connection between an SNMF-based community detector's detection accuracy and the NMU scheme's scaling factor. The main idea is to adjust this scaling factor via a linear or nonlinear strategy, thereby implementing several novel scaling-factor-adjusted NMU schemes. They are applied to SNMF and graph-regularized SNMF models to obtain four novel SNMF-based community detectors. Theoretical studies indicate that with the proposed schemes and proper hyperparameter settings, each model can: 1) keep its loss function nonincreasing during training and 2) converge to a stationary point. Empirical studies on eight social networks show that they achieve significant accuracy gains in community detection over state-of-the-art community detectors.
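The SNMF setting with a scaling factor can be sketched in a few lines of numpy. This is a hedged illustration, not the paper's detectors: it approximates a symmetric nonnegative adjacency matrix A by W @ W.T with a multiplicative update in which `gamma` plays the role of the scaling factor; `gamma = 0.5` recovers a commonly used SNMF update, while the paper's linear and nonlinear schedules for the factor are not reproduced here.

```python
import numpy as np

# SNMF sketch: minimize ||A - W W^T||_F^2 with W >= 0 via a
# scaling-factor-adjusted nonnegative multiplicative update (NMU).
rng = np.random.default_rng(0)

n, k = 20, 3
A = rng.random((n, n))
A = (A + A.T) / 2                 # symmetric nonnegative "network"
W = rng.random((n, k)) + 0.1      # strictly positive initialization

gamma = 0.5                       # scaling factor (the knob the paper studies)
loss = []
for _ in range(200):
    num = A @ W
    den = W @ (W.T @ W) + 1e-12   # small epsilon guards against division by zero
    W *= (1.0 - gamma) + gamma * num / den
    loss.append(np.linalg.norm(A - W @ W.T) ** 2)
```

Because the update is multiplicative with a positive factor, W stays nonnegative throughout, and with `gamma = 0.5` the loss decreases monotonically in practice, matching the nonincreasing-loss property the paper proves for its schemes.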
8
Fan FL, Xiong J, Li M, Wang G. On Interpretability of Artificial Neural Networks: A Survey. IEEE Transactions on Radiation and Plasma Medical Sciences 2021; 5:741-760. [PMID: 35573928] [PMCID: PMC9105427] [DOI: 10.1109/trpms.2021.3066428]
Abstract
Deep learning, as represented by deep artificial neural networks (DNNs), has recently achieved great success in many important areas that deal with text, images, videos, graphs, and so on. However, the black-box nature of DNNs has become one of the primary obstacles to their wide adoption in mission-critical applications such as medical diagnosis and therapy. Because of the huge potential of deep learning, increasing the interpretability of deep neural networks has recently attracted much research attention. In this paper, we propose a simple but comprehensive taxonomy for interpretability, systematically review recent studies on improving the interpretability of neural networks, describe applications of interpretability in medicine, and discuss possible future research directions, such as those relating to fuzzy logic and brain science.
Affiliation(s)
- Feng-Lei Fan
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
- Jinjun Xiong
- IBM Thomas J. Watson Research Center, Yorktown Heights, NY, 10598, USA
- Mengzhou Li
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
- Ge Wang
- Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY, USA
9
Li H, Weng J, Mao Y, Wang Y, Zhan Y, Cai Q, Gu W. Adaptive Dropout Method Based on Biological Principles. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:4267-4276. [PMID: 33872159] [DOI: 10.1109/tnnls.2021.3070895]
Abstract
Dropout is one of the most widely used methods to avoid overfitting in neural networks. However, it rigidly and randomly drops neurons according to a fixed probability, which is not consistent with the activation mode of neurons in the human cerebral cortex. Inspired by gene theory and the activation mechanism of brain neurons, we propose a more intelligent adaptive dropout, in which a variational autoencoder (VAE) is overlaid on an existing neural network to regularize its hidden neurons by adaptively setting activities to zero. Through alternating iterative training, the dropping probability of each hidden neuron can be learned according to the weights, effectively avoiding the shortcomings of the standard dropout method. Experimental results on multiple data sets illustrate that this method suppresses overfitting in various neural networks better than standard dropout. Additionally, this adaptive dropout technique can reduce the number of neurons and improve training efficiency.
10
Thakkar A, Patel D, Shah P. Pearson Correlation Coefficient-based performance enhancement of Vanilla Neural Network for stock trend prediction. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06290-2]
11
Affiliation(s)
- Sang Jun Moon
- Department of Statistics, University of Seoul, Seoul, South Korea
- Jong-June Jeon
- Department of Statistics, University of Seoul, Seoul, South Korea
- Yongdai Kim
- Department of Statistics, Seoul National University, Seoul, South Korea
12
Le V, Quinn TP, Tran T, Venkatesh S. Deep in the Bowel: Highly Interpretable Neural Encoder-Decoder Networks Predict Gut Metabolites from Gut Microbiome. BMC Genomics 2020; 21:256. [PMID: 32689932] [PMCID: PMC7370527] [DOI: 10.1186/s12864-020-6652-7]
Abstract
BACKGROUND Technological advances in next-generation sequencing (NGS) and chromatographic assays [e.g., liquid chromatography mass spectrometry (LC-MS)] have made it possible to identify thousands of microbe and metabolite species and to measure their relative abundance. In this paper, we propose a sparse neural encoder-decoder network to predict metabolite abundances from microbe abundances. RESULTS Using paired data from a cohort of inflammatory bowel disease (IBD) patients, we show that our neural encoder-decoder model outperforms linear univariate and multivariate methods in terms of accuracy, sparsity, and stability. Importantly, we show that our neural encoder-decoder model is not simply a black box designed to maximize predictive accuracy. Rather, the network's hidden layer (i.e., the latent space, composed only of sparsely weighted microbe counts) actually captures key microbe-metabolite relationships that are themselves clinically meaningful. Although this hidden layer is learned without any knowledge of the patient's diagnosis, we show that the learned latent features are structured in a way that predicts IBD and treatment status with high accuracy. CONCLUSIONS By imposing a non-negative weights constraint, the network becomes a directed graph in which each downstream node is interpretable as an additive combination of the upstream nodes. Here, the middle layer comprises distinct microbe-metabolite axes that relate key microbial biomarkers with metabolite biomarkers. By pre-processing the microbiome and metabolome data using compositional data analysis methods, we ensure that our proposed multi-omics workflow will generalize to any pair of -omics data. To the best of our knowledge, this work is the first application of neural encoder-decoders for the interpretable integration of multi-omics biological data.
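The non-negative weights constraint that underpins the interpretability claim above can be sketched with a toy projected-gradient loop. This is a hedged illustration on synthetic data, not the authors' model: a one-hidden-layer linear encoder-decoder is trained while clipping both weight matrices to the nonnegative orthant after every step, so each latent node remains an additive combination of the inputs (stand-ins for microbe abundances).

```python
import numpy as np

# Toy nonnegative encoder-decoder trained by projected gradient descent.
rng = np.random.default_rng(0)

X = rng.random((200, 10))          # synthetic "microbe" features
Y = X @ rng.random((10, 4))        # synthetic "metabolite" targets (linear truth)

W_enc = rng.random((10, 3)) * 0.1  # 3-node latent bottleneck
W_dec = rng.random((3, 4)) * 0.1
lr = 0.01

def loss_fn():
    return np.mean((X @ W_enc @ W_dec - Y) ** 2)

loss0 = loss_fn()
for _ in range(500):
    E = X @ W_enc @ W_dec - Y                    # residual
    g_dec = (X @ W_enc).T @ E / len(X)           # gradient w.r.t. decoder
    g_enc = X.T @ (E @ W_dec.T) / len(X)         # gradient w.r.t. encoder
    W_dec = np.maximum(W_dec - lr * g_dec, 0.0)  # project onto W >= 0
    W_enc = np.maximum(W_enc - lr * g_enc, 0.0)
loss1 = loss_fn()
```

The projection keeps every weight nonnegative throughout training, which is what turns the trained network into a directed graph of additive contributions.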
Affiliation(s)
- Vuong Le
- Applied AI Institute, Deakin University, Geelong, Australia
- Truyen Tran
- Applied AI Institute, Deakin University, Geelong, Australia
13
14
Chen J, Wu Z, Zhang J, Li F. Mutual information-based dropout: Learning deep relevant feature representation architectures. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.04.090]
15
Chen J, Wu Z, Zhang J. Driver identification based on hidden feature extraction by using adaptive nonnegativity-constrained autoencoder. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2018.09.030]
16
Chen J, Wu Z, Zhang J, Li F, Li W, Wu Z. Cross-covariance regularized autoencoders for nonredundant sparse feature representation. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.07.050]
17
Shen X, Tian X, Liu T, Xu F, Tao D. Continuous Dropout. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:3926-3937. [PMID: 28981433] [DOI: 10.1109/tnnls.2017.2750679]
Abstract
Dropout has been proven to be an effective algorithm for training robust deep networks because of its ability to prevent overfitting by avoiding the co-adaptation of feature detectors. Current explanations of dropout include bagging, naive Bayes, regularization, and the role of sex in evolution. According to the activation patterns of neurons in the human brain, when faced with different situations, the firing rates of neurons are random and continuous, not binary as in current dropout. Inspired by this phenomenon, we extend traditional binary dropout to continuous dropout. On the one hand, continuous dropout is considerably closer to the activation characteristics of neurons in the human brain than traditional binary dropout. On the other hand, we demonstrate that continuous dropout has the property of avoiding the co-adaptation of feature detectors, which suggests that we can extract more independent feature detectors for model averaging in the test stage. We introduce the proposed continuous dropout to a feedforward neural network and comprehensively compare it with binary dropout, adaptive dropout, and DropConnect on MNIST, CIFAR-10, SVHN, NORB, and ILSVRC-12. Thorough experiments demonstrate that our method performs better in preventing the co-adaptation of feature detectors and improves test performance.
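The binary-versus-continuous distinction described above can be sketched directly. This hedged illustration (the clipped-Gaussian mask and its scale are assumptions; the paper's exact distributions differ) multiplies activations by a continuous random mask with mean p instead of a 0/1 mask, and divides by p so the expected activation is unchanged, mirroring standard inverted dropout.

```python
import numpy as np

# Binary dropout vs. a continuous dropout mask on toy positive activations.
rng = np.random.default_rng(0)

p = 0.5                                # mean retention level
h = rng.random((100000, 1)) + 0.5      # toy activations in [0.5, 1.5)

binary_mask = (rng.random(h.shape) < p).astype(float)          # 0/1 mask
continuous_mask = np.clip(                                      # continuous mask,
    rng.normal(loc=p, scale=0.2, size=h.shape), 0.0, 1.0)       # mean p

h_binary = h * binary_mask / p         # inverted-dropout scaling keeps E[h]
h_continuous = h * continuous_mask / p

mean_gap = abs(h_continuous.mean() - h.mean())  # both preserve the mean...
var_binary = h_binary.var()                     # ...but the binary mask injects
var_continuous = h_continuous.var()             # far more variance per unit
```

Continuous "firing rates" perturb each unit smoothly rather than switching it off entirely, which is the brain-inspired property the paper argues for.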
18
Ayinde BO, Zurada JM. Deep Learning of Constrained Autoencoders for Enhanced Understanding of Data. IEEE Transactions on Neural Networks and Learning Systems 2018; 29:3969-3979. [PMID: 28961128] [DOI: 10.1109/tnnls.2017.2747861]
Abstract
Unsupervised feature extractors are known to perform an efficient and discriminative representation of data. Insight into the mappings they perform, and the human ability to understand them, however, remain very limited. This is especially prominent when multilayer deep learning architectures are used. This paper demonstrates how to remove these bottlenecks within the architecture of a non-negativity-constrained autoencoder. It is shown that, using both L1 and L2 regularizations to induce non-negativity of weights, most of the weights in the network become non-negative, resulting in a more understandable structure with minute deterioration in classification accuracy. The proposed approach also extracts sparser features and produces additional output-layer sparsification. The method is analyzed for accuracy and feature interpretation on the MNIST data, the NORB normalized uniform object data, and the Reuters text categorization data set.
19
Pla A, Zhong X, Rayner S. miRAW: A deep learning-based approach to predict microRNA targets by analyzing whole microRNA transcripts. PLoS Comput Biol 2018; 14:e1006185. [PMID: 30005074] [PMCID: PMC6067737] [DOI: 10.1371/journal.pcbi.1006185]
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that regulate gene expression by binding to partially complementary regions within the 3'UTR of their target genes. Computational methods play an important role in target prediction and assume that the miRNA "seed region" (nt 2 to 8) is required for functional targeting, but typically identify only ∼80% of known bindings. Recent studies have highlighted a role for the entire miRNA, suggesting that a more flexible methodology is needed. We present a novel approach for miRNA target prediction based on Deep Learning (DL) which, rather than incorporating any prior knowledge (such as seed regions), investigates the entire miRNA and 3'UTR mRNA nucleotides to learn an uninhibited set of feature descriptors related to the targeting process. We collected more than 150,000 experimentally validated Homo sapiens miRNA:gene targets and cross-referenced them with different CLIP-Seq, CLASH and iPAR-CLIP datasets to obtain ∼20,000 validated miRNA:gene exact target sites. Using this data, we implemented and trained a deep neural network, composed of autoencoders and a feed-forward network, able to automatically learn features describing miRNA-mRNA interactions and assess functionality. Predictions were then refined using information such as site location or site accessibility energy. In a comparison using independent datasets, our DL approach consistently outperformed existing prediction methods, recognizing the seed region as a common feature in the targeting process, but also identifying the role of pairings outside this region. Thermodynamic analysis also suggests that site accessibility plays a role in targeting but cannot be used as a sole indicator of functionality. Data and source code available at: https://bitbucket.org/account/user/bipous/projects/MIRAW.
Affiliation(s)
- Albert Pla
- Department of Medical Genetics, University of Oslo, Oslo, Norway
- Xiangfu Zhong
- Department of Medical Genetics, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
- Simon Rayner
- Department of Medical Genetics, University of Oslo, Oslo, Norway
- Department of Medical Genetics, Oslo University Hospital, Oslo, Norway
20
21
Dai X, Li C, He X, Li C. Nonnegative matrix factorization algorithms based on the inertial projection neural network. Neural Comput Appl 2018. [DOI: 10.1007/s00521-017-3337-5]
22
Fan J, Wang J. A Collective Neurodynamic Optimization Approach to Nonnegative Matrix Factorization. IEEE Transactions on Neural Networks and Learning Systems 2017; 28:2344-2356. [PMID: 27429450] [DOI: 10.1109/tnnls.2016.2582381]
Abstract
Nonnegative matrix factorization (NMF) is an advanced method for nonnegative feature extraction, with widespread applications. However, the NMF solution often entails solving a global optimization problem with a nonconvex objective function and nonnegativity constraints. This paper presents a collective neurodynamic optimization (CNO) approach to this challenging problem. The proposed collective neurodynamic system consists of a population of recurrent neural networks (RNNs) at the lower level and a particle swarm optimization (PSO) algorithm with wavelet mutation at the upper level. The RNNs act as search agents carrying out precise local searches according to their neurodynamics and initial conditions. The PSO algorithm coordinates and guides the RNNs with updated initial states toward global optimal solution(s). A wavelet mutation operator is added to enhance the exploration diversity of the PSO. Through iterative interaction and improvement of the locally best solutions of the RNNs and the globally best positions of the whole population, the population-based neurodynamic systems are almost surely able to achieve global optimality for the NMF problem. It is proved that the group-best state converges to the global optimal solution with probability one. The experimental results substantiate the efficacy and superiority of the CNO approach for bound-constrained global optimization with several benchmark nonconvex functions, and for NMF-based clustering with benchmark data sets, in comparison with state-of-the-art algorithms.
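As a rough illustration of the lower-level search only (not the authors' RNN dynamics, and with the upper-level PSO coordination and wavelet mutation omitted), a single search agent can be sketched as projected gradient descent on the NMF objective:

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf_local_search(V, rank, steps=1000, lr=1e-3, rng=rng):
    """Projected-gradient local search for min ||V - WH||_F^2 s.t. W, H >= 0.
    Stands in for one lower-level search agent; a CNO-style scheme would run
    a population of such agents and let a PSO layer re-seed their initial
    states toward the globally best solution found so far."""
    m, n = V.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    for _ in range(steps):
        R = W @ H - V                              # residual
        W = np.maximum(W - lr * R @ H.T, 0.0)      # gradient step, then
        H = np.maximum(H - lr * W.T @ R, 0.0)      # project onto W, H >= 0
    return W, H

V = rng.random((20, 15))
W, H = nmf_local_search(V, rank=4)
err = np.linalg.norm(V - W @ H)  # factorization residual at the local solution
```

Because the objective is nonconvex, different initial states reach different local solutions, which is exactly why the paper adds the population-level coordination on top.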
23
Ayinde BO, Zurada JM. Nonredundant sparse feature extraction using autoencoders with receptive fields clustering. Neural Netw 2017; 93:99-109. [DOI: 10.1016/j.neunet.2017.04.012] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2016] [Revised: 04/14/2017] [Accepted: 04/24/2017] [Indexed: 11/28/2022]
24
Bologna G, Hayashi Y. Characterization of Symbolic Rules Embedded in Deep DIMLP Networks: A Challenge to Transparency of Deep Learning. JOURNAL OF ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING RESEARCH 2017. [DOI: 10.1515/jaiscr-2017-0019] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Rule extraction from neural networks is a fervent research topic. In the last 20 years many authors have presented a number of techniques showing how to extract symbolic rules from Multi Layer Perceptrons (MLPs). Nevertheless, very few were related to ensembles of neural networks, and even fewer to networks trained by deep learning. On several datasets we performed rule extraction from ensembles of Discretized Interpretable Multi Layer Perceptrons (DIMLPs), and from DIMLPs trained by deep learning. The results obtained on the Thyroid dataset and the Wisconsin Breast Cancer dataset show that the predictive accuracy of the extracted rules compares very favorably with state-of-the-art results. Finally, in the last classification problem, on digit recognition, rules generated from the MNIST dataset can be viewed as discriminatory features in particular digit areas. Qualitatively, with respect to rule complexity in terms of the number of generated rules and the number of antecedents per rule, deep DIMLPs and DIMLPs trained by arcing give similar results on a binary classification problem involving digits 5 and 8. On the whole MNIST problem we showed that it is possible to determine the feature detectors created by neural networks, and also that the complexity of the extracted rulesets can be well balanced between accuracy and interpretability.
Affiliation(s)
- Guido Bologna
- Department of Computer Science, University of Applied Science of Western Switzerland, Rue de la Prairie 4, Geneva 1202, Switzerland
| | - Yoichi Hayashi
- Department of Computer Science, Meiji University, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
25
Hosseini-Asl E, Zurada JM, Nasraoui O. Deep Learning of Part-Based Representation of Data Using Sparse Autoencoders With Nonnegativity Constraints. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:2486-2498. [PMID: 26529786 DOI: 10.1109/tnnls.2015.2479223] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
We demonstrate a new deep learning autoencoder network, trained by a nonnegativity constraint algorithm (nonnegativity-constrained autoencoder), that learns features that show part-based representation of data. The learning algorithm is based on constraining negative weights. The performance of the algorithm is assessed based on decomposing data into parts and its prediction performance is tested on three standard image data sets and one text data set. The results indicate that the nonnegativity constraint forces the autoencoder to learn features that amount to a part-based representation of data, while improving sparsity and reconstruction quality in comparison with the traditional sparse autoencoder and nonnegative matrix factorization. It is also shown that this newly acquired representation improves the prediction performance of a deep neural network.
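The constraint idea can be sketched as a quadratic penalty applied to negative weights only, so its gradient nudges them toward zero while leaving nonnegative weights untouched; the penalty strength and the example values below are illustrative, not taken from the paper:

```python
import numpy as np

def negative_weight_penalty(W, alpha=0.1):
    """Penalty (alpha/2) * sum(min(0, w)^2) over a weight matrix W.
    Returns the penalty value and its gradient alpha * min(0, W), which is
    nonzero only on negative entries -- added to the autoencoder's usual
    reconstruction/sparsity loss to drive weights toward nonnegativity."""
    neg = np.minimum(W, 0.0)
    return 0.5 * alpha * np.sum(neg ** 2), alpha * neg

W = np.array([[0.5, -0.2],
              [-1.0, 0.3]])
loss, grad = negative_weight_penalty(W, alpha=0.1)
# loss = 0.05 * (0.04 + 1.0) = 0.052; grad is nonzero only where W < 0
```

As weights become nonnegative, each hidden feature can only add (never cancel) input contributions, which is what encourages the part-based decomposition.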
26
Yeung DS, Li JC, Ng WWY, Chan PPK. MLPNN Training via a Multiobjective Optimization of Training Error and Stochastic Sensitivity. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:978-992. [PMID: 26054075 DOI: 10.1109/tnnls.2015.2431251] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
The training of a multilayer perceptron neural network (MLPNN) concerns the selection of its architecture and connection weights via the minimization of both the training error and a penalty term. Different penalty terms have been proposed to control the smoothness of the MLPNN for better generalization capability. However, controlling its smoothness using, for instance, the norm of the weights or the Vapnik-Chervonenkis dimension cannot distinguish individual MLPNNs with the same number of free parameters or the same norm. In this paper, to enhance generalization capabilities, we propose a stochastic sensitivity measure (ST-SM) to realize a new penalty term for MLPNN training. The ST-SM determines the expectation of the squared output differences between the training samples and the unseen samples located within their Q-neighborhoods for a given MLPNN. It provides a direct measurement of the MLPNN's output fluctuations, i.e., its smoothness. We adopt a two-phase Pareto-based multiobjective training algorithm that minimizes both the training error and the ST-SM as biobjective functions. Experiments on 20 UCI data sets show that MLPNNs trained by the proposed algorithm yield better accuracies on testing data than several recent and classical MLPNN training methods.
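A minimal Monte Carlo sketch of such a sensitivity measure, assuming uniform perturbations within the Q-neighborhood (the toy models, sample counts, and neighborhood size here are illustrative, not the paper's exact estimator):

```python
import numpy as np

rng = np.random.default_rng(1)

def f_batch(f, X):
    """Apply a scalar-output model f to each row of X."""
    return np.array([f(x) for x in X])

def stochastic_sensitivity(f, X, q=0.1, n_samples=50, rng=rng):
    """Monte Carlo estimate of E[(f(x + dx) - f(x))^2], with dx drawn
    uniformly from the Q-neighborhood [-q, q]^d of each training sample.
    Smaller values indicate a smoother model with smaller output
    fluctuations around the training data."""
    total = 0.0
    for x in X:
        dx = rng.uniform(-q, q, size=(n_samples, x.shape[0]))
        total += np.mean((f_batch(f, x + dx) - f(x)) ** 2)
    return total / len(X)

# Toy models: linear maps; sensitivity grows with the weight magnitude
w_small, w_big = np.array([0.1, 0.1]), np.array([5.0, 5.0])
X = rng.uniform(-1, 1, size=(10, 2))
s_small = stochastic_sensitivity(lambda x: w_small @ x, X)
s_big = stochastic_sensitivity(lambda x: w_big @ x, X)
```

Minimizing such a term alongside the training error trades fit against output smoothness, which is why the paper treats them as biobjective functions rather than a single weighted sum.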
27
Junqué de Fortuny E, Martens D. Active Learning-Based Pedagogical Rule Extraction. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2015; 26:2664-2677. [PMID: 25622329 DOI: 10.1109/tnnls.2015.2389037] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Many of the state-of-the-art data mining techniques introduce nonlinearities in their models to cope with complex data relationships effectively. Although such techniques are consistently included among the top classification techniques in terms of predictive power, their lack of transparency renders them useless in any domain where comprehensibility is of importance. Rule-extraction algorithms remedy this by distilling comprehensible rule sets from complex models that explain how the classifications are made. This paper considers a new rule-extraction technique based on active learning. The technique generates artificial data points around training data with low confidence in the output score, after which these are labeled by the black-box model. The main novelty of the proposed method is that it uses a pedagogical approach without making any architectural assumptions about the underlying model. It can therefore be applied to any black-box technique. Furthermore, it can generate any rule format, depending on the chosen underlying rule induction technique. In a large-scale empirical study, we demonstrate the validity of our technique for extracting trees and rules from artificial neural networks, support vector machines, and random forests on 25 data sets of varying size and dimensionality. Our results show that not only do the generated rules explain the black-box models well (thereby facilitating the acceptance of such models), but the proposed algorithm also performs significantly better than traditional rule induction techniques in terms of accuracy as well as fidelity.
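The pedagogical sampling step can be sketched as follows, with a toy black box standing in for a neural network or random forest; all names, thresholds, and noise scales are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def augment_low_confidence(black_box, X, n_new=20, noise=0.2, band=0.2, rng=rng):
    """Generate artificial points around training samples where the black box
    is least confident (score near 0.5) and let the black box itself label
    them -- no assumptions about the model's internal architecture."""
    scores = black_box(X)
    uncertain = X[np.abs(scores - 0.5) < band]
    if len(uncertain) == 0:
        return X, (scores > 0.5).astype(int)
    centers = uncertain[rng.integers(0, len(uncertain), n_new)]
    X_new = centers + rng.normal(0.0, noise, size=centers.shape)
    X_aug = np.vstack([X, X_new])
    return X_aug, (black_box(X_aug) > 0.5).astype(int)

# Toy black box: smooth decision boundary at x0 = 0
black_box = lambda X: 1.0 / (1.0 + np.exp(-4.0 * X[:, 0]))
X = rng.uniform(-1, 1, size=(30, 2))
X_aug, y_aug = augment_low_confidence(black_box, X)
# A simple rule learner (e.g., a decision stump on x0) trained on
# (X_aug, y_aug) could now recover the comprehensible rule "class 1 iff x0 > 0".
```

Because the extra samples concentrate where the black box is uncertain, the subsequent rule induction step sees the decision boundary in much finer detail than the original training data provides.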