1. Borisov V, Leemann T, Seßler K, Haug J, Pawelczyk M, Kasneci G. Deep Neural Networks and Tabular Data: A Survey. IEEE Transactions on Neural Networks and Learning Systems 2024;35:7499-7519. [PMID: 37015381] [DOI: 10.1109/tnnls.2022.3229161]
Abstract
Heterogeneous tabular data are the most commonly used form of data and are essential for numerous critical and computationally demanding applications. On homogeneous datasets, deep neural networks have repeatedly shown excellent performance and have therefore been widely adopted. However, their adaptation to tabular data for inference or data generation tasks remains highly challenging. To facilitate further progress in the field, this work provides an overview of state-of-the-art deep learning methods for tabular data. We categorize these methods into three groups: data transformations, specialized architectures, and regularization models. For each of these groups, our work offers a comprehensive overview of the main approaches. Moreover, we discuss deep learning approaches for generating tabular data and provide an overview of strategies for explaining deep models on tabular data. Thus, our first contribution is to address the main research streams and existing methodologies in the mentioned areas while highlighting relevant challenges and open research questions. Our second contribution is an empirical comparison of traditional machine learning methods with 11 deep learning approaches across five popular real-world tabular datasets of different sizes and with different learning objectives. Our results, which we have made publicly available as competitive benchmarks, indicate that algorithms based on gradient-boosted tree ensembles still mostly outperform deep learning models on supervised learning tasks, suggesting that research progress on competitive deep learning models for tabular data is stagnating. To the best of our knowledge, this is the first in-depth overview of deep learning approaches for tabular data; as such, it can serve as a valuable starting point for researchers and practitioners interested in deep learning with tabular data.
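To make the survey's headline comparison concrete, here is a minimal sketch of the kind of baseline experiment it describes: a gradient-boosted tree ensemble versus a small feedforward network on one tabular dataset. The dataset and hyperparameters are illustrative stand-ins, not the paper's benchmark configuration.

```python
# Minimal GBT-vs-network comparison on tabular data (illustrative only).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Tree ensembles consume raw features; the network wants standardized inputs.
scaler = StandardScaler().fit(X_tr)

gbt = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000,
                    random_state=0).fit(scaler.transform(X_tr), y_tr)

print("GBT accuracy:", gbt.score(X_te, y_te))
print("MLP accuracy:", mlp.score(scaler.transform(X_te), y_te))
```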
2. Ding S, Chen R, Liu H, Liu F, Zhang J. IRMSwin-T: A lightweight shifted windows transformer based on inverted residual structure and residual multi-layer perceptron for rolling bearing fault diagnosis. The Review of Scientific Instruments 2023;94:095116. [PMID: 37737703] [DOI: 10.1063/5.0171091]
Abstract
Data-driven fault diagnosis methods have achieved many good results, but classical convolutional and recurrent neural networks suffer from large parameter counts and poor noise robustness. To address these problems, we propose a lightweight shifted-windows transformer based on an inverted residual structure and a residual multi-layer perceptron (IRMSwin-T) for fault diagnosis of rolling bearings. First, the original data are augmented using an overlapping sampling technique. Then, the collected one-dimensional vibration signals are serialized into vector sequences using a patch embedding strategy. Finally, the IRMSwin-T network extracts features from the vector sequences and classifies faults. Experimental results show that, compared with mainstream lightweight models, IRMSwin-T has fewer parameters and higher diagnostic accuracy.
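The preprocessing pipeline this abstract describes (overlapping sampling followed by patch embedding of a one-dimensional vibration signal) can be sketched as follows; the window, stride, and patch sizes are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def overlapping_windows(signal, window=1024, stride=256):
    """Expand a 1-D vibration signal into overlapping training samples."""
    n = (len(signal) - window) // stride + 1
    return np.stack([signal[i * stride : i * stride + window] for i in range(n)])

def patch_embed(sample, patch_len=16):
    """Split one window into non-overlapping patches (the vector sequence
    a transformer consumes); a real model would then project each patch
    with a learned linear layer."""
    return sample.reshape(-1, patch_len)

signal = np.random.randn(10_000)        # stand-in for a measured signal
samples = overlapping_windows(signal)    # (36, 1024) overlapping samples
tokens = patch_embed(samples[0])         # (64, 16) patch sequence
print(samples.shape, tokens.shape)
```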
Affiliations:
- Shanshan Ding, Renwen Chen, Hao Liu, Fei Liu, and Junyi Zhang: College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, People's Republic of China
3. Jin Y, Hou L, Chen Y. A Time Series Transformer based method for the rotating machinery fault diagnosis. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.04.111]
4. The Effect of Learning Rate on Fractal Image Coding Using Artificial Neural Networks. Fractal and Fractional 2022. [DOI: 10.3390/fractalfract6050280]
Abstract
The amount by which an artificial neural network's weights are updated during training is called the learning rate. More precisely, the learning rate is an adjustable parameter used in training neural networks, typically taking small values in the interval [0, 1]. It determines how quickly the model updates its weights to adapt to the problem: smaller learning rates require more training epochs because the weights change little per update cycle, while larger learning rates produce faster changes and require fewer training epochs. In this paper, we study the effect of changing the learning rate in an artificial neural network designed to solve the inverse problem of fractals. The results show the impact of this change, for both large and small learning rate values, depending on the type of fractal shape whose generating recursive functions are to be identified.
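A toy gradient-descent run makes the trade-off concrete: the update rule is w = w - lr * grad, so small learning rates converge slowly while overly large ones overshoot or diverge. A minimal sketch on a quadratic objective, not the paper's fractal-coding network:

```python
def train(lr, steps=200):
    """Minimize f(w) = (w - 3)^2 by gradient descent with step size lr."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of (w - 3)^2
        w -= lr * grad       # the learning rate scales every update
    return w

# Small lr: slow progress. Moderate lr: fast convergence to w = 3.
# lr > 1 on this problem: the iterates diverge.
for lr in (0.01, 0.1, 0.9, 1.1):
    print(f"lr={lr}: w={train(lr):.4f}")
```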
5. Connecting weighted automata, tensor networks and recurrent neural networks through spectral learning. Machine Learning 2022. [DOI: 10.1007/s10994-022-06164-1]
6.
Abstract
A major goal of linguistics and cognitive science is to understand what class of learning systems can acquire natural language. Until recently, the computational requirements of language have been used to argue that learning is impossible without a highly constrained hypothesis space. Here, we describe a learning system that is maximally unconstrained, operating over the space of all computations, and is able to acquire many of the key structures present in natural language from positive evidence alone. We demonstrate this by providing the same learning model with data from 74 distinct formal languages which have been argued to capture key features of language, have been studied in experimental work, or come from an interesting complexity class. The model is able to successfully induce the latent system generating the observed strings from small amounts of evidence in almost all cases, including for regular (e.g., a^n, [Formula: see text], and [Formula: see text]), context-free (e.g., [Formula: see text], and [Formula: see text]), and context-sensitive (e.g., [Formula: see text], and xx) languages, as well as for many languages studied in learning experiments. These results show that relatively small amounts of positive evidence can support learning of rich classes of generative computations over structures. The model provides an idealized learning setup upon which additional cognitive constraints and biases can be formalized.
7. Kaadoud IC, Rougier NP, Alexandre F. Knowledge extraction from the learning of sequences in a long short term memory (LSTM) architecture. Knowledge-Based Systems 2022. [DOI: 10.1016/j.knosys.2021.107657]
8. Comparison of the Deep Learning Performance for Short-Term Power Load Forecasting. Sustainability 2021. [DOI: 10.3390/su132212493]
Abstract
Electricity demand forecasting enables the stable operation of electric power systems and reduces electric power consumption. Previous studies have predicted electricity demand through a correlation analysis between power consumption and weather data; however, such analyses do not consider the influence of other factors on power consumption, such as industrial activity, economic factors, power horizon, and the living patterns of building residents. This study proposes an efficient power demand prediction using deep learning techniques for two industrial buildings with different power consumption patterns. The problems are identified by analyzing the correlation between power consumption and weather data by season for industrial buildings with different consumption patterns. Four models were analyzed using the factors most important for predicting power consumption together with weather data (temperature, humidity, sunlight, solar radiation, total cloud cover, wind speed, wind direction, and vapor pressure). The prediction horizon for power consumption forecasting was kept at 24 h. Existing deep learning methods (DNN, RNN, CNN, and LSTM) cannot accurately predict power consumption when it increases or decreases rapidly, so a method to reduce this prediction error is proposed. DNN, RNN, and LSTM were superior when using two years of electricity consumption data rather than one year of consumption and weather data.
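A sketch of how such a forecasting problem is typically framed as supervised learning, with the 24 h prediction horizon the study uses; the one-week lookback and the feature layout are assumptions for illustration.

```python
import numpy as np

def make_supervised(load, weather, lookback=168, horizon=24):
    """Frame hourly series as (one week of history -> next 24 h) pairs.

    `load` is an hourly consumption series, `weather` a matching
    (hours x features) array.
    """
    X, y = [], []
    for t in range(lookback, len(load) - horizon):
        past = np.column_stack([load[t - lookback:t],
                                weather[t - lookback:t]])
        X.append(past)                    # (lookback, 1 + n_weather_features)
        y.append(load[t:t + horizon])     # the next 24 hours to predict
    return np.stack(X), np.stack(y)

hours = 24 * 365
load = np.random.rand(hours)              # stand-in for metered consumption
weather = np.random.rand(hours, 3)        # e.g., temperature, humidity, wind
X, y = make_supervised(load, weather)
print(X.shape, y.shape)                   # (n_samples, 168, 4) (n_samples, 24)
```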
9. Distillation of weighted automata from recurrent neural networks using a spectral approach. Machine Learning 2021. [DOI: 10.1007/s10994-021-05948-1]
11. Zhang K, Wang Q, Giles CL. An Entropy Metric for Regular Grammar Classification and Learning with Recurrent Neural Networks. Entropy 2021;23:127. [PMID: 33478020] [PMCID: PMC7835824] [DOI: 10.3390/e23010127]
Abstract
Recently, there has been a resurgence of formal language theory in deep learning research. However, most research has focused on the more practical problem of representing symbolic knowledge with machine learning; there has been comparatively little work exploring the fundamental connection between the two. To obtain a better understanding of the internal structures of regular grammars and their corresponding complexity, we focus on categorizing regular grammars using both theoretical analysis and empirical evidence. Specifically, motivated by the concentric ring representation, we relaxed the original order information and introduced an entropy metric for describing the complexity of different regular grammars. Based on this metric, we categorized regular grammars into three disjoint subclasses: the polynomial, exponential, and proportional classes. In addition, several classification theorems are provided for different representations of regular grammars. Our analysis was validated by examining the process of learning grammars with multiple recurrent neural networks. The results show that, as expected, more complex grammars are generally more difficult to learn.
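One concrete way to see a grammar's complexity class is to count how many strings of each length it accepts and examine the growth of that count. The sketch below does this for a small regular language; it illustrates the general idea only, not the paper's exact entropy metric.

```python
import itertools, math

def accepts_no_000(s):
    """Membership test for an illustrative regular language: binary
    strings containing no '000' substring (not one of the paper's
    grammars)."""
    return "000" not in s

for n in range(4, 14, 3):
    count = sum(accepts_no_000("".join(w))
                for w in itertools.product("01", repeat=n))
    # log2(count)/n approximates the language's entropy rate: values
    # near 1.0 mean growth close to the full binary tree; values near
    # 0 mean the language is sparse.
    print(n, count, round(math.log2(count) / n, 3))
```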
Affiliations:
- Kaixuan Zhang: Information Sciences and Technology, Pennsylvania State University, University Park, PA 16802, USA
- Qinglong Wang: Alibaba Group, Building A2, Lane 55 Chuan He Road, Zhangjiang, Pudong New District, Shanghai 200135, China
- C. Lee Giles (corresponding author): Information Sciences and Technology, Pennsylvania State University, University Park, PA 16802, USA
12. Kuntoğlu M, Aslan A, Pimenov DY, Usca ÜA, Salur E, Gupta MK, Mikolajczyk T, Giasin K, Kapłonek W, Sharma S. A Review of Indirect Tool Condition Monitoring Systems and Decision-Making Methods in Turning: Critical Analysis and Trends. Sensors 2020;21:108. [PMID: 33375340] [PMCID: PMC7794675] [DOI: 10.3390/s21010108]
Abstract
The complex structure of turning makes it difficult to obtain the desired results in terms of tool wear and surface roughness, and the high temperatures and pressures involved make the cutting area hard to reach and observe. Indirect tool condition monitoring systems track the condition of the cutting tool via several released or converted energy types, namely heat, acoustic emission, vibration, cutting forces, and motor current. Tool wear inevitably progresses during metal cutting and is related to these energy types. Indirect tool condition monitoring systems use sensors situated around the cutting area to determine the wear condition of the cutting tool without intervening in the cutting zone. In this study, the sensors most commonly used in indirect tool condition monitoring systems and their correlations with tool wear are reviewed, summarizing the literature in this field over the last two decades. Reviews of tool condition monitoring systems in turning are very limited, and the relationships between measured variables such as tool wear and vibration require detailed analysis. The main aim of this work is to discuss the effect of sensorial data on tool wear by considering previously published papers. As a computer-aided electronic and mechanical support system, tool condition monitoring paves the way for the machining industry and for the future development of Industry 4.0.
Affiliations:
- Mustafa Kuntoğlu: Mechanical Engineering Department, Technology Faculty, Selcuk University, Selçuklu, 42130 Konya, Turkey
- Abdullah Aslan: Mechanical Engineering Department, Engineering and Architecture Faculty, Selcuk University, Akşehir, 42130 Konya, Turkey
- Danil Yurievich Pimenov (corresponding author): Department of Automated Mechanical Engineering, South Ural State University, Lenin Prosp. 76, 454080 Chelyabinsk, Russia
- Üsame Ali Usca: Mechanical Engineering Department, Engineering and Architecture Faculty, Bingöl University, 12000 Bingöl, Turkey
- Emin Salur: Department of Metallurgical and Materials Engineering, Selcuk University, Selçuklu, 42130 Konya, Turkey
- Munish Kumar Gupta: Department of Automated Mechanical Engineering, South Ural State University, Lenin Prosp. 76, 454080 Chelyabinsk, Russia; Key Laboratory of High Efficiency and Clean Mechanical Manufacture, Ministry of Education, School of Mechanical Engineering, Shandong University, Jinan 250100, China
- Tadeusz Mikolajczyk: Department of Production Engineering, UTP University of Science and Technology, Al. Prof. S. Kaliskiego 7, 85-796 Bydgoszcz, Poland
- Khaled Giasin: School of Mechanical and Design Engineering, University of Portsmouth, Portsmouth PO1 3DJ, UK
- Wojciech Kapłonek: Department of Production Engineering, Faculty of Mechanical Engineering, Koszalin University of Technology, Racławicka 15-17, 75-620 Koszalin, Poland
- Shubham Sharma: Department of Mechanical Engineering, IKG Punjab Technical University, Jalandhar-Kapurthala Road, Kapurthala, Punjab 144603, India
13. Alamia A, Gauducheau V, Paisios D, VanRullen R. Comparing feedforward and recurrent neural network architectures with human behavior in artificial grammar learning. Scientific Reports 2020;10:22172. [PMID: 33335190] [PMCID: PMC7747619] [DOI: 10.1038/s41598-020-79127-y]
Abstract
In recent years, artificial neural networks have achieved performance close to or better than humans in several domains: tasks that were previously human prerogatives, such as language processing, have witnessed remarkable improvements in state-of-the-art models. One advantage of this technological boost is that it facilitates comparison between different neural networks and human performance, deepening our understanding of human cognition. Here, we investigate which neural network architecture (feedforward vs. recurrent) matches human behavior in artificial grammar learning, a crucial aspect of language acquisition. Prior experimental studies have shown that artificial grammars can be learnt by human subjects after little exposure and often without explicit knowledge of the underlying rules. We tested four grammars with different complexity levels both in humans and in feedforward and recurrent networks. Our results show that both architectures can "learn" (via error back-propagation) the grammars after the same number of training sequences as humans do, but recurrent networks perform closer to humans than feedforward ones, irrespective of the grammar complexity level. Moreover, similar to visual processing, in which feedforward and recurrent architectures have been related to unconscious and conscious processes respectively, the difference in performance between the architectures over ten regular grammars shows that simpler and more explicit grammars are better learnt by recurrent architectures. This supports the hypothesis that explicit learning is best modeled by recurrent networks, whereas feedforward networks supposedly capture the dynamics involved in implicit learning.
Affiliations:
- Dimitri Paisios: CerCo, CNRS, 31055 Toulouse, France; Laboratoire Cognition, Langues, Langage, Ergonomie, CNRS, Université Toulouse, Toulouse, France
- Rufin VanRullen: CerCo, CNRS, 31055 Toulouse, France; ANITI, Université de Toulouse, 31055 Toulouse, France
14. Mantas CJ. Interpretation of first-order recurrent neural networks by means of fuzzy rules. Journal of Intelligent & Fuzzy Systems 2019. [DOI: 10.3233/jifs-190215]
Affiliations:
- C.J. Mantas: Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
15. Wang Q, Zhang K, Ororbia AG II, Xing X, Liu X, Giles CL. An Empirical Evaluation of Rule Extraction from Recurrent Neural Networks. Neural Computation 2018;30:2568-2591. [PMID: 30021081] [DOI: 10.1162/neco_a_01111]
Abstract
Rule extraction from black box models is critical in domains that require model validation before implementation, as can be the case in credit scoring and medical diagnosis. Though already a challenging problem in statistical learning in general, the difficulty is even greater when highly nonlinear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second-order RNNs trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that in certain cases, the rules outperform the trained RNNs.
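For reference, the Tomita grammars mentioned above are a standard benchmark of seven small regular languages over the alphabet {0, 1}. Membership tests for a subset of them, as commonly defined in the rule-extraction literature, can be written directly:

```python
# Membership tests for a subset of the Tomita grammars (definitions as
# commonly given in the literature; the remaining grammars are omitted).
import re

TOMITA = {
    1: lambda s: "0" not in s,                        # 1*
    2: lambda s: re.fullmatch("(10)*", s) is not None,
    4: lambda s: "000" not in s,                      # no 3 consecutive 0s
    5: lambda s: s.count("0") % 2 == 0 and s.count("1") % 2 == 0,
    7: lambda s: re.fullmatch("0*1*0*1*", s) is not None,
}

samples = ["", "10", "1010", "110", "000"]
for g, accept in TOMITA.items():
    print(f"Tomita {g}:", {s: accept(s) for s in samples})
```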
Affiliations:
- Kaixuan Zhang: Pennsylvania State University, State College, PA 16801, USA
- Xinyu Xing: Pennsylvania State University, State College, PA 16801, USA
- Xue Liu: McGill University, Montreal, Quebec H3A 0G4, Canada
- C. Lee Giles: Pennsylvania State University, State College, PA 16801, USA
17. Ororbia AG II, Mikolov T, Reitter D. Learning Simpler Language Models with the Differential State Framework. Neural Computation 2017;29:3327-3352. [PMID: 28957029] [DOI: 10.1162/neco_a_01017]
Abstract
Learning useful information across long time lags is a critical and difficult problem for temporal neural models in tasks such as language modeling. Existing architectures that address the issue are often complex and costly to train. The differential state framework (DSF) is a simple and high-performing design that unifies previously introduced gated neural models. DSF models maintain longer-term memory by learning to interpolate between a fast-changing data-driven representation and a slowly changing, implicitly stable state. Within the DSF framework, a new architecture is presented, the delta-RNN. This model requires hardly any more parameters than a classical, simple recurrent network. In language modeling at the word and character levels, the delta-RNN outperforms popular complex architectures, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU), and, when regularized, performs comparably to several state-of-the-art baselines. At the subword level, the delta-RNN's performance is comparable to that of complex gated architectures.
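The core DSF idea, a state maintained as a gated interpolation between the slowly changing previous state and a fast data-driven candidate, can be sketched as follows. This is a simplified reading of the framework; the parameter shapes and gating form are assumptions, not the paper's exact delta-RNN parameterization.

```python
import numpy as np

def delta_rnn_step(x, s_prev, W, U, b, r):
    """One simplified differential-state update."""
    d = np.tanh(W @ x + U @ s_prev + b)      # fast, data-driven candidate
    gate = 1 / (1 + np.exp(-r))              # per-unit mixing coefficients
    return (1 - gate) * s_prev + gate * d    # slow/fast interpolation

rng = np.random.default_rng(0)
H, D = 8, 4                                  # hidden and input sizes (toy)
W, U = rng.normal(0, 0.1, (H, D)), rng.normal(0, 0.1, (H, H))
b, r = np.zeros(H), np.zeros(H)
s = np.zeros(H)
for t in range(5):
    s = delta_rnn_step(rng.normal(size=D), s, W, U, b, r)
print(s.round(3))
```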
Affiliations:
- Alexander G. Ororbia II: College of Information Sciences and Technology, Pennsylvania State University, State College, PA 16802, USA
- David Reitter: College of Information Sciences and Technology, Pennsylvania State University, State College, PA 16802, USA
18. Donnarumma F, Prevete R, Chersi F, Pezzulo G. A Programmer–Interpreter Neural Network Architecture for Prefrontal Cognitive Control. International Journal of Neural Systems 2015;25:1550017. [DOI: 10.1142/s0129065715500173]
Abstract
There is wide consensus that the prefrontal cortex (PFC) is able to exert cognitive control on behavior by biasing processing toward task-relevant information and by modulating response selection. This idea is typically framed in terms of top-down influences within a cortical control hierarchy, where prefrontal-basal ganglia loops gate multiple input–output channels, which in turn can activate or sequence motor primitives expressed in (pre-)motor cortices. Here we advance a new hypothesis, based on the notion of programmability and an interpreter–programmer computational scheme, on how the PFC can flexibly bias the selection of sensorimotor patterns depending on internal goal and task contexts. In this approach, multiple elementary behaviors representing motor primitives are expressed by a single multi-purpose neural network, which is seen as a reusable area of "recycled" neurons (interpreter). The PFC thus acts as a "programmer" that, without modifying the network connectivity, feeds the interpreter networks with specific input parameters encoding the programs (corresponding to network structures) to be interpreted by the (pre-)motor areas. Our architecture is validated in a standard test for executive function: the 1-2-AX task. Our results show that this computational framework provides a robust, scalable and flexible scheme that can be iterated at different hierarchical layers, supporting the realization of multiple goals. We discuss the plausibility of the "programmer–interpreter" scheme to explain the functioning of prefrontal-(pre)motor cortical hierarchies.
Affiliations:
- Francesco Donnarumma: Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Via S. Martino della Battaglia 44, 00185 Rome, Italy
- Roberto Prevete: Università degli Studi di Napoli Federico II, Dipartimento di Ingegneria Elettrica e Tecnologie dell'Informazione (DIETI), Via Claudio 21, 80125 Napoli, Italy
- Fabian Chersi: University College London, Institute of Cognitive Neuroscience, 17 Queen Square, London WC1N 3AR, England
- Giovanni Pezzulo: Institute of Cognitive Sciences and Technologies, National Research Council of Italy, Via S. Martino della Battaglia 44, 00185 Rome, Italy
19. Sopena JM, Ramos PJ, López-Moliner J, Gilboy E. Composicionalidad, cómputo de estructura y redes neuronales [Compositionality, structure computation, and neural networks]. Studies in Psychology 2014. [DOI: 10.1174/02109390260050021]
20. Bianchini M, Gori M, Maggini M. On the problem of local minima in recurrent neural networks. IEEE Transactions on Neural Networks 1994;5:167-177. [PMID: 18267788] [DOI: 10.1109/72.279182]
Abstract
Many researchers have recently focused their efforts on devising efficient algorithms, mainly based on optimization schemes, for learning the weights of recurrent neural networks. As in the case of feedforward networks, however, these learning algorithms may get stuck in local minima during gradient descent, thus discovering sub-optimal solutions. This paper analyses the problem of optimal learning in recurrent networks by proposing conditions that guarantee local-minima-free error surfaces. An example is given that also shows the constructive role of the proposed theory in designing networks suitable for solving a given task. Moreover, a formal relationship between recurrent and static feedforward networks is established, such that the examples of local minima already known for feedforward networks can be associated with analogous ones in recurrent networks.
21. Giles CL, Omlin CW. Pruning recurrent neural networks for improved generalization performance. IEEE Transactions on Neural Networks 1994;5:848-851. [PMID: 18267860] [DOI: 10.1109/72.317740]
Abstract
Determining the architecture of a neural network is an important issue for any learning task. For recurrent neural networks, no general methods exist that permit the estimation of the number of layers of hidden neurons, the size of layers, or the number of weights. We present a simple pruning heuristic that significantly improves the generalization performance of trained recurrent networks. We illustrate this heuristic by training a fully recurrent neural network on positive and negative strings of a regular grammar. We also show that rules extracted from networks trained with this pruning heuristic are more consistent with the rules to be learned. This performance improvement is obtained by pruning and retraining the networks. Simulations are shown for training and pruning a recurrent neural net on strings generated by two regular grammars: a randomly generated 10-state grammar and an 8-state triple-parity grammar. Further simulations indicate that this pruning method can achieve generalization performance superior to that obtained by training with weight decay.
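A minimal sketch of the prune-and-retrain loop such a heuristic implies: repeatedly zero the smallest-magnitude weights and retrain until the error recovers. The pruning criterion and schedule here are illustrative assumptions, not necessarily the authors' exact heuristic.

```python
import numpy as np

def prune_smallest(W, fraction=0.2):
    """Zero out the smallest-magnitude weights of a recurrent matrix."""
    threshold = np.quantile(np.abs(W[W != 0]), fraction)
    W = W.copy()
    W[np.abs(W) < threshold] = 0.0
    return W

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))               # toy recurrent weight matrix
for round_ in range(3):
    W = prune_smallest(W)
    # ... retrain the network here until training error recovers ...
    print(f"round {round_}: {np.mean(W == 0):.0%} weights pruned")
```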
22. Jim KC, Giles CL, Horne BG. An analysis of noise in recurrent neural networks: convergence and generalization. IEEE Transactions on Neural Networks 1996;7:1424-1438. [PMID: 18263536] [DOI: 10.1109/72.548170]
Abstract
This paper concerns the effect of noise on the performance of dynamically driven recurrent neural networks. We introduce and analyze various methods of injecting synaptic noise into such networks during training. Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance. We analyze the effects of various noise parameters and predict that the best overall performance is achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training, when the training error is large, and should improve convergence on error surfaces with local minima. The first-order term is a regularization term that can improve generalization. Specifically, it can encourage internal representations where the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it will have on other types of problems. To substantiate these predictions, we present simulations on learning the dual parity grammar from temporal strings for all noise models, and on learning a randomly generated six-state grammar using the predicted best noise model.
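A sketch of the noise model the analysis predicts to work best, additive noise injected at each time step during training of a recurrent network; the architecture and noise level are illustrative choices.

```python
import numpy as np

def rnn_forward(x_seq, W, U, noise_std=0.0, rng=None):
    """Simple tanh RNN; optionally injects additive synaptic noise at
    every time step, as used during training (set noise_std=0 at test
    time)."""
    rng = rng or np.random.default_rng()
    h = np.zeros(W.shape[0])
    for x in x_seq:
        pre = W @ h + U @ x
        if noise_std > 0:                   # training-time noise only
            pre += rng.normal(0.0, noise_std, pre.shape)
        h = np.tanh(pre)
    return h

rng = np.random.default_rng(1)
W, U = rng.normal(0, 0.3, (8, 8)), rng.normal(0, 0.3, (8, 2))
x_seq = rng.normal(size=(20, 2))
print(rnn_forward(x_seq, W, U, noise_std=0.1, rng=rng).round(2))
```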
Affiliations:
- K.C. Jim: NEC Research Institute, Princeton, NJ
23. Olurotimi O. Recurrent neural network training with feedforward complexity. IEEE Transactions on Neural Networks 1994;5:185-197. [PMID: 18267790] [DOI: 10.1109/72.279184]
Abstract
This paper presents a training method that is of no more than feedforward complexity for fully recurrent networks. The method is not approximate, but rather depends on an exact transformation that reveals an embedded feedforward structure in every recurrent network. It turns out that given any unambiguous training data set, such as samples of the state variables and their derivatives, we need only to train this embedded feedforward structure. The necessary recurrent network parameters are then obtained by an inverse transformation that consists only of linear operators. As an example of modeling a representative nonlinear dynamical system, the method is applied to learn Bessel's differential equation, thereby generating Bessel functions within, as well as outside the training set.
Affiliations:
- O. Olurotimi: Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA
24. Hinoshita W, Arie H, Tani J, Okuno HG, Ogata T. Emergence of hierarchical structure mirroring linguistic composition in a recurrent neural network. Neural Networks 2011;24:311-320. [DOI: 10.1016/j.neunet.2010.12.006]
25. Won SH, Song I, Lee SY, Park CH. Identification of finite state automata with a class of recurrent neural networks. IEEE Transactions on Neural Networks 2010;21:1408-1421. [PMID: 20709639] [DOI: 10.1109/tnn.2010.2059040]
Abstract
A class of recurrent neural networks is proposed and proven to be capable of identifying any discrete-time dynamical system. The application of the proposed network is addressed in the encoding, identification, and extraction of finite state automata (FSAs). Simulation results show that the identification of FSAs using the proposed network, trained by hybrid greedy simulated annealing with a modified cost function, generally exhibits better performance than conventional identification procedures.
Affiliations:
- Sung Hwan Won: Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea
26. Tickle AB, Andrews R, Golea M, Diederich J. The truth will come to light: directions and challenges in extracting the knowledge embedded within trained artificial neural networks. IEEE Transactions on Neural Networks 1998;9:1057-1068. [PMID: 18255792] [DOI: 10.1109/72.728352]
Abstract
To date, the preponderance of techniques for eliciting the knowledge embedded in trained artificial neural networks (ANNs) has focused primarily on extracting rule-based explanations from feedforward ANNs. The ADT taxonomy for categorizing such techniques was proposed in 1995 to provide a basis for the systematic comparison of the different approaches. This paper shows not only that this taxonomy is applicable to a cross section of current techniques for extracting rules from trained feedforward ANNs, but also that it can be adapted and extended to embrace a broader range of ANN types (e.g., recurrent neural networks) and explanation structures. In addition, the paper identifies some of the key research questions in extracting the knowledge embedded within ANNs, including the need to formulate a consistent theoretical basis for what has been, until recently, a disparate collection of empirical results.
Affiliations:
- A.B. Tickle: Neurocomputing Research Centre, Queensland University of Technology, GPO Brisbane, Queensland 4001, Australia
27. Christiansen MH, Chater N. Toward a Connectionist Model of Recursion in Human Linguistic Performance. Cognitive Science 1999;23:157-205. [DOI: 10.1207/s15516709cog2302_2]
28. Chen J, Chaudhari NS. Segmented-memory recurrent neural networks. IEEE Transactions on Neural Networks 2009;20:1267-1280. [PMID: 19605323] [DOI: 10.1109/tnn.2009.2022980]
Abstract
Conventional recurrent neural networks (RNNs) have difficulty learning long-term dependencies. To tackle this problem, we propose an architecture called the segmented-memory recurrent neural network (SMRNN). A symbolic sequence is broken into segments and then presented to the SMRNN one symbol per cycle. The SMRNN uses separate internal states to store symbol-level context as well as segment-level context: the symbol-level context is updated for each symbol presented as input, while the segment-level context is updated after each segment. The SMRNN is trained using an extended real-time recurrent learning algorithm. We test the performance of the SMRNN on the information latching problem, the "two-sequence problem", and protein secondary structure (PSS) prediction. Our results indicate that the SMRNN performs better on long-term dependency problems than conventional RNNs. In addition, we theoretically analyze how the segmented memory of the SMRNN helps in learning long-term temporal dependencies, and we study the impact of the segment length.
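The two-level state update described above can be sketched as follows; the reset behavior at segment boundaries and the fixed segment length are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def smrnn_run(symbols, seg_len, f_sym, f_seg, h0, c0):
    """Segmented-memory recurrence: a symbol-level state h updated every
    symbol, and a segment-level state c updated only at segment ends."""
    h, c = h0.copy(), c0.copy()
    for i, x in enumerate(symbols, start=1):
        h = f_sym(h, x)                 # fine-grained, per-symbol context
        if i % seg_len == 0:            # coarse, per-segment context
            c = f_seg(c, h)
            h = h0.copy()               # restart symbol-level state (assumed)
    return c

rng = np.random.default_rng(0)
Wh, Wc = rng.normal(0, 0.2, (8, 8)), rng.normal(0, 0.2, (8, 8))
f_sym = lambda h, x: np.tanh(Wh @ h + x)
f_seg = lambda c, h: np.tanh(Wc @ c + h)
seq = rng.normal(size=(40, 8))          # toy embedded symbol sequence
print(smrnn_run(seq, seg_len=10, f_sym=f_sym, f_seg=f_seg,
                h0=np.zeros(8), c0=np.zeros(8)).round(2))
```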
Affiliations:
- Jinmiao Chen: School of Computer Engineering, Nanyang Technological University, Singapore 639798, Singapore
29. Rutishauser U, Douglas RJ. State-dependent computation using coupled recurrent networks. Neural Computation 2009;21:478-509. [PMID: 19431267] [DOI: 10.1162/neco.2008.03-08-734]
Abstract
Although conditional branching between possible behavioral states is a hallmark of intelligent behavior, very little is known about the neuronal mechanisms that support this processing. In a step toward solving this problem, we demonstrate by theoretical analysis and simulation how networks of richly interconnected neurons, such as those observed in the superficial layers of the neocortex, can embed reliable, robust finite state machines. We show how a multistable neuronal network containing a number of states can be created very simply by coupling two recurrent networks whose synaptic weights have been configured for soft winner-take-all (sWTA) performance. These two sWTAs have simple, homogeneous, locally recurrent connectivity except for a small fraction of recurrent cross-connections between them, which are used to embed the required states. This coupling between the maps allows the network to continue to express the current state even after the input that elicited that state is withdrawn. In addition, a small number of transition neurons implement the necessary input-driven transitions between the embedded states. We provide simple rules to systematically design and construct neuronal state machines of this kind. The significance of our finding is that it offers a method whereby the cortex could construct networks supporting a broad range of sophisticated processing by applying only small specializations to the same generic neuronal circuit.
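The sWTA building block can be illustrated with a few lines of code: each unit receives its external input plus self-excitation, and is inhibited in proportion to the pooled activity, so the unit with the largest input eventually suppresses the others. The gain values below are hand-picked for stability, not taken from the paper's analysis.

```python
import numpy as np

def swta(external, alpha=1.2, beta=0.25, steps=50):
    """Iterate a soft winner-take-all map over rectified linear units."""
    x = np.zeros_like(external)
    for _ in range(steps):
        inhibition = beta * x.sum()               # shared pooled inhibition
        x = np.maximum(0.0, external + alpha * x - inhibition)
    return x

external = np.array([0.9, 1.0, 0.8, 0.5])         # noisy input pattern
print(swta(external).round(3))                    # only the largest input survives
```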
Affiliations:
- Ueli Rutishauser: Computation and Neural Systems, California Institute of Technology, Pasadena, CA 91125, USA
30. Nayak R. Generating rules with predicates, terms and variables from the pruned neural networks. Neural Networks 2009;22:405-414. [DOI: 10.1016/j.neunet.2009.02.001]
31. Fernández-Caballero A, López MT, Castillo JC, Maldonado-Bascón S. Real-time accumulative computation motion detectors. Sensors 2009;9:10044-10065. [PMID: 22303161] [PMCID: PMC3267209] [DOI: 10.3390/s91210044]
Abstract
The neurally inspired accumulative computation (AC) method and its application to motion detection have been introduced in past years. This paper revisits the fact that many researchers have explored the relationship between neural networks and finite state machines: finite state machines constitute the best-characterized computational model, whereas artificial neural networks have become a very successful tool for modeling and problem solving. The article shows how to reach real-time performance by describing the model as a finite state machine, and it introduces two steps in that direction: (a) a simplification of the general AC method, performed by formally transforming it into a finite state machine; and (b) a hardware implementation in FPGA of such an AC module, as well as of an 8-AC motion detector, with promising performance results. We also offer two case studies of the use of AC motion detectors in surveillance applications, namely infrared-based people segmentation and color-based people tracking.
Affiliations:
- Antonio Fernández-Caballero (corresponding author): Instituto de Investigación en Informática de Albacete, 02071 Albacete, Spain; Departamento de Sistemas Informáticos, Escuela de Ingenieros Industriales de Albacete, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
- María Teresa López: Instituto de Investigación en Informática de Albacete, 02071 Albacete, Spain; Departamento de Sistemas Informáticos, Escuela Superior de Ingeniería Informática, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
- José Carlos Castillo: Instituto de Investigación en Informática de Albacete, 02071 Albacete, Spain
- Saturnino Maldonado-Bascón: Department of Signal Theory and Communications, Escuela Politécnica Superior, Universidad de Alcalá, 28871 Alcalá de Henares, Madrid, Spain
32. Kolman E, Margaliot M. A new approach to knowledge-based design of recurrent neural networks. IEEE Transactions on Neural Networks 2008;19:1389-1401. [PMID: 18701369] [DOI: 10.1109/tnn.2008.2000393]
Abstract
A major drawback of artificial neural networks (ANNs) is their black-box character. This is especially true for recurrent neural networks (RNNs) because of their intricate feedback connections. In particular, given a problem and some initial information concerning its solution, it is not at all obvious how to design an RNN that is suitable for solving this problem. In this paper, we consider a fuzzy rule base with a special structure, referred to as the fuzzy all-permutations rule base (FARB). Inferring the FARB yields an input-output (IO) mapping that is mathematically equivalent to that of an RNN. We use this equivalence to develop two new knowledge-based design methods for RNNs. The first method, referred to as the direct approach, is based on stating the desired functioning of the RNN in terms of several sets of symbolic rules, each one corresponding to a subnetwork. Each set is then transformed into a suitable FARB. The second method is based on first using the direct approach to design a library of simple modules, such as counters or comparators, and realize them using RNNs. Once designed, the correctness of each RNN can be verified. Then, the initial design problem is solved by using these basic modules as building blocks. This yields a modular and systematic approach for knowledge-based design of RNNs. We demonstrate the efficiency of these approaches by designing RNNs that recognize both regular and nonregular formal languages.
Affiliations:
- Eyal Kolman: School of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 69978, Israel
33. Delgado M, Cuéllar MP, Pegalajar MC. Multiobjective hybrid optimization and training of recurrent neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 2008;38:381-403. [PMID: 18348922] [DOI: 10.1109/tsmcb.2007.912937]
Abstract
The application of neural networks to a problem involves computationally costly tasks before a suitable network is found, mainly the selection of the network topology and the training step. The network structure is usually selected by trial and error, after which the network is trained. In the case of recurrent neural networks (RNNs), the lack of suitable training algorithms sometimes hampers these procedures due to vanishing gradient problems. This paper addresses the simultaneous training and topology optimization of RNNs using multiobjective hybrid procedures. The proposal builds hybrid methods from the SPEA2 and NSGA2 algorithms using the Baldwinian hybridization strategy. We also study the effects of the choice of objectives, crossover, and mutation on diversity during evolution. The proposals are tested experimentally by training and optimizing networks on the Competition on Artificial Time Series (CATS) benchmark.
Affiliations:
- Miguel Delgado: Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
34. Cartling B. On the implicit acquisition of a context-free grammar by a simple recurrent neural network. Neurocomputing 2008. [DOI: 10.1016/j.neucom.2007.05.006]
35. Das S, Olurotimi O. Noisy recurrent neural networks: the continuous-time case. IEEE Transactions on Neural Networks 1998;9:913-936. [PMID: 18255776] [DOI: 10.1109/72.712164]
Abstract
The classical stochastic analog of the deterministic linear system in engineering is the linear system driven by white noise. This model is the basis of many important engineering methodologies in stochastic control, system identification, and signal estimation and classification. As the promise of artificial neural networks in modeling nonlinear systems continues to grow, the need for a stochastic analog with quantitative foundations for analysis and synthesis will increase. This paper (along with a companion paper) represents recent work in this direction, examining recurrent neural networks (RNNs) driven by white noise. In this paper, we examine the effect of noise on the typical continuous-time RNN model. First, we perform a qualitative analysis establishing uniform boundedness of the moments of the neuron states over time. To enable practical application in RNN design and evaluation, however, it is necessary to relate these properties to useful measures that can be estimated. We therefore derive bias and variance measures for the noisy RNN with respect to the corresponding deterministic RNN. This has significant practical implications, since neural network design is nonminimal in the sense that several different networks can be constructed to solve the same problem. The results in this paper allow the user to quantitatively evaluate given RNNs for noise performance. In addition, the designer can use these results to constrain the design space so that the achieved design satisfies performance specifications whenever possible. An example is provided that uses the derived measures to predetermine the best among several RNN designs for a given problem. The companion paper presents results for the discrete-time (so-called time-lagged recurrent) case.
Affiliations:
- S. Das: Department of Electrical and Computer Engineering, MS 1G5, George Mason University, Fairfax, VA 22030, USA
36. Chen CH, Honavar V. A neural-network architecture for syntax analysis. IEEE Transactions on Neural Networks 1999;10:94-114. [PMID: 18252507] [DOI: 10.1109/72.737497]
Abstract
Artificial neural networks (ANNs), due to their inherent parallelism, offer an attractive paradigm for implementing symbol processing systems for applications in computer science and artificial intelligence. This paper explores the systematic synthesis of modular neural-network architectures for syntax analysis using a prespecified grammar, a prototypical symbol processing task with applications in programming language interpretation, syntax analysis of symbolic expressions, and high-performance compilers. The proposed architecture is assembled from ANN components for lexical analysis, a stack, parsing, and parse tree construction. Each of these modules takes advantage of parallel content-based pattern matching using a neural associative memory. The proposed neural-network architecture for syntax analysis provides a relatively efficient and high-performance alternative to current computer systems for applications that involve parsing LR grammars, which constitute a widely used subset of deterministic context-free grammars. Comparison of the quantitatively estimated performance of such a system [implemented using current CMOS very large scale integration (VLSI) technology] with that of conventional computers demonstrates the benefits of massively parallel neural-network architectures for symbol processing applications.
Affiliations:
- C.H. Chen: Advanced Technology Center, Computer and Communication Laboratories, Industrial Technology Research Institute, Chutung, Hsinchu, Taiwan, R.O.C.
37.
Abstract
This letter presents an algorithm, CrySSMEx, for extracting minimal finite state machine descriptions of dynamic systems such as recurrent neural networks. Unlike previous algorithms, CrySSMEx is parameter-free and deterministic, and it efficiently generates a series of increasingly refined models. A novel finite stochastic model of dynamic systems and a novel vector quantization function have been developed to take into account the state-space dynamics of the system. The experiments show that (1) extraction from systems that can be described as regular grammars is trivial, (2) extraction from high-dimensional systems is feasible, and (3) extraction of approximate models from chaotic systems is possible. The results are promising, and an analysis of shortcomings suggests possible further improvements. Some largely overlooked connections between the field of rule extraction from recurrent neural networks and other fields are also identified.
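A toy version of quantization-based state machine extraction conveys the flavor of the approach: drive the network, quantize its hidden states, and record the induced transitions. CrySSMEx itself refines its quantization adaptively; the fixed grid below is a simplifying assumption.

```python
import numpy as np

def extract_fsm(rnn_step, h0, alphabet, n_bins=3, depth=6):
    """Explore the RNN's reachable states, map each hidden vector onto a
    coarse grid, and record the induced (state, symbol) -> state table."""
    quantize = lambda h: tuple(np.digitize(h, np.linspace(-1, 1, n_bins)))
    transitions, frontier = {}, [(h0, 0)]
    while frontier:
        h, d = frontier.pop()
        if d >= depth:
            continue
        for sym in alphabet:
            h2 = rnn_step(h, sym)
            key = (quantize(h), sym)
            if key not in transitions:          # only expand novel transitions
                transitions[key] = quantize(h2)
                frontier.append((h2, d + 1))
    return transitions

rng = np.random.default_rng(0)
W, U = rng.normal(0, 0.5, (4, 4)), rng.normal(0, 0.5, (4, 2))
step = lambda h, s: np.tanh(W @ h + U[:, s])    # toy recurrent dynamics
fsm = extract_fsm(step, np.zeros(4), alphabet=(0, 1))
print(len(fsm), "transitions extracted")
```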
38. Sun GZ, Giles CL, Chen HH. The neural network pushdown automaton: Architecture, dynamics and training. In: Adaptive Processing of Sequences and Data Structures. Lecture Notes in Computer Science, Springer, 1998. [DOI: 10.1007/bfb0054003]
39.
Abstract
We investigate possibilities of inducing temporal structures without fading memory in recurrent networks of spiking neurons strictly operating in the pulse-coding regime. We extend the existing gradient-based algorithm for training feedforward spiking neuron networks, SpikeProp (Bohte, Kok, & La Poutré, 2002), to recurrent network topologies, so that temporal dependencies in the input stream are taken into account. It is shown that temporal structures with unbounded input memory specified by simple Moore machines (MM) can be induced by recurrent spiking neuron networks (RSNN). The networks are able to discover pulse-coded representations of abstract information processing states coding potentially unbounded histories of processed inputs. We show that it is often possible to extract from trained RSNN the target MM by grouping together similar spike trains appearing in the recurrent layer. Even when the target MM was not perfectly induced in a RSNN, the extraction procedure was able to reveal weaknesses of the induced mechanism and the extent to which the target machine had been learned.
Affiliations:
- Ashley J.S. Mills: School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K.
40.
Abstract
Rule extraction (RE) from recurrent neural networks (RNNs) refers to finding models of the underlying RNN, typically in the form of finite state machines, that mimic the network to a satisfactory degree while having the advantage of being more transparent. RE from RNNs can be argued to allow a deeper and more profound form of analysis of RNNs than other, more or less ad hoc methods. RE may give us an understanding of RNNs at the intermediate levels between quite abstract theoretical knowledge of RNNs as a class of computing devices and quantitative performance evaluations of RNN instantiations. The development of techniques for extracting rules from RNNs has been an active field since the early 1990s. This article reviews the progress of this development and analyzes it in detail. In order to structure the survey and evaluate the techniques, a taxonomy specifically designed for this purpose has been developed. Moreover, important open research issues are identified that, if addressed properly, could give the field a significant push forward.
Affiliations:
- Henrik Jacobsson: School of Humanities and Informatics, University of Skövde, Skövde, Sweden, and Department of Computer Science, University of Sheffield, United Kingdom
44. Brouwer RK. Training of a discrete recurrent neural network for sequence classification by using a helper FNN. Soft Computing 2004. [DOI: 10.1007/s00500-004-0409-0]
45. Vahed A, Omlin CW. A Machine Learning Method for Extracting Symbolic Knowledge from Recurrent Neural Networks. Neural Computation 2004;16:59-71. [PMID: 15006023] [DOI: 10.1162/08997660460733994]
Abstract
Neural networks do not readily provide an explanation of the knowledge stored in their weights as part of their information processing; until recently, they were considered black boxes. Research has since produced a number of algorithms for extracting knowledge in symbolic form from trained neural networks. This article addresses the extraction of knowledge in symbolic form from recurrent neural networks trained to behave like deterministic finite-state automata (DFAs). To date, methods used to extract knowledge from such networks have relied on the hypothesis that network states tend to cluster and that clusters of network states correspond to DFA states. The computational complexity of such a cluster analysis has led to heuristics that either limit the number of clusters that may form during training or limit the exploration of the space of hidden recurrent state neurons. These limitations, while necessary, may lead to decreased fidelity, in which the extracted knowledge may not model the true behavior of a trained network, perhaps not even for the training set. The method proposed here uses a polynomial-time symbolic learning algorithm to infer DFAs solely from observation of a trained network's input-output behavior. Thus, this method has the potential to increase the fidelity of the extracted knowledge.
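Symbolic DFA-inference algorithms of this kind typically start from a prefix-tree acceptor built from the observed input-output pairs and then generalize by merging states. A sketch of the first step (the state-merging step, the actual generalization, is omitted):

```python
def prefix_tree_acceptor(labeled_strings):
    """Build a prefix-tree acceptor from observed (string -> accept/reject)
    pairs; states are integers and 0 is the root."""
    trans, label = {}, {}
    next_state = 1
    for s, accepted in labeled_strings:
        q = 0
        for ch in s:
            if (q, ch) not in trans:
                trans[(q, ch)] = next_state
                next_state += 1
            q = trans[(q, ch)]
        label[q] = accepted        # final state inherits the observation
    return trans, label

# Observations of a trained network's behavior on the language (10)*:
obs = [("", True), ("10", True), ("1010", True), ("1", False), ("01", False)]
trans, label = prefix_tree_acceptor(obs)
print(len(trans), "transitions;", label)
```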
Affiliations:
- A. Vahed: Department of Computer Science, University of the Western Cape, Bellville, South Africa
46. Tiňo P, Čerňanský M, Beňušková L. Markovian Architectural Bias of Recurrent Neural Networks. IEEE Transactions on Neural Networks 2004;15:6-15. [PMID: 15387243] [DOI: 10.1109/tnn.2003.820839]
Abstract
In this paper, we elaborate upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training [1], [2]. By concentrating on activation clusters in RNNs, while not throwing away the continuous state-space network dynamics, we extract predictive models that we call neural prediction machines (NPMs). When RNNs with sigmoid activation functions are initialized with small weights (a common technique in the RNN community), the clusters of recurrent activations emerging prior to training are indeed meaningful and correspond to Markov prediction contexts. In this case, the extracted NPMs correspond to a class of Markov models called variable memory length Markov models (VLMMs). In order to appreciate how much information has really been induced during training, the RNN performance should always be compared with that of VLMMs and NPMs extracted before training as the "null" base models. Our arguments are supported by experiments on a chaotic symbolic sequence and a context-free language with a deep recursive structure.
Index Terms: complex symbolic sequences, information latching problem, iterative function systems, Markov models, recurrent neural networks (RNNs).
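The NPM construction can be sketched in a few lines: drive an untrained, small-weight RNN with a symbol sequence, cluster the recurrent activations, and collect next-symbol counts per cluster. The nearest-centroid clustering and all sizes below are illustrative simplifications (k-means would be the usual choice).

```python
import numpy as np

def neural_prediction_machine(seq, n_clusters=8, dim=10, seed=0):
    """Build next-symbol statistics over activation clusters of an
    untrained small-weight RNN (the Markovian-bias regime)."""
    rng = np.random.default_rng(seed)
    symbols = sorted(set(seq))
    W = rng.normal(0, 0.1, (dim, dim))            # small weights
    E = rng.normal(0, 0.1, (dim, len(symbols)))   # symbol embeddings
    h, states = np.zeros(dim), []
    for s in seq:
        h = np.tanh(W @ h + E[:, symbols.index(s)])
        states.append(h)
    states = np.array(states)
    centroids = states[rng.choice(len(states), n_clusters, replace=False)]
    ids = np.argmin(((states[:, None] - centroids) ** 2).sum(-1), axis=1)
    counts = {}                                   # cluster -> next-symbol counts
    for t in range(len(seq) - 1):
        counts.setdefault(ids[t], {}).setdefault(seq[t + 1], 0)
        counts[ids[t]][seq[t + 1]] += 1
    return counts

seq = list("abcabcabcabacbabcabcabc")
print(neural_prediction_machine(seq))
```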
Collapse
Affiliation(s)
- Peter Tino
- School of Computer Science, University of Birmingham, Edgbaston, Birmingham, B15 2TT, U.K.
Collapse
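A minimal sketch of extracting a neural prediction machine in the spirit of the entry above, assuming a toy two-symbol sequence, a small untrained sigmoid RNN, and plain k-means over the recurrent activations; all sizes and the clustering choice are illustrative assumptions, not the paper's procedure.

```python
# Sketch: extract a neural prediction machine (NPM) from an UNTRAINED RNN.
import numpy as np

rng = np.random.default_rng(0)
n_symbols, hidden, k = 2, 8, 4

# Toy symbolic sequence over {0, 1} with simple Markovian structure.
seq = np.array([0, 1] * 200 + [0, 0, 1] * 100)

# Untrained RNN: small random weights, sigmoid recurrent layer.
W_in = rng.normal(0, 0.1, (hidden, n_symbols))
W_rec = rng.normal(0, 0.1, (hidden, hidden))
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Collect the recurrent activations while driving the network.
states = np.zeros((len(seq), hidden))
h = np.zeros(hidden)
for t, s in enumerate(seq):
    h = sigmoid(W_in @ np.eye(n_symbols)[s] + W_rec @ h)
    states[t] = h

# Cluster the activations with a few iterations of plain k-means.
centers = states[rng.choice(len(states), k, replace=False)]
for _ in range(10):
    labels = np.argmin(((states[:, None] - centers) ** 2).sum(-1), axis=1)
    for j in range(k):
        if (labels == j).any():
            centers[j] = states[labels == j].mean(axis=0)

# The NPM: per-cluster next-symbol statistics read off the sequence.
counts = np.zeros((k, n_symbols))
for t in range(len(seq) - 1):
    counts[labels[t], seq[t + 1]] += 1
probs = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
print("Per-cluster next-symbol distributions:\n", probs.round(2))
```

With small weights the clusters track recent input history, so the per-cluster distributions behave like Markov prediction contexts even though the network was never trained, which is exactly the "null" baseline the authors recommend.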
|
47
|
Chalup SK, Blair AD. Incremental training of first order recurrent neural networks to predict a context-sensitive language. Neural Netw 2003; 16:955-72. [PMID: 14692631 DOI: 10.1016/s0893-6080(03)00054-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
In recent years it has been shown that first-order recurrent neural networks trained by gradient descent can learn not only regular but also simple context-free and context-sensitive languages. However, the success rate was generally low, and severe instability issues were encountered. The present study examines the hypothesis that a combination of evolutionary hill climbing with incremental learning and a well-balanced training set enables first-order recurrent networks to reliably learn context-free and mildly context-sensitive languages. In particular, we trained the networks to predict symbols in string sequences of the context-sensitive language {a^n b^n c^n; n >= 1}. Comparative experiments with and without incremental learning indicated that incremental learning can accelerate and facilitate training. Furthermore, incrementally trained networks generally produced monotonic trajectories in hidden-unit activation space, while the trajectories of non-incrementally trained networks oscillated. The non-incrementally trained networks were more likely to generalise.
Collapse
Affiliation(s)
- Stephan K Chalup
- School of Electrical Engineering and Computer Science, The University of Newcastle, Callaghan, NSW 2308, Australia.
Collapse
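The training regime described in the entry above can be sketched as follows, under assumed network sizes, perturbation scales, and schedule; the point is the incremental structure (hill-climb on short strings first, then extend), not a faithful reproduction of the study's setup.

```python
# Sketch: incremental evolutionary hill climbing on {a^n b^n c^n; n >= 1}.
import numpy as np

rng = np.random.default_rng(1)
SYMS = "abc"

def strings_up_to(n_max):
    return ["a" * n + "b" * n + "c" * n for n in range(1, n_max + 1)]

def predict_error(weights, strings):
    # Simple first-order RNN; error = mean squared next-symbol error.
    W_in, W_rec, W_out = weights
    err, count = 0.0, 0
    for s in strings:
        h = np.zeros(W_rec.shape[0])
        for cur, nxt in zip(s, s[1:]):
            h = np.tanh(W_in @ np.eye(3)[SYMS.index(cur)] + W_rec @ h)
            err += ((W_out @ h - np.eye(3)[SYMS.index(nxt)]) ** 2).sum()
            count += 1
    return err / count

hidden = 5
weights = [rng.normal(0, 0.5, shape) for shape in
           [(hidden, 3), (hidden, hidden), (3, hidden)]]

# Incremental schedule: hill-climb on n <= 1, then n <= 2, and so on.
for n_max in range(1, 5):
    data = strings_up_to(n_max)
    best = predict_error(weights, data)
    for _ in range(200):  # evolutionary hill climbing: mutate, keep if better
        trial = [w + rng.normal(0, 0.05, w.shape) for w in weights]
        e = predict_error(trial, data)
        if e < best:
            weights, best = trial, e
    print(f"n <= {n_max}: best error {best:.4f}")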
|
48
|
Tiňo P, Hammer B. Architectural Bias in Recurrent Networks: Fractal Analysis. Neural Comput 2003; 15:1931-1957.
Abstract
We have recently shown that when initialized with “small” weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tiňo, 2002; Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has frequently been considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box-counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be exploited to build Markovian predictive models, but the activations also form fractal clusters whose dimension can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
Collapse
Affiliation(s)
- Peter Tiňo
- Aston University, Birmingham B4 7ET, U.K.
Collapse
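As a hedged numerical companion to the analysis above, the sketch below estimates a box-counting dimension for the recurrent activations of a small-weight RNN driven by a random two-symbol source; the two-unit state space, the weight scales, and the crude log-log fit are illustrative assumptions rather than the paper's analytical bounds.

```python
# Sketch: box-counting dimension estimate for small-weight RNN activations.
import numpy as np

rng = np.random.default_rng(2)
T, hidden = 20000, 2  # 2 hidden units so activations live in the unit square

W_in = rng.normal(0, 4.0, (hidden, 2))        # inputs separate the clusters
W_rec = rng.normal(0, 0.2, (hidden, hidden))  # small recurrent weights:
sigmoid = lambda x: 1 / (1 + np.exp(-x))      # contractive dynamics

seq = rng.integers(0, 2, T)  # i.i.d. binary driving source
pts = np.zeros((T, hidden))
h = np.zeros(hidden)
for t, s in enumerate(seq):
    h = sigmoid(W_in @ np.eye(2)[s] + W_rec @ h)
    pts[t] = h

# Box counting: N(eps) = number of boxes of side eps covering the points.
logs = []
for eps in [0.1, 0.05, 0.025, 0.0125]:
    boxes = {tuple(r) for r in (pts // eps).astype(int)}
    logs.append((np.log(1 / eps), np.log(len(boxes))))

# Slope of log N(eps) versus log(1/eps) estimates the box-counting dimension.
xs, ys = zip(*logs)
slope = np.polyfit(xs, ys, 1)[0]
print(f"Estimated box-counting dimension: {slope:.2f}")
```

The activations settle onto a fractal attractor of the input-driven iterated function system, and the estimated dimension stays well below the embedding dimension, consistent with entropy-based upper bounds of the kind derived in the paper.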
|
49
|
Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput 2002; 14:2531-60. [PMID: 12433288 DOI: 10.1162/089976602760407955] [Citation(s) in RCA: 1153] [Impact Index Per Article: 52.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
A key challenge for neural modeling is to explain how a continuous stream of multimodal input from a rapidly changing environment can be processed by stereotypical recurrent circuits of integrate-and-fire neurons in real time. We propose a new computational model for real-time computing on time-varying input that provides an alternative to paradigms based on Turing machines or attractor neural networks. It does not require a task-dependent construction of neural circuits. Instead, it is based on principles of high-dimensional dynamical systems in combination with statistical learning theory and can be implemented on generic evolved or found recurrent circuitry. It is shown that the inherent transient dynamics of the high-dimensional dynamical system formed by a sufficiently large and heterogeneous neural circuit may serve as universal analog fading memory. Readout neurons can learn to extract, in real time, information about current and past inputs from the current state of such a recurrent neural circuit, as may be needed for diverse tasks. Stable internal states are not required for giving a stable output, since transient internal states can be transformed by readout neurons into stable target outputs due to the high dimensionality of the dynamical system. Our approach is based on a rigorous computational model, the liquid state machine, that, unlike Turing machines, does not require sequential transitions between well-defined discrete internal states. It is supported, as the Turing machine is, by rigorous mathematical results that predict universal computational power under idealized conditions, but for the biologically more realistic scenario of real-time processing of time-varying inputs. Our approach provides new perspectives for the interpretation of neural coding, the design of experiments and data analysis in neurophysiology, and the solution of problems in robotics and neurotechnology.
Collapse
Affiliation(s)
- Wolfgang Maass
- Institute for Theoretical Computer Science, Technische Universität Graz, A-8010 Graz, Austria.
Collapse
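A minimal sketch of the liquid-state idea described above, substituting a simple rate-based random reservoir for the integrate-and-fire circuit: a fixed recurrent "liquid" provides fading memory, and a linear readout trained offline recovers a past input from the instantaneous state alone. The sizes, spectral-radius scaling, and ridge-regression readout are assumptions, not the paper's construction.

```python
# Sketch: fading memory in a fixed random reservoir with a linear readout.
import numpy as np

rng = np.random.default_rng(3)
T, N, delay = 2000, 100, 5

W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale for fading memory
w_in = rng.normal(0, 1, N)

u = rng.uniform(-1, 1, T)          # time-varying input stream
X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T):
    x = np.tanh(W @ x + w_in * u[t])   # transient "liquid" dynamics
    X[t] = x

# Linear readout trained offline (ridge regression) to output u(t - delay)
# from the instantaneous state x(t): no stable attractor is needed.
A, target = X[delay:], u[:-delay]
w_out = np.linalg.solve(A.T @ A + 1e-4 * np.eye(N), A.T @ target)
pred = A @ w_out
print("readout correlation:", np.corrcoef(pred, target)[0, 1].round(3))
```

The readout is purely memoryless; all information about the past input survives only in the reservoir's transient state, which is the core claim of the perturbation-based framework.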
|
50
|
Palmer-Brown D, Tepper JA, Powell HM. Connectionist natural language parsing. Trends Cogn Sci 2002; 6:437-442.
Abstract
The key developments of two decades of connectionist parsing are reviewed. Connectionist parsers are assessed according to their ability to learn to represent syntactic structures from examples automatically, without being presented with symbolic grammar rules. This review also considers the extent to which connectionist parsers offer computational models of human sentence processing and provide plausible accounts of psycholinguistic data. In considering these issues, special attention is paid to the level of realism, the nature of the modularity, and the type of processing that is to be found in a wide range of parsers.
Collapse
Affiliation(s)
- Dominic Palmer-Brown
- Leeds Metropolitan University, Computational Intelligence Research Group, School of Computing, Beckett Park, LS6 3QS, Leeds, UK
Collapse
|