1
Kuhl E, Zang C, Esper J, Riechelmann DFC, Büntgen U, Briesch M, Reinig F, Römer P, Konter O, Schmidhalter M, Hartl C. Using machine learning on tree-ring data to determine the geographical provenance of historical construction timbers. Ecosphere 2023. DOI: 10.1002/ecs2.4453
Affiliation(s)
- Eileen Kuhl
- Department of Geography, Johannes Gutenberg University Mainz, Germany
- Christian Zang
- Department of Forestry, University of Applied Sciences Weihenstephan-Triesdorf, Freising, Germany
- Jan Esper
- Department of Geography, Johannes Gutenberg University Mainz, Germany
- Global Change Research Centre (CzechGlobe), Brno, Czech Republic
- Ulf Büntgen
- Global Change Research Centre (CzechGlobe), Brno, Czech Republic
- Department of Geography, University of Cambridge, Cambridge, UK
- Swiss Federal Research Institute (WSL), Birmensdorf, Switzerland
- Department of Geography, Masaryk University, Brno, Czech Republic
- Martin Briesch
- Department of Information Systems and Business Administration, Johannes Gutenberg University Mainz, Germany
- Frederick Reinig
- Department of Geography, Johannes Gutenberg University Mainz, Germany
- Philipp Römer
- Department of Geography, Johannes Gutenberg University Mainz, Germany
- Oliver Konter
- Department of Geography, Johannes Gutenberg University Mainz, Germany
- Claudia Hartl
- Nature Rings - Environmental Research and Education, Mainz, Germany
2
TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions. Data Min Knowl Discov 2022. DOI: 10.1007/s10618-022-00847-y
Abstract
The identification of relevant features, i.e., the driving variables that determine a process or the properties of a system, is an essential part of the analysis of data sets with a large number of variables. A mathematically rigorous approach to quantifying the relevance of these features is mutual information. Mutual information determines the relevance of features in terms of their joint mutual dependence on the property of interest. However, mutual information requires probability distributions as input, which cannot be reliably estimated from samples of continuous quantities such as lengths or energies. Here, we introduce total cumulative mutual information (TCMI), a measure of the relevance of mutual dependences that extends mutual information to random variables of continuous distribution based on cumulative probability distributions. TCMI is a non-parametric, robust, and deterministic measure that facilitates comparisons and rankings between feature sets with different cardinality. The ranking induced by TCMI allows for feature selection, i.e., the identification of variable sets that are nonlinearly statistically related to a property of interest, taking into account the number of data samples as well as the cardinality of the set of variables. We evaluate the performance of our measure with simulated data, compare its performance with similar multivariate-dependence measures, and demonstrate the effectiveness of our feature-selection method on a set of standard data sets and a typical scenario in materials science.
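TCMI itself is the authors' contribution and no reference implementation is reproduced here. Purely as an illustration of the underlying idea (scoring feature relevance through cumulative, rank-based distributions rather than estimated densities), a minimal Python sketch on synthetic data might look as follows; the dependence score below is a simple Spearman-style stand-in, not the TCMI estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x_rel = rng.uniform(0.0, 1.0, n)            # feature that drives the target
x_irr = rng.uniform(0.0, 1.0, n)            # feature unrelated to the target
y = x_rel ** 2 + 0.05 * rng.normal(size=n)  # nonlinear target

def rank_dependence(x, y):
    """Toy non-parametric dependence score: correlation of the empirical
    cumulative ranks of x and y. It avoids density estimation entirely,
    using only the orderings of the samples."""
    rx = np.argsort(np.argsort(x)) / (len(x) - 1.0)
    ry = np.argsort(np.argsort(y)) / (len(y) - 1.0)
    return abs(np.corrcoef(rx, ry)[0, 1])

score_rel = rank_dependence(x_rel, y)  # high: y depends monotonically on x_rel
score_irr = rank_dependence(x_irr, y)  # near zero: no dependence
```

Because the score is rank-based, it detects the nonlinear (quadratic) relation without any parametric model, which is the property the abstract emphasizes.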
4
Abstract
The decisions we make are shaped by a lifetime of learning. Past experience guides the way that we encode information in neural systems for perception and valuation, and determines the information we retrieve when making decisions. Distinct literatures have discussed how lifelong learning and local context shape decisions made about sensory signals, propositional information, or economic prospects. Here, we build bridges between these literatures, arguing for common principles of adaptive rationality in perception, cognition, and economic choice. We discuss how a single common framework, based on normative principles of efficient coding and Bayesian inference, can help us understand a myriad of human decision biases, including sensory illusions, adaptive aftereffects, choice history biases, central tendency effects, anchoring effects, contrast effects, framing effects, congruency effects, reference-dependent valuation, nonlinear utility functions, and discretization heuristics. We describe a simple computational framework for explaining these phenomena. Expected final online publication date for the Annual Review of Psychology, Volume 73 is January 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
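The normative framework the review describes can be made concrete with a textbook example: a Bayesian observer combining a Gaussian prior over stimuli with a noisy measurement produces the central tendency effect, with estimates pulled toward the prior mean. The parameter values below are illustrative, not taken from the review:

```python
# Bayesian observer model (illustrative parameters): stimulus prior
# N(mu_p, sig_p^2); measurement m = s + noise, noise ~ N(0, sig_m^2).
# The posterior mean is a precision-weighted average of measurement and prior.
mu_p, sig_p, sig_m = 50.0, 10.0, 5.0
w = sig_p ** 2 / (sig_p ** 2 + sig_m ** 2)  # weight on the measurement (0.8 here)

def posterior_mean(m):
    return w * m + (1.0 - w) * mu_p

# Central tendency bias: stimuli above the prior mean are under-estimated,
# stimuli below it are over-estimated.
print(posterior_mean(80.0))  # 74.0, pulled toward 50
print(posterior_mean(20.0))  # 26.0, pulled toward 50
```

The same precision-weighting logic, applied to priors learned over a lifetime versus the local context, is what lets one framework cover anchoring, contrast, and reference-dependent effects.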
Affiliation(s)
- Christopher Summerfield
- Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, United Kingdom
- Paula Parpart
- Department of Experimental Psychology, University of Oxford, Oxford OX2 6GG, United Kingdom
6
Use of Generative Adversarial Networks (GAN) for Taphonomic Image Augmentation and Model Protocol for the Deep Learning Analysis of Bone Surface Modifications. Applied Sciences (Basel) 2021. DOI: 10.3390/app11115237
Abstract
Deep learning models are based on a combination of neural network architectures, optimization parameters and activation functions. Together these yield exponentially many combinations whose computational fitness is difficult to pinpoint. The intricate resemblance of the microscopic features found in bone surface modifications makes their differentiation challenging, and determining a baseline combination of optimizers and activation functions for modeling seems necessary for computational economy. Here, we experiment with combinations of the most resolutive activation functions (relu, swish, and mish) and the most efficient optimizers (stochastic gradient descent (SGD) and Adam) for bone surface modification analysis. We show that despite a wide variability of outcomes, a baseline of relu-SGD is advised for raw bone surface modification data. For imbalanced samples, augmented datasets generated through generative adversarial networks are implemented, resulting in balanced accuracy but an inherent bias regarding mark replication. In summary, although baseline procedures are advised, they do not circumvent Wolpert's "no free lunch" theorem, which extends beyond model architectures.
7
Sharma A. Stochastic nonparallel hyperplane support vector machine for binary classification problems and no-free-lunch theorems. Evolutionary Intelligence 2020. DOI: 10.1007/s12065-020-00503-8
8
Hafez-Kolahi H, Kasaei S, Soleymani-Baghshah M. Sample complexity of classification with compressed input. Neurocomputing 2020. DOI: 10.1016/j.neucom.2020.07.043
10
Hayashi Y. Use of a Deep Belief Network for Small High-Level Abstraction Data Sets Using Artificial Intelligence with Rule Extraction. Neural Comput 2018;30:3309-3326. DOI: 10.1162/neco_a_01139
Abstract
We describe a simple method for transferring weights in deep neural networks (NNs) trained by a deep belief network (DBN) to weights in a backpropagation NN (BPNN) in the recursive-rule eXtraction (Re-RX) algorithm with J48graft (Re-RX with J48graft), and propose a new method to extract accurate and interpretable classification rules for rating category data sets. We apply this method to the Wisconsin Breast Cancer Data Set (WBCD), the Mammographic Mass Data Set, and the Dermatology Data Set, which are small, high-abstraction data sets with prior knowledge. After training on these three data sets, our proposed rule extraction method was able to extract accurate and concise rules for deep NNs trained by a DBN. These results suggest that our proposed method could help fill the gap between the very high learning capability of DBNs and the very high interpretability of rule extraction algorithms such as Re-RX with J48graft.
Affiliation(s)
- Yoichi Hayashi
- Department of Computer Science, Meiji University, Kawasaki 214-8571, Japan
11
A Fuzzy Classifier with Feature Selection Based on the Gravitational Search Algorithm. Symmetry (Basel) 2018. DOI: 10.3390/sym10110609
Abstract
This paper concerns several important topics of the Symmetry journal, namely, pattern recognition, computer-aided design, diversity and similarity. We also take advantage of the symmetric and asymmetric structure of a transfer function, which is responsible for mapping a continuous search space to a binary search space. A new method for the design of a fuzzy-rule-based classifier using a metaheuristic called the Gravitational Search Algorithm (GSA) is discussed. The paper identifies three basic stages of classifier construction: feature selection, creation of a fuzzy rule base, and optimization of the antecedent parameters of the rules. At the first stage, several feature subsets are obtained by using the wrapper scheme on the basis of the binary GSA. Creating fuzzy rules is a serious challenge in designing a fuzzy-rule-based classifier in the presence of high-dimensional data. The classifier structure is formed by the rule base generation algorithm using minimum and maximum feature values. The optimal fuzzy-rule-based parameters are extracted from the training data using the continuous GSA. The classifier performance is tested on real-world KEEL (Knowledge Extraction based on Evolutionary Learning) datasets. The results demonstrate that highly accurate classifiers can be constructed with relatively few fuzzy rules and features.
12
FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgment and Decision Making 2017. DOI: 10.1017/s1930297500006239
Abstract
Fast-and-frugal trees (FFTs) are simple algorithms that facilitate efficient and accurate decisions based on limited information. But despite their successful use in many applied domains, there is no widely available toolbox that allows anyone to easily create, visualize, and evaluate FFTs. We fill this gap by introducing the R package FFTrees. In this paper, we explain how FFTs work, introduce a new class of algorithms called fan for constructing FFTs, and provide a tutorial for using the FFTrees package. We then conduct a simulation across ten real-world datasets to test how well FFTs created by FFTrees can predict data. Simulation results show that FFTs created by FFTrees can predict data as well as popular classification algorithms such as regression and random forests, while remaining simple enough for anyone to understand and use.
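The FFTrees toolbox itself is an R package; as a language-neutral illustration of what a fast-and-frugal tree is, here is a minimal Python sketch. The cue names, thresholds, and exit structure are invented for the example, not taken from the paper:

```python
def fft_classify(case):
    """A fast-and-frugal tree: one cue is checked per level, and every level
    except the last offers an immediate exit, so most cases are decided
    after inspecting only one or two cues."""
    if case["cue_a"]:            # level 1: exit "signal" if the cue fires
        return "signal"
    if case["cue_b"] < 40:       # level 2: exit "noise" for low values
        return "noise"
    return "signal" if case["cue_c"] > 140 else "noise"  # final level: two exits

print(fft_classify({"cue_a": True, "cue_b": 70, "cue_c": 100}))   # signal
print(fft_classify({"cue_a": False, "cue_b": 30, "cue_c": 200}))  # noise
```

The fixed cue order and early exits are what make FFTs both frugal (few lookups per decision) and easy to communicate.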
13
Mirylenka K, Giannakopoulos G, Do LM, Palpanas T. On classifier behavior in the presence of mislabeling noise. Data Min Knowl Discov 2016. DOI: 10.1007/s10618-016-0484-8
14
Nowosad J. Spatiotemporal models for predicting high pollen concentration level of Corylus, Alnus, and Betula. International Journal of Biometeorology 2016;60:843-55. PMID: 26487352. PMCID: PMC4879172. DOI: 10.1007/s00484-015-1077-8
Abstract
Corylus, Alnus, and Betula trees are among the most important sources of allergenic pollen in the temperate zone of the Northern Hemisphere and have a large impact on the quality of life and productivity of allergy sufferers. Therefore, it is important to predict high pollen concentrations, both in time and space. The aim of this study was to create and evaluate spatiotemporal models for predicting high Corylus, Alnus, and Betula pollen concentration levels, based on gridded meteorological data. Aerobiological monitoring was carried out in 11 cities in Poland and gathered, depending on the site, between 2 and 16 years of measurements. According to the first allergy symptoms during exposure, a high pollen count level was established for each taxon. An optimized probability threshold technique was used to mitigate the imbalance between the pollen concentration levels. For each taxon, the model was built using a random forest method. The study revealed the possibility of moderately reliable prediction of Corylus and highly reliable prediction of Alnus and Betula high pollen concentration levels, using preprocessed gridded meteorological data. Cumulative growing degree days and potential evaporation proved to be two of the most important predictor variables in the models. The final models predict not only for single locations but also over continuous areas. Furthermore, the proposed modeling framework could be used to predict high pollen concentrations of Corylus, Alnus, Betula, and other taxa, and in other countries.
Affiliation(s)
- Jakub Nowosad
- Institute of Geoecology and Geoinformation, Adam Mickiewicz University, Dzięgielowa 27, 61-680, Poznań, Poland.
15
Gómez D, Rojas A. An Empirical Overview of the No Free Lunch Theorem and Its Effect on Real-World Machine Learning Classification. Neural Comput 2015;28:216-28. PMID: 26599713. DOI: 10.1162/neco_a_00793
Abstract
A sizable amount of research has been done to improve the mechanisms for knowledge extraction such as machine learning classification or regression. Quite unintuitively, the no free lunch (NFL) theorem states that all optimization problem strategies perform equally well when averaged over all possible problems. This fact seems to clash with the effort put forth toward better algorithms. This letter explores empirically the effect of the NFL theorem on some popular machine learning classification techniques over real-world data sets.
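An experiment in the spirit of the letter (not the authors' code or their data sets) can be reproduced in a few lines with scikit-learn: several standard classifiers, several real-world data sets, and cross-validated accuracy for each pair. Per-data-set rankings typically differ even when every method performs well overall:

```python
from sklearn.datasets import load_breast_cancer, load_iris, load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

datasets = {"iris": load_iris(), "wine": load_wine(), "cancer": load_breast_cancer()}
classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

scores = {}  # (dataset, classifier) -> mean 5-fold cross-validated accuracy
for dname, data in datasets.items():
    for cname, clf in classifiers.items():
        pipe = make_pipeline(StandardScaler(), clf)  # scale features, then classify
        scores[(dname, cname)] = cross_val_score(pipe, data.data, data.target, cv=5).mean()
```

Inspecting `scores` per data set illustrates the practical reading of the NFL theorem: no single method dominates across all problems, even when averaged performance is high.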
Affiliation(s)
- David Gómez
- Telematics Engineering Department, Polytechnical University of Catalonia, Barcelona 08034, Spain
- Alfonso Rojas
- Telematics Engineering Department, Polytechnical University of Catalonia, Barcelona 08034, Spain
16
Ota K, Oishi N, Ito K, Fukuyama H. Effects of imaging modalities, brain atlases and feature selection on prediction of Alzheimer's disease. J Neurosci Methods 2015;256:168-83. PMID: 26318777. DOI: 10.1016/j.jneumeth.2015.08.020
Abstract
BACKGROUND: The choice of biomarkers for early detection of Alzheimer's disease (AD) is important for improving the accuracy of imaging-based prediction of conversion from mild cognitive impairment (MCI) to AD. The primary goal of this study was to assess the effects of imaging modalities and brain atlases on prediction. We also investigated the influence of support vector machine recursive feature elimination (SVM-RFE) on predictive performance.
METHODS: Eighty individuals with amnestic MCI (40 developed AD within 3 years) underwent structural magnetic resonance imaging (MRI) and (18)F-fluorodeoxyglucose positron emission tomography (FDG-PET) scans at baseline. Using Automated Anatomical Labeling (AAL) and the LONI Probabilistic Brain Atlas (LPBA40), we extracted features representing gray matter density and relative cerebral metabolic rate for glucose in each region of interest from the baseline MRI and FDG-PET data, respectively. We used a linear SVM ensemble with bagging and computed the area under the receiver operating characteristic curve (AUC) as a measure of classification performance. We performed multiple SVM-RFE to compute feature rankings. We performed analysis of variance on the mean AUCs for eight feature sets.
RESULTS: The interactions between atlas and modality choices were significant. The main effect of SVM-RFE was significant, but its interactions with the other factors were not.
COMPARISON WITH EXISTING METHODS: Multimodal features were found to be better than unimodal features for predicting AD. FDG-PET was found to be better than MRI.
CONCLUSIONS: Imaging modalities and brain atlases interact with each other and affect prediction. SVM-RFE can improve predictive accuracy when using atlas-based features.
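The study's full pipeline (atlas-based MRI/FDG-PET features, bagged SVM ensembles) is not reproduced here, but the SVM-RFE step alone can be sketched with scikit-learn on synthetic data: recursively fit a linear SVM, drop the lowest-weight features, and keep a small informative subset. The data set sizes and parameters below are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for atlas-based features: 30 "regions", 5 informative.
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           n_redundant=0, random_state=0)

# SVM-RFE: repeatedly fit a linear SVM and eliminate the lowest-weight features.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=5).fit(X, y)
selected = rfe.support_  # boolean mask of the 5 surviving features

acc_all = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
acc_sel = cross_val_score(SVC(kernel="linear"), X[:, selected], y, cv=5).mean()
```

Comparing `acc_all` and `acc_sel` mirrors the paper's question of whether RFE-based feature selection helps when most features carry no signal.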
Affiliation(s)
- Kenichi Ota
- Human Brain Research Center, Kyoto University Graduate School of Medicine, 54 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan; Center for the Promotion of Interdisciplinary Education and Research, Kyoto University, 54 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
- Naoya Oishi
- Human Brain Research Center, Kyoto University Graduate School of Medicine, 54 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan; Department of Psychiatry, Kyoto University Graduate School of Medicine, 54 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
- Kengo Ito
- Department of Clinical and Experimental Neuroimaging, National Center for Geriatrics and Gerontology, 7-430 Morioka-cho, Obu-shi, Aichi 474-8511, Japan
- Hidenao Fukuyama
- Human Brain Research Center, Kyoto University Graduate School of Medicine, 54 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan; Center for the Promotion of Interdisciplinary Education and Research, Kyoto University, 54 Shogoin Kawahara-cho, Sakyo-ku, Kyoto 606-8507, Japan
17
Mittag F, Römer M, Zell A. Influence of Feature Encoding and Choice of Classifier on Disease Risk Prediction in Genome-Wide Association Studies. PLoS One 2015;10:e0135832. PMID: 26285210. PMCID: PMC4540285. DOI: 10.1371/journal.pone.0135832
Abstract
Various attempts have been made to predict individual disease risk based on genotype data from genome-wide association studies (GWAS). However, most studies only investigated one or two classification algorithms and feature encoding schemes. In this study, we applied seven different classification algorithms on GWAS case-control data sets for seven different diseases to create models for disease risk prediction. Further, we used three different encoding schemes for the genotypes of single nucleotide polymorphisms (SNPs) and investigated their influence on the predictive performance of these models. Our study suggests that an additive encoding of the SNP data should be the preferred encoding scheme, as it proved to yield the best predictive performances for all algorithms and data sets. Furthermore, our results showed that the differences between most state-of-the-art classification algorithms are not statistically significant. Consequently, we recommend preferring algorithms with simple models, like the linear support vector machine (SVM), as they allow for better subsequent interpretation without significant loss of accuracy.
Affiliation(s)
- Florian Mittag
- Cognitive Systems Group, University of Tübingen, Tübingen, Germany
- Michael Römer
- Cognitive Systems Group, University of Tübingen, Tübingen, Germany
- Andreas Zell
- Cognitive Systems Group, University of Tübingen, Tübingen, Germany
18
Simulation studies as designed experiments: the comparison of penalized regression models in the "large p, small n" setting. PLoS One 2014;9:e107957. PMID: 25289666. PMCID: PMC4188526. DOI: 10.1371/journal.pone.0107957
Abstract
New algorithms are continuously proposed in computational biology. Performance evaluation of novel methods is important in practice. Nonetheless, the field experiences a lack of rigorous methodology aimed to systematically and objectively evaluate competing approaches. Simulation studies are frequently used to show that a particular method outperforms another. Often, however, simulation studies are not well designed, and it is hard to characterize the particular conditions under which different methods perform better. In this paper we propose the adoption of well established techniques in the design of computer and physical experiments for developing effective simulation studies. By following best practices in the planning of experiments we are better able to understand the strengths and weaknesses of competing algorithms, leading to more informed decisions about which method to use for a particular task. We illustrate the application of our proposed simulation framework with a detailed comparison of the ridge-regression, lasso and elastic-net algorithms in a large-scale study investigating the effects on predictive performance of sample size, number of features, true model sparsity, signal-to-noise ratio, and feature correlation, in situations where the number of covariates is usually much larger than sample size. Analysis of data sets containing tens of thousands of features but only a few hundred samples is nowadays routine in computational biology, where "omics" features such as gene expression, copy number variation and sequence data are frequently used in the predictive modeling of complex phenotypes such as anticancer drug response. The penalized regression approaches investigated in this study are popular choices in this setting, and our simulations corroborate well established results concerning the conditions under which each one of these methods is expected to perform best, while providing several novel insights.
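A single cell of such a simulation study can be sketched with scikit-learn: a sparse true model with p = 1000 features but only n = 100 training samples, on which the l1-based methods are expected to beat ridge. This is an illustrative toy run, not the paper's designed-experiment framework, and all settings below are invented for the sketch:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
n_train, n_test, p, k = 100, 100, 1000, 10  # p >> n, only k true signals
X = rng.normal(size=(n_train + n_test, p))
beta = np.zeros(p)
beta[:k] = 2.0                               # sparse true coefficient vector
y = X @ beta + rng.normal(size=n_train + n_test)
X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

r2 = {}
for name, model in {"ridge": Ridge(alpha=1.0),
                    "lasso": Lasso(alpha=0.1),
                    "enet": ElasticNet(alpha=0.1, l1_ratio=0.5)}.items():
    model.fit(X_tr, y_tr)
    r2[name] = r2_score(y_te, model.predict(X_te))
# With a sparse truth, lasso and elastic net generalize better than ridge here.
```

Varying sparsity, signal-to-noise ratio, and feature correlation over a factorial grid, as the paper advocates, turns one-off comparisons like this into a designed experiment.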
19
Enhancing the generalization ability of neural networks through controlling the hidden layers. Appl Soft Comput 2009. DOI: 10.1016/j.asoc.2008.01.013
20
van Dinther R, Patterson RD. Perception of acoustic scale and size in musical instrument sounds. The Journal of the Acoustical Society of America 2006;120:2158-76. PMID: 17069313. PMCID: PMC2821800. DOI: 10.1121/1.2338295
Abstract
There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception.
Affiliation(s)
- Ralph van Dinther
- Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, UK
21
Smith DRR, Patterson RD, Turner R, Kawahara H, Irino T. The processing and perception of size information in speech sounds. The Journal of the Acoustical Society of America 2005;117:305-18. PMID: 15704423. PMCID: PMC2346562. DOI: 10.1121/1.1828637
Abstract
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.
Affiliation(s)
- David R R Smith
- Centre for Neural Basis of Hearing, Department of Physiology, University of Cambridge, Cambridge CB2 3EG, United Kingdom
22
Bensusan H, Kalousis A. Estimating the Predictive Accuracy of a Classifier. Machine Learning: ECML 2001, 2003. DOI: 10.1007/3-540-44795-4_3
24
Vehtari A, Lampinen J. Bayesian model assessment and comparison using cross-validation predictive densities. Neural Comput 2002;14:2439-68. PMID: 12396570. DOI: 10.1162/08997660260293292
Abstract
In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of the model is to estimate its future predictive capability by estimating expected utilities. Instead of just making a point estimate, it is important to obtain the distribution of the expected utility estimate because it describes the uncertainty in the estimate. The distributions of the expected utility estimates can also be used to compare models, for example, by computing the probability of one model having a better expected utility than some other model. We propose an approach using cross-validation predictive densities to obtain expected utility estimates and Bayesian bootstrap to obtain samples from their distributions. We also discuss the probabilistic assumptions made and properties of two practical cross-validation methods, importance sampling and k-fold cross-validation. As illustrative examples, we use multilayer perceptron neural networks and gaussian processes with Markov chain Monte Carlo sampling in one toy problem and two challenging real-world problems.
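The flavor of the approach (expected utility estimated by cross-validation, with a bootstrap giving the distribution of the estimate and hence the probability that one model beats another) can be sketched in Python. The models here are simple polynomial fits, and an ordinary bootstrap stands in for the paper's Bayesian bootstrap; everything below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-3.0, 3.0, n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)  # noisy nonlinear data

def kfold_sq_errors(degree, k=10):
    """Per-observation squared prediction errors (negative utilities) from
    k-fold cross-validation of a polynomial fit of the given degree."""
    idx = np.arange(n)
    errs = np.empty(n)
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coeffs = np.polyfit(x[train], y[train], degree)
        errs[fold] = (y[fold] - np.polyval(coeffs, x[fold])) ** 2
    return errs

e_linear = kfold_sq_errors(degree=1)
e_cubic = kfold_sq_errors(degree=3)

# Bootstrap the mean utility difference to estimate P(cubic beats linear),
# i.e., a distribution over the expected-utility estimate, not just a point.
diffs = e_linear - e_cubic
boot_means = [rng.choice(diffs, n, replace=True).mean() for _ in range(1000)]
p_cubic_better = np.mean(np.array(boot_means) > 0.0)
```

The quantity `p_cubic_better` plays the role of the paper's "probability of one model having a better expected utility than some other model".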
Affiliation(s)
- Aki Vehtari
- Laboratory of Computational Engineering, Helsinki University of Technology, FIN-02015 HUT, Finland
25
Peng Y, Flach PA, Soares C, Brazdil P. Improved Dataset Characterisation for Meta-learning. Discovery Science 2002. DOI: 10.1007/3-540-36182-0_14
26
O'Reilly RC. Generalization in interactive networks: the benefits of inhibitory competition and Hebbian learning. Neural Comput 2001;13:1199-241. PMID: 11387044. DOI: 10.1162/08997660152002834
Abstract
Computational models in cognitive neuroscience should ideally use biological properties and powerful computational principles to produce behavior consistent with psychological findings. Error-driven backpropagation is computationally powerful and has proven useful for modeling a range of psychological data but is not biologically plausible. Several approaches to implementing backpropagation in a biologically plausible fashion converge on the idea of using bidirectional activation propagation in interactive networks to convey error signals. This article demonstrates two main points about these error-driven interactive networks: (1) they generalize poorly due to attractor dynamics that interfere with the network's ability to produce novel combinatorial representations systematically in response to novel inputs, and (2) this generalization problem can be remedied by adding two widely used mechanistic principles, inhibitory competition and Hebbian learning, that can be independently motivated for a variety of biological, psychological, and computational reasons. Simulations using the Leabra algorithm, which combines the generalized recirculation (GeneRec), biologically plausible, error-driven learning algorithm with inhibitory competition and Hebbian learning, show that these mechanisms can result in good generalization in interactive networks. These results support the general conclusion that cognitive neuroscience models that incorporate the core mechanistic principles of interactivity, inhibitory competition, and error-driven and Hebbian learning satisfy a wider range of biological, psychological, and computational constraints than models employing a subset of these principles.
Collapse
Affiliation(s)
- R C O'Reilly
- Department of Psychology, University of Colorado at Boulder, Boulder, CO 80309, USA
| |
Collapse
|
27
|
Abstract
We give a short review of the Bayesian approach to neural network learning and demonstrate the advantages of the approach in three real applications. We discuss the Bayesian approach with emphasis on the role of prior knowledge in Bayesian models and in classical error-minimization approaches. The generalization capability of a statistical model, classical or Bayesian, is ultimately based on the prior assumptions. The Bayesian approach permits propagation of uncertainty in quantities that are unknown to other assumptions in the model, which may be more generally valid or easier to guess in the problem. The case problems studied in this paper include a regression, a classification, and an inverse problem. In the most thoroughly analyzed regression problem, the best models were those with less restrictive priors. This emphasizes the major advantage of the Bayesian approach: we are not forced to guess attributes that are unknown, such as the number of degrees of freedom in the model, the non-linearity of the model with respect to each input variable, or the exact form of the distribution of the model residuals.
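The uncertainty propagation the abstract describes can be sketched in the simplest conjugate setting: Bayesian linear regression, where the posterior over weights is Gaussian and its covariance feeds directly into the predictive variance. The data, prior precision, and noise level below are toy assumptions of this sketch, not the paper's MCMC-trained MLP models.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: y = 0.5 + 2x + Gaussian noise.
X = np.c_[np.ones(20), rng.uniform(-1, 1, 20)]
y = X @ np.array([0.5, 2.0]) + rng.normal(0, 0.3, 20)

alpha, beta = 1.0, 1.0 / 0.3**2       # prior precision on weights, noise precision
S_inv = alpha * np.eye(2) + beta * X.T @ X
S = np.linalg.inv(S_inv)              # posterior covariance of the weights
m = beta * S @ X.T @ y                # posterior mean of the weights

def predict(x_new):
    """Predictive mean and variance: weight uncertainty is propagated, not guessed away."""
    phi = np.array([1.0, x_new])
    mean = phi @ m
    var = 1.0 / beta + phi @ S @ phi  # noise variance + parameter uncertainty
    return mean, var

m0, v0 = predict(0.0)   # inside the data range
m2, v2 = predict(3.0)   # extrapolation: parameter uncertainty inflates the variance
```

The predictive variance is never just the noise floor `1/beta`; it grows as the query point moves away from the training inputs, which is the behavior a point-estimate model cannot express.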
Collapse
Affiliation(s)
- J Lampinen
- Laboratory of Computational Engineering, Helsinki University of Technology, Espoo, Finland.
| | | |
Collapse
|
28
|
Architectures and Idioms: Making Progress in Agent Design. LECTURE NOTES IN COMPUTER SCIENCE 2001. [DOI: 10.1007/3-540-44631-1_6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
|
29
|
Abstract
No-free-lunch theorems have shown that learning algorithms cannot be universally good. We show that no free lunch exists for noise prediction either: when the noise is additive and the prior over target functions is uniform, a prior on the noise distribution cannot be updated, in the Bayesian sense, from any finite data set. We emphasize the importance of a prior over the target function in order to justify superior performance for learning systems.
Collapse
|
30
|
Abstract
We show that with a uniform prior on models having the same training error, early stopping at some fixed training error above the training error minimum results in an increase in the expected generalization error.
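The setup in this abstract — stopping gradient descent at a fixed training error above the achievable minimum — can be sketched on a toy linear-regression problem. The data, learning rate, and stopping threshold below are assumptions of this sketch; it only illustrates the mechanics of the stopping rule, not the paper's expectation over all models with equal training error.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression problem fit by plain gradient descent on the training error.
X = rng.normal(size=(30, 5))
w_true = rng.normal(size=5)
y = X @ w_true + rng.normal(0, 0.5, 30)

def train_mse(w):
    return float(np.mean((X @ w - y) ** 2))

def fit(stop_at=None, steps=2000, lr=0.01):
    """Gradient descent; optionally stop once training error falls below stop_at."""
    w = np.zeros(5)
    for _ in range(steps):
        if stop_at is not None and train_mse(w) <= stop_at:
            break
        w -= lr * (2.0 / len(y)) * X.T @ (X @ w - y)
    return w

w_full = fit()                                   # run (close) to the training-error minimum
w_early = fit(stop_at=train_mse(w_full) + 0.1)   # stop at a fixed error above that minimum
```

Because both runs follow the same deterministic trajectory and the step size keeps the training error monotonically decreasing, the early-stopped iterate always sits at or above the full run's training error; the paper's claim concerns what this stopping rule does, in expectation, to generalization error.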
Collapse
Affiliation(s)
- Z Cataltepe
- Bell Labs, Lucent Technologies, Room 2C-265, 600 Mountain Avenue, Murray Hill, NJ 07974, USA.
| | | | | |
Collapse
|
31
|
Sharkey AJC. Linear and Order Statistics Combiners for Pattern Classification. PERSPECTIVES IN NEURAL COMPUTING 1999. [DOI: 10.1007/978-1-4471-0793-4_6] [Citation(s) in RCA: 21] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/10/2023]
|
32
|
Abstract
This article presents several additive corrections to the conventional quadratic-loss bias-plus-variance formula. One of these corrections is appropriate when both the target is not fixed (as in Bayesian analysis) and training sets are averaged over (as in the conventional bias-plus-variance formula). Another additive correction casts conventional fixed-training-set Bayesian analysis directly in terms of bias plus variance. Another correction is appropriate for measuring full generalization error over a test set rather than (as with conventional bias plus variance) error at a single point. Yet another correction can help explain the recent counterintuitive bias-variance decomposition of Friedman for zero-one loss. After presenting these corrections, this article discusses some other loss-function-specific aspects of supervised learning. In particular, there is a discussion of the fact that if the loss function is a metric (e.g., zero-one loss), then there is a bound on the change in generalization error accompanying changing the algorithm's guess from h1 to h2, a bound that depends only on h1 and h2 and not on the target. This article ends by presenting versions of the bias-plus-variance formula appropriate for logarithmic and quadratic scoring, and then all the additive corrections appropriate to those formulas. All the correction terms presented are a covariance, between the learning algorithm and the posterior distribution over targets. Accordingly, in the (very common) contexts in which those terms apply, there is not a "bias-variance trade-off" or a "bias-variance dilemma," as one often hears. Rather, there is a bias-variance-covariance trade-off.
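An elementary identity shows where a covariance term enters once the target is itself random. This is only the simplest quadratic-loss case, not the article's full set of corrections:

```latex
% When both the algorithm's guess H and the target Y are random variables
% (target not fixed, training sets averaged over), quadratic loss expands as
\mathbb{E}\big[(H - Y)^2\big]
  = \big(\mathbb{E}[H] - \mathbb{E}[Y]\big)^2  % squared bias
  + \operatorname{Var}(H)                      % variance of the guess
  + \operatorname{Var}(Y)                      % variance of the target
  - 2\,\operatorname{Cov}(H, Y)                % the covariance term
```

When the target is held fixed, $\operatorname{Var}(Y)$ and $\operatorname{Cov}(H, Y)$ vanish and the familiar bias-plus-variance form is recovered; when it is not, an algorithm whose guesses co-vary with the targets reduces expected loss, which is the trade-off's third axis.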
Collapse
|