1. Murphy KA, Bassett DS. Machine-Learning Optimized Measurements of Chaotic Dynamical Systems via the Information Bottleneck. Phys Rev Lett 2024; 132:197201. [PMID: 38804957] [DOI: 10.1103/PhysRevLett.132.197201]
Abstract
Deterministic chaos permits a precise notion of a "perfect measurement" as one that, when obtained repeatedly, captures all of the information created by the system's evolution with minimal redundancy. Finding an optimal measurement is challenging and has generally required intimate knowledge of the dynamics in the few cases where it has been done. We establish an equivalence between a perfect measurement and a variant of the information bottleneck. As a consequence, we can employ machine learning to optimize measurement processes that efficiently extract information from trajectory data. We obtain approximately optimal measurements for multiple chaotic maps and lay the necessary groundwork for efficient information extraction from general time series.
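As background for this entry, the standard information bottleneck it builds on can be solved for small discrete systems with the classical self-consistent iterations. A minimal sketch (the toy joint distribution, the variable names, and the β value are illustrative, not the authors' machine-learning setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def mutual_information(pxy):
    """Mutual information in bits of a discrete joint distribution table."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float((pxy[mask] * np.log2(pxy[mask] / (px @ py)[mask])).sum())

def information_bottleneck(pxy, n_t, beta, iters=200):
    """Classical self-consistent IB updates for discrete X, Y: alternate
    p(t), p(y|t), and the Boltzmann-like update of the encoder p(t|x).
    Returns the joint tables p(x,t) and p(t,y) induced by the encoder."""
    n_x, _ = pxy.shape
    px = pxy.sum(axis=1)
    py_x = pxy / px[:, None]                      # p(y|x)
    pt_x = rng.dirichlet(np.ones(n_t), size=n_x)  # random encoder init
    tiny = 1e-30
    for _ in range(iters):
        pt = px @ pt_x                                               # p(t)
        py_t = (pt_x * px[:, None]).T @ py_x / (pt[:, None] + tiny)  # p(y|t)
        # KL(p(y|x) || p(y|t)) for every pair (x, t)
        kl = (py_x[:, None, :]
              * np.log((py_x[:, None, :] + tiny) / (py_t[None, :, :] + tiny))
              ).sum(axis=2)
        logits = np.log(pt + tiny)[None, :] - beta * kl
        pt_x = np.exp(logits - logits.max(axis=1, keepdims=True))
        pt_x /= pt_x.sum(axis=1, keepdims=True)
    pxt = px[:, None] * pt_x             # joint p(x, t)
    pty = (pt_x * px[:, None]).T @ py_x  # joint p(t, y)
    return pxt, pty

# Toy system: 4 microstates x whose coarse identity predicts a binary y
pxy = np.array([[0.20, 0.05],
                [0.15, 0.10],
                [0.05, 0.20],
                [0.10, 0.15]])
pxt, pty = information_bottleneck(pxy, n_t=2, beta=5.0)
```

By the data processing inequality, the compressed variable T never carries more information about Y than X does; sweeping β traces out the IB tradeoff curve between I(X;T) and I(T;Y).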
Affiliation(s)
- Kieran A Murphy
- Department of Bioengineering, School of Engineering and Applied Science
- Dani S Bassett
- Department of Bioengineering, School of Engineering and Applied Science
- Department of Electrical and Systems Engineering, School of Engineering and Applied Science; Department of Neurology and Department of Psychiatry, Perelman School of Medicine; Department of Physics and Astronomy, College of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- The Santa Fe Institute, Santa Fe, New Mexico 87501, USA
2. Murphy KA, Bassett DS. Information decomposition in complex systems via machine learning. Proc Natl Acad Sci U S A 2024; 121:e2312988121. [PMID: 38498714] [PMCID: PMC10990158] [DOI: 10.1073/pnas.2312988121]
Abstract
One of the fundamental steps toward understanding a complex system is identifying variation at the scale of the system's components that is most relevant to behavior on a macroscopic scale. Mutual information provides a natural means of linking variation across scales of a system due to its independence of functional relationship between observables. However, characterizing the manner in which information is distributed across a set of observables is computationally challenging and generally infeasible beyond a handful of measurements. Here, we propose a practical and general methodology that uses machine learning to decompose the information contained in a set of measurements by jointly optimizing a lossy compression of each measurement. Guided by the distributed information bottleneck as a learning objective, the information decomposition identifies the variation in the measurements of the system state most relevant to specified macroscale behavior. We focus our analysis on two paradigmatic complex systems: a Boolean circuit and an amorphous material undergoing plastic deformation. In both examples, the large amount of entropy of the system state is decomposed, bit by bit, in terms of what is most related to macroscale behavior. The identification of meaningful variation in data, with the full generality brought by information theory, is made practical for studying the connection between micro- and macroscale structure in complex systems.
Affiliation(s)
- Kieran A. Murphy
- Department of Bioengineering, School of Engineering & Applied Science, University of Pennsylvania, Philadelphia, PA 19104
- Dani S. Bassett
- Department of Bioengineering, School of Engineering & Applied Science, University of Pennsylvania, Philadelphia, PA 19104
- Department of Electrical & Systems Engineering, School of Engineering & Applied Science, University of Pennsylvania, Philadelphia, PA 19104
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104
- Department of Physics & Astronomy, College of Arts & Sciences, University of Pennsylvania, Philadelphia, PA 19104
- The Santa Fe Institute, Santa Fe, NM 87501
3. Soflaei M, Zhang R, Guo H, Al-Bashabsheh A, Mao Y. Information Bottleneck and Aggregated Learning. IEEE Trans Pattern Anal Mach Intell 2023; 45:14807-14820. [PMID: 37698970] [DOI: 10.1109/TPAMI.2023.3302150]
Abstract
We consider the problem of learning a neural network classifier. Under the information bottleneck (IB) principle, we associate with this classification problem a representation learning problem, which we call "IB learning". We show that IB learning is, in fact, equivalent to a special class of the quantization problem. The classical results in rate-distortion theory then suggest that IB learning can benefit from a "vector quantization" approach, namely, simultaneously learning the representations of multiple input objects. Such an approach, assisted by some variational techniques, results in a novel learning framework, "Aggregated Learning", for classification with neural network models. In this framework, several objects are jointly classified by a single neural network. The effectiveness of this framework is verified through extensive experiments on standard image recognition and text classification tasks.
4. Charvin H, Catenacci Volpi N, Polani D. Exact and Soft Successive Refinement of the Information Bottleneck. Entropy (Basel) 2023; 25:1355. [PMID: 37761653] [PMCID: PMC10528077] [DOI: 10.3390/e25091355]
Abstract
The information bottleneck (IB) framework formalises the essential requirement for efficient information processing systems to achieve an optimal balance between the complexity of their representation and the amount of information extracted about relevant features. However, since the representation complexity affordable by real-world systems may vary in time, the processing cost of updating the representations should also be taken into account. A crucial question is thus the extent to which adaptive systems can leverage the information content of already existing IB-optimal representations for producing new ones, which target the same relevant features but at a different granularity. We investigate the information-theoretic optimal limits of this process by studying and extending, within the IB framework, the notion of successive refinement, which describes the ideal situation where no information needs to be discarded for adapting an IB-optimal representation's granularity. Thanks in particular to a new geometric characterisation, we analytically derive the successive refinability of some specific IB problems (for binary variables, for jointly Gaussian variables, and for the relevancy variable being a deterministic function of the source variable), and provide a linear-programming-based tool to numerically investigate, in the discrete case, the successive refinement of the IB. We then soften this notion into a quantification of the loss of information optimality induced by several-stage processing through an existing measure of unique information. Simple numerical experiments suggest that this quantity is typically low, though not entirely negligible. These results could have important implications for (i) the structure and efficiency of incremental learning in biological and artificial agents, (ii) the comparison of IB-optimal observation channels in statistical decision problems, and (iii) the IB theory of deep neural networks.
Affiliation(s)
- Hippolyte Charvin
- School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield AL10 9AB, UK
5. Mahali MI, Leu JS, Darmawan JT, Avian C, Bachroin N, Prakosa SW, Faisal M, Putro NAS. A Dual Architecture Fusion and AutoEncoder for Automatic Morphological Classification of Human Sperm. Sensors (Basel) 2023; 23:6613. [PMID: 37514907] [PMCID: PMC10385996] [DOI: 10.3390/s23146613]
Abstract
Infertility has become a common problem in global health, and many couples need medical assistance to achieve reproduction. Many human behaviors can lead to infertility, often through unhealthy sperm, and assisted reproductive techniques require the selection of healthy sperm. Hence, machine learning algorithms are presented in this research to modernize and standardize decisions in sperm classification. In this study, we developed a deep learning fusion architecture called SwinMobile that combines the Shifted Windows Vision Transformer (Swin) and MobileNetV3 into a unified feature space and classifies sperm from impurities in the SVIA Subset-C. The Swin Transformer provides long-range feature extraction, while MobileNetV3 extracts local features. We also explored incorporating an autoencoder into the architecture as an automatic noise-removal model. Our model was tested on the SVIA, HuSHem, and SMIDS datasets, with comparisons to state-of-the-art models based on F1-score and accuracy. Despite the datasets' different characteristics, our approach classified sperm accurately and performed well in direct comparisons with previous approaches: Xception on the SVIA dataset, the MC-HSH model on the HuSHem dataset, and Ilhan et al.'s model on the SMIDS dataset.
The proposed model, especially SwinMobile-AE, achieves strong classification results on all three datasets, outperforming the respective state of the art: 95.4% vs. 94.9% on SVIA, 97.6% vs. 95.7% on HuSHem, and 91.7% vs. 90.9% on SMIDS. We propose that our deep learning approach to sperm classification is suitable for clinical use, leveraging artificial intelligence to rival humans in accuracy, reliability, and speed of analysis. Given these results on three datasets, each with its own characteristics of data size, number of classes, and color space, the proposed model can support technological advances in classifying sperm morphology.
Affiliation(s)
- Muhammad Izzuddin Mahali
- Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei City 10607, Taiwan
- Department of Electronic and Informatic Engineering Education, Universitas Negeri Yogyakarta, Yogyakarta 55281, Indonesia
- Jenq-Shiou Leu
- Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei City 10607, Taiwan
- Jeremie Theddy Darmawan
- Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei City 10607, Taiwan
- Department of Bioinformatics, Indonesia International Institute for Life Science, Jakarta 13210, Indonesia
- Cries Avian
- Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei City 10607, Taiwan
- Nabil Bachroin
- Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei City 10607, Taiwan
- Setya Widyawan Prakosa
- Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei City 10607, Taiwan
- Muhamad Faisal
- Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei City 10607, Taiwan
- Nur Achmad Sulistyo Putro
- Department of Electronic and Computer Engineering, National Taiwan University of Science and Technology, Taipei City 10607, Taiwan
- Department of Computer Science and Electronics, Universitas Gadjah Mada, Yogyakarta 55281, Indonesia
6. Lyu Z, Aminian G, Rodrigues MRD. On Neural Networks Fitting, Compression, and Generalization Behavior via Information-Bottleneck-like Approaches. Entropy (Basel) 2023; 25:1063. [PMID: 37510010] [PMCID: PMC10377965] [DOI: 10.3390/e25071063]
Abstract
It is well known that the neural network learning process, along with its connections to fitting, compression, and generalization, is not yet well understood. In this paper, we propose a novel approach to capturing such neural network dynamics using information-bottleneck-type techniques, involving the replacement of mutual information measures (which are notoriously difficult to estimate in high-dimensional spaces) by other more tractable ones, including (1) the minimum mean-squared error associated with the reconstruction of the network input data from some intermediate network representation and (2) the cross-entropy associated with a certain class label given some network representation. We then conducted an empirical study to ascertain how different network models, network learning algorithms, and datasets may affect the learning dynamics. Our experiments show that our proposed approach appears to be more reliable than classical information bottleneck ones in capturing network dynamics during both the training and testing phases. Our experiments also reveal that the fitting and compression phases exist regardless of the choice of activation function. Additionally, our findings suggest that model architectures, training algorithms, and datasets that lead to better generalization tend to exhibit more pronounced fitting and compression phases.
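The two surrogate quantities this entry proposes can be illustrated without a neural network. A toy sketch (the quantized representation below is an illustrative stand-in for an intermediate layer, not the paper's setup): compute (1) the MMSE of reconstructing the input from a representation and (2) the cross-entropy of the label given that representation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: scalar input x, binary label y = [x > 0]
x = rng.normal(size=10_000)
y = (x > 0).astype(int)

# A stand-in "intermediate representation": x quantized into 4 bins
t = np.digitize(x, bins=[-0.5, 0.0, 0.5])

# (1) MMSE surrogate for input-side information: squared error of the
#     best reconstruction of x from t (the conditional mean per bin)
x_hat = np.array([x[t == k].mean() for k in range(4)])[t]
mmse = float(((x - x_hat) ** 2).mean())

# (2) Cross-entropy surrogate for label-side information:
#     -E[log p(y | t)] with p(y = 1 | t) estimated per bin
p1 = np.array([y[t == k].mean() for k in range(4)])[t]
p_correct = np.clip(np.where(y == 1, p1, 1.0 - p1), 1e-12, 1.0)
cross_entropy = float(-np.log(p_correct).mean())
```

Here the bins preserve the sign of x, so the label cross-entropy is near zero while the MMSE stays well below the input variance; coarser or sign-destroying representations would push both quantities up, which is what makes the pair a tractable stand-in for the two mutual information axes of an information plane.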
Affiliation(s)
- Zhaoyan Lyu
- Department of Electronic and Electrical Engineering, University College London, Gower St., London WC1E 6BT, UK
- Gholamali Aminian
- The Alan Turing Institute, British Library, 96 Euston Rd., London NW1 2DB, UK
- Miguel R D Rodrigues
- Department of Electronic and Electrical Engineering, University College London, Gower St., London WC1E 6BT, UK
7. Wickstrøm KK, Løkse S, Kampffmeyer MC, Yu S, Príncipe JC, Jenssen R. Analysis of Deep Convolutional Neural Networks Using Tensor Kernels and Matrix-Based Entropy. Entropy (Basel) 2023; 25:899. [PMID: 37372243] [DOI: 10.3390/e25060899]
Abstract
Analyzing deep neural networks (DNNs) via information plane (IP) theory has gained tremendous attention recently to gain insight into, among others, DNNs' generalization ability. However, it is by no means obvious how to estimate the mutual information (MI) between each hidden layer and the input/desired output to construct the IP. For instance, hidden layers with many neurons require MI estimators with robustness toward the high dimensionality associated with such layers. MI estimators should also be able to handle convolutional layers while at the same time being computationally tractable to scale to large networks. Existing IP methods have not been able to study truly deep convolutional neural networks (CNNs). We propose an IP analysis using the new matrix-based Rényi's entropy coupled with tensor kernels, leveraging the power of kernel methods to represent properties of the probability distribution independently of the dimensionality of the data. Our results shed new light on previous studies concerning small-scale DNNs using a completely new approach. We provide a comprehensive IP analysis of large-scale CNNs, investigating the different training phases and providing new insights into the training dynamics of large-scale neural networks.
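The matrix-based entropy at the core of this approach is easy to state. A minimal sketch of the estimator for vector data (the RBF kernel and bandwidth are illustrative choices; the paper's tensor-kernel treatment of convolutional layers is not shown):

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian (RBF) kernel Gram matrix for samples x of shape (n, d)."""
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def matrix_renyi_entropy(K, alpha=2.0):
    """Matrix-based Renyi alpha-entropy in bits: trace-normalize the PSD
    Gram matrix and apply the alpha-entropy to its eigenvalue spectrum."""
    lam = np.clip(np.linalg.eigvalsh(K / np.trace(K)), 0.0, None)
    return float(np.log2((lam ** alpha).sum()) / (1.0 - alpha))

def matrix_mutual_information(Kx, Ky, alpha=2.0):
    """MI estimate S(Kx) + S(Ky) - S(Kx o Ky); o is the Hadamard product,
    which plays the role of a joint distribution over the two variables."""
    return (matrix_renyi_entropy(Kx, alpha) + matrix_renyi_entropy(Ky, alpha)
            - matrix_renyi_entropy(Kx * Ky, alpha))
```

An identity Gram matrix (all samples maximally distinct) attains the maximal entropy log2(n) bits. In an information-plane analysis, Kx would be built from the input batch and Ky from a hidden layer's activations, so no density estimation in the layer's high-dimensional space is ever needed.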
Affiliation(s)
- Kristoffer K Wickstrøm
- Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway
- Sigurd Løkse
- Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway
- Michael C Kampffmeyer
- Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway
- Norwegian Computing Center, Department of Statistical Analysis and Machine Learning, 114 Blindern, NO-0314 Oslo, Norway
- Shujian Yu
- Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway
- Computational NeuroEngineering Laboratory, Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
- Department of Computer Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
- José C Príncipe
- Computational NeuroEngineering Laboratory, Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611, USA
- Robert Jenssen
- Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway, NO-9037 Tromsø, Norway
- Norwegian Computing Center, Department of Statistical Analysis and Machine Learning, 114 Blindern, NO-0314 Oslo, Norway
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
8. Alesiani F, Yu S, Yu X. Gated information bottleneck for generalization in sequential environments. Knowl Inf Syst 2023. [DOI: 10.1007/s10115-022-01770-w]
9. Geiger BC. On Information Plane Analyses of Neural Network Classifiers-A Review. IEEE Trans Neural Netw Learn Syst 2022; 33:7039-7051. [PMID: 34191733] [DOI: 10.1109/TNNLS.2021.3089037]
Abstract
We review the current literature concerned with information plane (IP) analyses of neural network (NN) classifiers. While the underlying information bottleneck theory and the claim that information-theoretic compression is causally linked to generalization are plausible, empirical evidence has been both supporting and conflicting. We review this evidence together with a detailed analysis of how the respective information quantities were estimated. Our survey suggests that compression visualized in IPs is not necessarily information-theoretic but is rather often compatible with geometric compression of the latent representations. This insight gives the IP a renewed justification. Aside from this, we shed light on the problem of estimating mutual information in deterministic NNs and its consequences. Specifically, we argue that, even in feedforward NNs, the data processing inequality need not hold for estimates of mutual information. Similarly, while a fitting phase, in which the mutual information between the latent representation and the target increases, is necessary (but not sufficient) for good classification performance, depending on the specifics of mutual information estimation, such a fitting phase need not be visible in the IP.
10. Peng X, Zhang J, Wang FY, Li L. Drill the Cork of Information Bottleneck by Inputting the Most Important Data. IEEE Trans Neural Netw Learn Syst 2022; 33:6360-6372. [PMID: 34029196] [DOI: 10.1109/TNNLS.2021.3079112]
Abstract
Deep learning has become the most powerful machine learning tool in the last decade. However, how to efficiently train deep neural networks remains to be thoroughly solved. The widely used minibatch stochastic gradient descent (SGD) still needs to be accelerated. As a promising tool to better understand the learning dynamic of minibatch SGD, the information bottleneck (IB) theory claims that the optimization process consists of an initial fitting phase and the following compression phase. Based on this principle, we further study typicality sampling, an efficient data selection method, and propose a new explanation of how it helps accelerate the training process of the deep networks. We show that the fitting phase depicted in the IB theory will be boosted with a high signal-to-noise ratio of gradient approximation if the typicality sampling is appropriately adopted. Furthermore, this finding also implies that the prior information of the training set is critical to the optimization process, and the better use of the most important data can help the information flow through the bottleneck faster. Both theoretical analysis and experimental results on synthetic and real-world datasets demonstrate our conclusions.
11. Du D, Chen J, Li Y, Ma K, Wu G, Zheng Y, Wang L. Cross-Domain Gated Learning for Domain Generalization. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01674-w]
12. Mielniczuk J. Information Theoretic Methods for Variable Selection-A Review. Entropy (Basel) 2022; 24:1079. [PMID: 36010742] [PMCID: PMC9407310] [DOI: 10.3390/e24081079]
Abstract
We review the principal information theoretic tools and their use for feature selection, with the main emphasis on classification problems with discrete features. Since it is known that empirical versions of conditional mutual information perform poorly for high-dimensional problems, we focus on various ways of constructing its counterparts and the properties and limitations of such methods. We present a unified way of constructing such measures based on truncation, or truncation and weighting, of the Möbius expansion of conditional mutual information. We also discuss the main approaches to feature selection which apply the introduced measures of conditional dependence, together with ways of assessing the quality of the obtained vector of predictors. This involves discussion of recent results on asymptotic distributions of empirical counterparts of the criteria, as well as advances in resampling.
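As a concrete reference point, the baseline quantity this review builds on, conditional mutual information for discrete variables, can be computed directly from a joint probability table (the table below is illustrative; the truncated Möbius-expansion criteria discussed in the review are not shown):

```python
import numpy as np

def conditional_mutual_information(p):
    """I(X;Y|Z) in bits from a joint probability table p[x, y, z]."""
    pz = p.sum(axis=(0, 1))   # p(z)
    pxz = p.sum(axis=1)       # p(x, z)
    pyz = p.sum(axis=0)       # p(y, z)
    total = 0.0
    for x in range(p.shape[0]):
        for y in range(p.shape[1]):
            for z in range(p.shape[2]):
                if p[x, y, z] > 0:
                    total += p[x, y, z] * np.log2(
                        pz[z] * p[x, y, z] / (pxz[x, z] * pyz[y, z]))
    return total

# X and Y perfectly coupled given either value of Z: I(X;Y|Z) = 1 bit
p_coupled = np.zeros((2, 2, 2))
for z in (0, 1):
    p_coupled[0, 0, z] = p_coupled[1, 1, z] = 0.25
```

When X and Y are conditionally independent given Z, i.e. p(x,y,z) = p(z)p(x|z)p(y|z), the quantity vanishes; it is exactly this screening property that the feature selection criteria in the review exploit, and that empirical estimates lose in high dimensions.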
Affiliation(s)
- Jan Mielniczuk
- Institute of Computer Science, Polish Academy of Sciences, Jana Kazimierza 5, 01-248 Warsaw, Poland
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
13. Larsson DT, Maity D, Tsiotras P. A Generalized Information-Theoretic Framework for the Emergence of Hierarchical Abstractions in Resource-Limited Systems. Entropy (Basel) 2022; 24:e24060809. [PMID: 35741530] [PMCID: PMC9222931] [DOI: 10.3390/e24060809]
Abstract
In this paper, a generalized information-theoretic framework for the emergence of multi-resolution hierarchical tree abstractions is developed. By leveraging ideas from information-theoretic signal encoding with side information, this paper develops a tree search problem which considers the generation of multi-resolution tree abstractions when there are multiple sources of relevant and irrelevant, or possibly confidential, information. We rigorously formulate an information-theoretic tree abstraction problem and discuss its connections with information-theoretic privacy and resource-limited systems. The problem structure is investigated and a novel algorithm, called G-tree search, is proposed. The proposed algorithm is analyzed and a number of theoretical results are established, including the optimality of the G-tree search algorithm. To demonstrate the utility of the proposed framework, we apply our method to a real-world example and provide a discussion of the results from the viewpoint of designing hierarchical abstractions for autonomous systems.
Affiliation(s)
- Daniel T. Larsson
- D. Guggenheim School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0150, USA
- Dipankar Maity
- Department of Electrical and Computer Engineering, The University of North Carolina at Charlotte, Charlotte, NC 28223-0001, USA
- Panagiotis Tsiotras
- D. Guggenheim School of Aerospace Engineering, Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, Atlanta, GA 30332-0150, USA
14. Zhai P, Zhang S. Adversarial Information Bottleneck. IEEE Trans Neural Netw Learn Syst 2022; PP:221-230. [PMID: 35594234] [DOI: 10.1109/TNNLS.2022.3172986]
Abstract
The information bottleneck (IB) principle has been adopted to explain deep learning in terms of information compression and prediction, which are balanced by a tradeoff hyperparameter. Two challenging problems are how to optimize the IB principle for better robustness and how to characterize the effects of compression through the tradeoff hyperparameter. Previous methods attempted to optimize the IB principle by introducing random noise into learning the representation and achieved state-of-the-art performance in nuisance information compression and semantic information extraction. However, their performance on resisting adversarial perturbations is far less impressive. To this end, we propose an adversarial IB (AIB) method without any explicit assumptions about the underlying distribution of the representations, which can be optimized effectively by solving a min-max optimization problem. Numerical experiments on synthetic and real-world datasets demonstrate its effectiveness in learning more invariant representations and mitigating adversarial perturbations compared to several competing IB methods. In addition, we analyze the adversarial robustness of diverse IB methods by contrasting their IB curves and reveal that IB models with the hyperparameter β corresponding to the knee point in the IB curve achieve the best tradeoff between compression and prediction and have the best robustness against various attacks.
15. Kline AG, Palmer SE. Gaussian Information Bottleneck and the Non-Perturbative Renormalization Group. New J Phys 2022; 24:033007. [PMID: 35368649] [PMCID: PMC8967309] [DOI: 10.1088/1367-2630/ac395d]
Abstract
The renormalization group (RG) is a class of theoretical techniques used to explain the collective physics of interacting, many-body systems. It has been suggested that the RG formalism may be useful in finding and interpreting emergent low-dimensional structure in complex systems outside of the traditional physics context, such as in biology or computer science. In such contexts, one common dimensionality-reduction framework already in use is information bottleneck (IB), in which the goal is to compress an "input" signal X while maximizing its mutual information with some stochastic "relevance" variable Y. IB has been applied in the vertebrate and invertebrate processing systems to characterize optimal encoding of the future motion of the external world. Other recent work has shown that the RG scheme for the dimer model could be "discovered" by a neural network attempting to solve an IB-like problem. This manuscript explores whether IB and any existing formulation of RG are formally equivalent. A class of soft-cutoff non-perturbative RG techniques are defined by families of non-deterministic coarsening maps, and hence can be formally mapped onto IB, and vice versa. For concreteness, this discussion is limited entirely to Gaussian statistics (GIB), for which IB has exact, closed-form solutions. Under this constraint, GIB has a semigroup structure, in which successive transformations remain IB-optimal. Further, the RG cutoff scheme associated with GIB can be identified. Our results suggest that IB can be used to impose a notion of "large scale" structure, such as biological function, on an RG procedure.
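For reference, the closed-form structure of the Gaussian IB mentioned here can be reproduced in a few lines: the optimal encoder projects onto eigenvectors of the matrix Sigma_{x|y} Sigma_x^{-1}, each direction switching on past a critical tradeoff value. A sketch following the known GIB solution (Chechik et al.); the covariance numbers below are illustrative:

```python
import numpy as np

# Illustrative joint covariance for x in R^2 and y in R^2
sig_x = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
sig_y = np.eye(2)
sig_xy = np.array([[0.7, 0.2],
                   [0.3, 0.6]])

# Conditional covariance of x given y (Schur complement)
sig_x_given_y = sig_x - sig_xy @ np.linalg.inv(sig_y) @ sig_xy.T

# Eigenvalues of Sigma_{x|y} Sigma_x^{-1} determine the GIB solution:
# eigenvector i enters the optimal projection only once the tradeoff
# parameter beta exceeds the critical value 1 / (1 - lambda_i)
lam = np.linalg.eigvals(sig_x_given_y @ np.linalg.inv(sig_x)).real
beta_critical = 1.0 / (1.0 - lam)

# Total available information in nats: I(X;Y) = -0.5 * sum(log lambda_i)
i_xy = float(-0.5 * np.log(lam).sum())
```

Below the smallest critical beta the optimal representation retains nothing; each additional eigenvector then adds one informative dimension, which is the kind of scale-by-scale switching that the manuscript relates to an RG cutoff scheme.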
Affiliation(s)
- Adam G Kline
- Department of Physics, The University of Chicago, Chicago, IL 60637
- Stephanie E Palmer
- Department of Organismal Biology and Anatomy and Department of Physics, The University of Chicago, Chicago, IL 60637
16. Gao Y, Chaudhari P. A free-energy principle for representation learning. Mach Learn Sci Technol 2021. [DOI: 10.1088/2632-2153/abf984]
Abstract
This paper employs a formal connection of machine learning with thermodynamics to characterize the quality of learned representations for transfer learning. We discuss how information-theoretic functionals such as rate, distortion and classification loss of a model lie on a convex, so-called, equilibrium surface. We prescribe dynamical processes to traverse this surface under specific constraints; in particular we develop an iso-classification process that trades off rate and distortion to keep the classification loss unchanged. We demonstrate how this process can be used for transferring representations from a source task to a target task while keeping the classification loss constant. Experimental validation of the theoretical results is provided on image-classification datasets.
17. Solopchuk O, Zénon A. Active sensing with artificial neural networks. Neural Netw 2021; 143:751-758. [PMID: 34482173] [DOI: 10.1016/j.neunet.2021.08.007]
Abstract
The fitness of behaving agents depends on their knowledge of the environment, which demands efficient exploration strategies. Active sensing formalizes exploration as the reduction of uncertainty about the current state of the environment. Despite strong theoretical justifications, active sensing has had limited applicability due to the difficulty of estimating information gain. Here we address this issue by proposing a linear approximation to information gain and by implementing efficient gradient-based action selection within an artificial neural network setting. We compare our information gain estimation with the state of the art, and validate our model on an active sensing task based on the MNIST dataset. We also propose an approximation that exploits the amortized inference network, and performs equally well in certain contexts.
Collapse
Affiliation(s)
- Oleg Solopchuk
- Université catholique de Louvain, Brussels, Belgium; University of Bordeaux, Bordeaux, France.
| | | |
Collapse
|
18
|
Entropy 2021 Best Paper Award. ENTROPY 2021; 23:e23070865. [PMID: 34356406 PMCID: PMC8306394 DOI: 10.3390/e23070865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/29/2021] [Accepted: 07/05/2021] [Indexed: 11/16/2022]
Abstract
On behalf of the Editor-in-Chief, Prof [...].
Collapse
|
19
|
Song J, Zheng Y, Wang J, Zakir Ullah M, Jiao W. Multicolor image classification using the multimodal information bottleneck network (MMIB-Net) for detecting diabetic retinopathy. OPTICS EXPRESS 2021; 29:22732-22748. [PMID: 34266030 DOI: 10.1364/oe.430508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Accepted: 06/26/2021] [Indexed: 06/13/2023]
Abstract
Multicolor (MC) imaging is an imaging modality that records confocal scanning laser ophthalmoscope (cSLO) fundus images, which can be used for diabetic retinopathy (DR) detection. With this technique, images of multiple modalities can be obtained from a single case, and the additional symptomatic features they contain can inform the diagnosis of DR. However, few studies have classified MC images using deep learning methods, let alone analyzed multimodal features. In this work, we propose a novel model that uses a multimodal information bottleneck network (MMIB-Net) to classify MC images for DR detection. Our model extracts the features of multiple modalities simultaneously while finding a concise feature representation of each modality using information bottleneck theory. MC image classification is then achieved by combining the representations and features of all modalities. Our experiments show that the proposed method achieves accurate classification of MC images, and comparative experiments demonstrate that multimodality and the information bottleneck each improve classification performance. To the best of our knowledge, this is the first report of DR identification using a multimodal information bottleneck convolutional neural network on MC images.
Collapse
|
21
|
Sachdeva V, Mora T, Walczak AM, Palmer SE. Optimal prediction with resource constraints using the information bottleneck. PLoS Comput Biol 2021; 17:e1008743. [PMID: 33684112 PMCID: PMC7971903 DOI: 10.1371/journal.pcbi.1008743] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2020] [Revised: 03/18/2021] [Accepted: 01/27/2021] [Indexed: 11/19/2022] Open
Abstract
Responding to stimuli requires that organisms encode information about the external world. Not all parts of the input are important for behavior, and resource limitations demand that signals be compressed. Prediction of the future input is widely beneficial in many biological systems. We compute the trade-offs between representing the past faithfully and predicting the future using the information bottleneck approach, for input dynamics with different levels of complexity. For motion prediction, we show that, depending on the parameters in the input dynamics, velocity or position information is more useful for accurate prediction. We show which motion representations are easiest to re-use for accurate prediction in other motion contexts, and identify and quantify those with the highest transferability. For non-Markovian dynamics, we explore the role of long-term memory in shaping the internal representation. Lastly, we show that prediction in evolutionary population dynamics is linked to clustering allele frequencies into non-overlapping memories.
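The past/future trade-off described above can be seen in closed form in the simplest setting. A sketch with a toy Gaussian AR(1) process and a one-parameter family of noisy-copy bottlenecks (our example, not the paper's models):

```python
import numpy as np

# Gaussian AR(1): x_{t+1} = a x_t + sqrt(1 - a^2) * noise (unit stationary variance).
# Predictive information I(x_t; x_{t+1}) = -0.5 * log(1 - a^2) is the most any
# compressed representation of the past can convey about the next step.
a = 0.8
i_pred_exact = -0.5 * np.log(1 - a**2)

# A one-dimensional Gaussian bottleneck t = x_t + noise with variance s^2 keeps
# I(x_t; t) = 0.5 * log(1 + 1/s^2) nats about the past, but only
# I(t; x_{t+1}) = -0.5 * log(1 - a^2 / (1 + s^2)) about the future.
for s2 in [0.01, 0.1, 1.0, 10.0]:
    i_past = 0.5 * np.log(1 + 1 / s2)
    i_future = -0.5 * np.log(1 - a**2 / (1 + s2))
    print(s2, i_past, i_future)
```

As the bottleneck noise s² grows, information about the past drops quickly while predictive information decays more slowly, tracing out the kind of trade-off curve the paper computes for richer dynamics.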
Collapse
Affiliation(s)
- Vedant Sachdeva
- Graduate Program in Biophysical Sciences, University of Chicago, Chicago, Illinois, United States of America
| | - Thierry Mora
- Laboratoire de physique de l’École normale supérieure, Centre National de la Recherche Scientifique, Paris, France
- Paris Sciences et Lettres University Paris, Paris, France
- Sorbonne Université Paris, Paris, France
- Université de Paris, Paris, France
| | - Aleksandra M. Walczak
- Laboratoire de physique de l’École normale supérieure, Centre National de la Recherche Scientifique, Paris, France
- Paris Sciences et Lettres University Paris, Paris, France
- Sorbonne Université Paris, Paris, France
- Université de Paris, Paris, France
| | - Stephanie E. Palmer
- Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
- Department of Physics, University of Chicago, Chicago, Illinois, United States of America
| |
Collapse
|
22
|
Piasini E, Filipowicz ALS, Levine J, Gold JI. Embo: a Python package for empirical data analysis using the Information Bottleneck. JOURNAL OF OPEN RESEARCH SOFTWARE 2021; 9:10. [PMID: 37153754 PMCID: PMC10162586 DOI: 10.5334/jors.322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
We present embo, a Python package to analyze empirical data using the Information Bottleneck (IB) method and its variants, such as the Deterministic Information Bottleneck (DIB). Given two random variables X and Y, the IB finds the stochastic mapping M of X that encodes the most information about Y, subject to a constraint on the information that M is allowed to retain about X. Despite the popularity of the IB, an accessible implementation of the reference algorithm oriented towards ease of use on empirical data was missing. Embo is optimized for the common case of discrete, low-dimensional data. Embo is fast, provides a standard data-processing pipeline, offers a parallel implementation of key computational steps, and includes reasonable defaults for the method parameters. Embo is broadly applicable to different problem domains, as it can be employed with any dataset consisting of joint observations of two discrete variables. It is available from the Python Package Index (PyPI), Zenodo and GitLab.
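For the discrete, low-dimensional case embo targets, the reference algorithm is the classical self-consistent IB iteration. A self-contained sketch of that iteration (our implementation for illustration; embo's actual API may differ):

```python
import numpy as np

def ib_iterate(p_xy, n_clusters, beta, n_iter=200, seed=0):
    """Self-consistent IB updates (Blahut-Arimoto style) for a discrete joint p(x,y)."""
    rng = np.random.default_rng(seed)
    eps = 1e-12
    p_x = p_xy.sum(axis=1)                       # marginal p(x)
    p_y_given_x = p_xy / p_x[:, None]            # conditional p(y|x)
    q = rng.random((p_xy.shape[0], n_clusters))  # encoder q(m|x), random init
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        p_m = q.T @ p_x                                   # marginal p(m)
        p_y_given_m = (q * p_x[:, None]).T @ p_y_given_x  # joint p(m,y)...
        p_y_given_m /= p_m[:, None] + eps                 # ...normalized to p(y|m)
        # KL(p(y|x) || p(y|m)) for every (x, m) pair
        kl = (p_y_given_x[:, None, :]
              * (np.log(p_y_given_x[:, None, :] + eps)
                 - np.log(p_y_given_m[None, :, :] + eps))).sum(axis=-1)
        q = p_m[None, :] * np.exp(-beta * kl)  # IB encoder update
        q /= q.sum(axis=1, keepdims=True)
    return q

# Toy joint distribution with two clearly separable groups of x values.
p_xy = np.array([[0.20, 0.05],
                 [0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20]])
q = ib_iterate(p_xy, n_clusters=2, beta=10.0)
print(q.argmax(axis=1))  # x's sharing the same p(y|x) end up in the same cluster
```

At large β the encoder becomes nearly deterministic and groups x values by their conditional distribution over y, which is the behavior the DIB variant makes exact.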
Collapse
Affiliation(s)
- Eugenio Piasini
- Computational Neuroscience Initiative and Department of Physics and Astronomy, University of Pennsylvania
| | | | | | - Joshua I Gold
- Department of Neuroscience, University of Pennsylvania
| |
Collapse
|
23
|
Geiger BC, Kubin G. Information Bottleneck: Theory and Applications in Deep Learning. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E1408. [PMID: 33327417 PMCID: PMC7764901 DOI: 10.3390/e22121408] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 12/02/2020] [Accepted: 12/09/2020] [Indexed: 11/16/2022]
Abstract
The information bottleneck (IB) framework, proposed in [...].
Collapse
Affiliation(s)
| | - Gernot Kubin
- Signal Processing and Speech Communication Laboratory, Graz University of Technology, Inffeldgasse 16c, 8010 Graz, Austria;
| |
Collapse
|
24
|
Geiger BC, Fischer IS. A Comparison of Variational Bounds for the Information Bottleneck Functional. ENTROPY 2020; 22:e22111229. [PMID: 33286997 PMCID: PMC7712881 DOI: 10.3390/e22111229] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Received: 09/24/2020] [Revised: 10/19/2020] [Accepted: 10/20/2020] [Indexed: 11/18/2022]
Abstract
In this short note, we relate the variational bounds proposed in Alemi et al. (2017) and Fischer (2020) for the information bottleneck (IB) and the conditional entropy bottleneck (CEB) functional, respectively. Although the two functionals were shown to be equivalent, it was empirically observed that optimizing bounds on the CEB functional achieves better generalization performance and adversarial robustness than optimizing those on the IB functional. This work tries to shed light on this issue by showing that, in the most general setting, no ordering can be established between these variational bounds, while such an ordering can be enforced by restricting the feasible sets over which the optimizations take place. The absence of such an ordering in the general setup suggests that the variational bound on the CEB functional is either more amenable to optimization or a relevant cost function for optimization in its own right, i.e., without justification from the IB or CEB functionals.
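For orientation, the equivalence of the two functionals referenced above follows from the Markov structure of the representation. A short derivation in our notation (T the representation, γ our label for the CEB trade-off coefficient):

```latex
% The Markov chain T - X - Y implies I(T;Y|X) = 0; expanding I(T;X,Y) two ways:
I(X;T \mid Y) \;=\; I(X;T) \;-\; I(Y;T).
% Hence the CEB and IB objectives differ only by a shift of the multiplier:
I(X;T \mid Y) \;-\; \gamma\, I(Y;T) \;=\; I(X;T) \;-\; (1+\gamma)\, I(Y;T).
```

The functionals thus coincide up to reparametrization, which is why any performance gap must come from the variational bounds and feasible sets, not from the objectives themselves.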
Collapse
|
25
|
Xiao P, Cheng S, Stankovic V, Vukobratovic D. Averaging Is Probably Not the Optimum Way of Aggregating Parameters in Federated Learning. ENTROPY 2020; 22:e22030314. [PMID: 33286088 PMCID: PMC7516771 DOI: 10.3390/e22030314] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/03/2020] [Accepted: 03/09/2020] [Indexed: 11/16/2022]
Abstract
Federated learning is a decentralized form of deep learning that trains a shared model on data distributed across clients (such as mobile phones and wearable devices), preserving data privacy by never exposing raw data to the data center (server). After each client computes new model parameters by stochastic gradient descent (SGD) on its own local data, the locally computed parameters are aggregated to produce an updated global model. Many current state-of-the-art studies aggregate the client-computed parameters by averaging them, but none theoretically explains why averaging is a good approach. In this paper, we treat each client-computed parameter as a random vector, owing to the stochastic nature of SGD, and estimate the mutual information between two clients' parameters at different training phases using two methods in two learning tasks. The results confirm the correlation between different clients and show mutual information increasing over training iterations. However, when we further compute the distance between client-computed parameters, we find that the parameters become more correlated without getting closer. This phenomenon suggests that averaging may not be the optimum way of aggregating trained parameters.
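The averaging step the paper interrogates is easy to state. A minimal sketch (toy parameter vectors and invented dataset sizes, not a real training loop) of FedAvg-style weighted aggregation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each client's locally trained parameters, as flat vectors (toy stand-ins
# for the SGD-updated weights the abstract describes).
client_params = [rng.normal(size=10) for _ in range(5)]
client_sizes = np.array([100, 200, 50, 150, 100])  # local dataset sizes

# FedAvg-style aggregation: dataset-size-weighted average of the parameters.
weights = client_sizes / client_sizes.sum()
global_params = sum(w * p for w, p in zip(weights, client_params))
print(global_params.shape)
```

The paper's point is that this step implicitly assumes client parameters are near one another in parameter space, whereas their measurements show the vectors growing statistically correlated without converging in distance.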
Collapse
Affiliation(s)
- Peng Xiao
- Department of Computer Science and Technology, Tongji University, Shanghai 201804, China;
| | - Samuel Cheng
- The School of Electrical and Computer Engineering, University of Oklahoma, Tulsa, OK 73019, USA
- Correspondence:
| | - Vladimir Stankovic
- Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow G1 1XW, UK;
| | - Dejan Vukobratovic
- Faculty of Technical Sciences, University of Novi Sad, 21000 Novi Sad, Serbia;
| |
Collapse
|
26
|
Rodríguez Gálvez B, Thobaben R, Skoglund M. The Convex Information Bottleneck Lagrangian. ENTROPY 2020; 22:e22010098. [PMID: 33285873 PMCID: PMC7516537 DOI: 10.3390/e22010098] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Revised: 01/03/2020] [Accepted: 01/08/2020] [Indexed: 12/05/2022]
Abstract
The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations T of some random variable X for the task of predicting Y. It is defined as a constrained optimization problem that maximizes the information the representation has about the task, I(T;Y), while ensuring that a certain level of compression r is achieved (i.e., I(X;T) ≤ r). For practical reasons, the problem is usually solved by maximizing the IB Lagrangian L_IB(T; β) = I(T;Y) − β I(X;T) for many values of β ∈ [0,1]. Then the curve of maximal I(T;Y) for a given I(X;T) is drawn and a representation with the desired predictability and compression is selected. It is known that when Y is a deterministic function of X, the IB curve cannot be explored this way, and another Lagrangian has been proposed to tackle this problem: the squared IB Lagrangian L_sq-IB(T; β_sq) = I(T;Y) − β_sq I(X;T)². In this paper, we (i) present a general family of Lagrangians that allow for the exploration of the IB curve in all scenarios; (ii) provide the exact one-to-one mapping between the Lagrange multiplier and the desired compression rate r for known IB curve shapes; and (iii) show that we can approximately obtain a specific compression level with the convex IB Lagrangian for both known and unknown IB curve shapes. This eliminates the burden of solving the optimization problem for many values of the Lagrange multiplier: we prove that the original constrained problem can be solved with a single optimization.
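A compact restatement (our paraphrase of the paper's motivation) of why a convex penalty helps in the deterministic case:

```latex
% When Y = f(X), the IB curve is piecewise linear,
%   \max I(T;Y) = \min\{\, I(X;T),\; H(Y) \,\},
% so the linear Lagrangian I(T;Y) - \beta I(X;T) attains its maximum only at
% the curve's endpoints and cannot select intermediate compression levels.
% Replacing the penalty with a strictly convex, increasing u restores a
% one-to-one map between the multiplier and the achieved compression r:
\mathcal{L}_{u\text{-}\mathrm{IB}}(T;\beta) \;=\; I(T;Y) \;-\; \beta\, u\!\big(I(X;T)\big),
\qquad u(r) = r^2 \;\text{ recovers the squared IB Lagrangian } \mathcal{L}_{\text{sq-IB}}.
```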
Collapse
|
27
|
Abstract
Information bottleneck (IB) is a technique for extracting information in one random variable X that is relevant for predicting another random variable Y. IB works by encoding X in a compressed “bottleneck” random variable M from which Y can be accurately decoded. However, finding the optimal bottleneck variable involves a difficult optimization problem, which until recently has been considered for only two limited cases: discrete X and Y with small state spaces, and continuous X and Y with a Gaussian joint distribution (in which case optimal encoding and decoding maps are linear). We propose a method for performing IB on arbitrarily-distributed discrete and/or continuous X and Y, while allowing for nonlinear encoding and decoding maps. Our approach relies on a novel non-parametric upper bound for mutual information. We describe how to implement our method using neural networks. We then show that it achieves better performance than the recently-proposed “variational IB” method on several real-world datasets.
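The non-parametric upper bound the abstract refers to can be sketched for an isotropic Gaussian encoder. The pairwise-KL form below follows the Kolchinsky-Tracey mixture-entropy bound as we recall it; treat it as an illustrative assumption rather than the paper's exact estimator:

```python
import numpy as np

rng = np.random.default_rng(2)

# Gaussian encoder p(m|x_i) = N(mu_i, sigma^2 I); a pairwise-distance bound on
# the mixture entropy yields an upper bound on I(X;M).
mu = rng.normal(size=(50, 3))  # encoder means for 50 samples (toy values)
sigma2 = 0.5

# KL divergence between two isotropic Gaussians with shared covariance
# reduces to squared mean distance over 2*sigma^2.
d2 = ((mu[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
kl = d2 / (2 * sigma2)

# Bound: I(X;M) <= -(1/N) * sum_i log( (1/N) * sum_j exp(-KL_ij) )
i_upper = -np.mean(np.log(np.mean(np.exp(-kl), axis=1)))
print(i_upper)
```

Because the bound is differentiable in the encoder means, it can serve as a training objective for the compression term, which is the role such a bound plays in the neural-network implementation described above.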
Collapse
Affiliation(s)
- Artemy Kolchinsky
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA; (B.D.T.); (D.H.W.)
- Correspondence:
| | - Brendan D. Tracey
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA; (B.D.T.); (D.H.W.)
- Department of Aeronautics & Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - David H. Wolpert
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA; (B.D.T.); (D.H.W.)
- Complexity Science Hub, 1080 Vienna, Austria
- Center for Bio-Social Complex Systems, Arizona State University, Tempe, AZ 85281, USA
| |
Collapse
|