1.
Dado T, Papale P, Lozano A, Le L, Wang F, van Gerven M, Roelfsema P, Güçlütürk Y, Güçlü U. Brain2GAN: Feature-disentangled neural encoding and decoding of visual perception in the primate brain. PLoS Comput Biol 2024; 20:e1012058. [PMID: 38709818 PMCID: PMC11098503 DOI: 10.1371/journal.pcbi.1012058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Revised: 05/16/2024] [Accepted: 04/08/2024] [Indexed: 05/08/2024] Open
Abstract
A challenging goal of neural coding is to characterize the neural representations underlying visual perception. To this end, multi-unit activity (MUA) of macaque visual cortex was recorded in a passive fixation task upon presentation of faces and natural images. We analyzed the relationship between MUA and latent representations of state-of-the-art deep generative models, including the conventional and feature-disentangled representations of generative adversarial networks (GANs) (i.e., z- and w-latents of StyleGAN, respectively) and language-contrastive representations of latent diffusion networks (i.e., CLIP-latents of Stable Diffusion). A mass univariate neural encoding analysis of the latent representations showed that feature-disentangled w representations outperform both z and CLIP representations in explaining neural responses. Further, w-latent features were found to be positioned at the higher end of the complexity gradient which indicates that they capture visual information relevant to high-level neural activity. Subsequently, a multivariate neural decoding analysis of the feature-disentangled representations resulted in state-of-the-art spatiotemporal reconstructions of visual perception. Taken together, our results not only highlight the important role of feature-disentanglement in shaping high-level neural representations underlying visual perception but also serve as an important benchmark for the future of neural coding.
Affiliation(s)
- Thirza Dado
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Paolo Papale
  - Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Antonio Lozano
  - Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Lynn Le
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Feng Wang
  - Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
- Marcel van Gerven
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Pieter Roelfsema
  - Department of Vision and Cognition, Netherlands Institute for Neuroscience, Amsterdam, Netherlands
  - Laboratory of Visual Brain Therapy, Sorbonne University, Paris, France
  - Department of Integrative Neurophysiology, VU Amsterdam, Amsterdam, Netherlands
  - Department of Psychiatry, Amsterdam UMC, Amsterdam, Netherlands
- Yağmur Güçlütürk
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Umut Güçlü
  - Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
2.
Meng L, Yang C. Dual-Guided Brain Diffusion Model: Natural Image Reconstruction from Human Visual Stimulus fMRI. Bioengineering (Basel) 2023; 10:1117. [PMID: 37892847 PMCID: PMC10604156 DOI: 10.3390/bioengineering10101117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Revised: 09/20/2023] [Accepted: 09/21/2023] [Indexed: 10/29/2023] Open
Abstract
The reconstruction of visual stimuli from fMRI signals, which record brain activity, is a challenging task with crucial research value in the fields of neuroscience and machine learning. Previous studies tend to emphasize reconstructing pixel-level features (contours, colors, etc.) or semantic features (object category) of the stimulus image, but typically, these properties are not reconstructed together. In this context, we introduce a novel three-stage visual reconstruction approach called the Dual-guided Brain Diffusion Model (DBDM). Initially, we employ the Very Deep Variational Autoencoder (VDVAE) to reconstruct a coarse image from fMRI data, capturing the underlying details of the original image. Subsequently, the Bootstrapping Language-Image Pre-training (BLIP) model is utilized to provide a semantic annotation for each image. Finally, the image-to-image generation pipeline of the Versatile Diffusion (VD) model is utilized to recover natural images from the fMRI patterns guided by both visual and semantic information. The experimental results demonstrate that DBDM surpasses previous approaches in both qualitative and quantitative comparisons. In particular, the best performance is achieved by DBDM in reconstructing the semantic details of the original image; the Inception, CLIP and SwAV distances are 0.611, 0.225 and 0.405, respectively. This confirms the efficacy of our model and its potential to advance visual decoding research.
Affiliation(s)
- Lu Meng
  - College of Information Science and Engineering, Northeastern University, Shenyang 110819, China
3.
Ozcelik F, VanRullen R. Natural scene reconstruction from fMRI signals using generative latent diffusion. Sci Rep 2023; 13:15666. [PMID: 37731047 PMCID: PMC10511448 DOI: 10.1038/s41598-023-42891-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/23/2023] [Accepted: 09/15/2023] [Indexed: 09/22/2023] Open
Abstract
In neural decoding research, one of the most intriguing topics is the reconstruction of perceived natural images based on fMRI signals. Previous studies have succeeded in re-creating different aspects of the visuals, such as low-level properties (shape, texture, layout) or high-level features (category of objects, descriptive semantics of scenes) but have typically failed to reconstruct these properties together for complex scene images. Generative AI has recently made a leap forward with latent diffusion models capable of generating high-complexity images. Here, we investigate how to take advantage of this innovative technology for brain decoding. We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, starting from fMRI signals, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model (Versatile Diffusion) conditioned on predicted multimodal (text and visual) features, to generate final reconstructed images. On the publicly available Natural Scenes Dataset benchmark, our method outperforms previous models both qualitatively and quantitatively. When applied to synthetic fMRI patterns generated from individual ROI (region-of-interest) masks, our trained model creates compelling "ROI-optimal" scenes consistent with neuroscientific knowledge. Thus, the proposed methodology can have an impact on both applied (e.g. brain-computer interface) and fundamental neuroscience.
Affiliation(s)
- Furkan Ozcelik
  - CerCo, CNRS UMR5549, Toulouse, France
  - Université de Toulouse, Toulouse, France
- Rufin VanRullen
  - CerCo, CNRS UMR5549, Toulouse, France
  - Université de Toulouse, Toulouse, France
  - ANITI, Toulouse, France
4.
Ren Z, Li J, Xue X, Li X, Yang F, Jiao Z, Gao X. Reconstructing controllable faces from brain activity with hierarchical multiview representations. Neural Netw 2023; 166:487-500. [PMID: 37574622 DOI: 10.1016/j.neunet.2023.07.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 05/21/2023] [Accepted: 07/12/2023] [Indexed: 08/15/2023]
Abstract
Reconstructing visual experience from brain responses measured by functional magnetic resonance imaging (fMRI) is a challenging yet important research topic in brain decoding, especially because visually similar stimuli, such as faces, have proved more difficult to decode. Although face attributes are known to be key to face recognition, most existing methods ignore how to decode facial attributes more precisely in perceived face reconstruction, which often leads to indistinguishable reconstructed faces. To solve this problem, we propose a novel neural decoding framework called VSPnet (voxel2style2pixel) that establishes hierarchical encoding and decoding networks with disentangled latent representations as the medium, so as to recover visual stimuli in finer detail. We design a hierarchical visual encoder (HVE) to pre-extract features containing both high-level semantic knowledge and low-level visual details from stimuli. The proposed VSPnet consists of two networks: a multi-branch cognitive encoder and a style-based image generator. The encoder network is built from multiple linear regression branches that map brain signals to the latent space provided by the pre-extracted visual features and obtain representations containing hierarchical information consistent with the corresponding stimuli. The generator network is inspired by StyleGAN and is used to untangle the complexity of fMRI representations and generate images. The HVE network is composed of a standard feature pyramid over a ResNet backbone. Extensive experimental results on the latest public datasets demonstrate that the reconstruction accuracy of our proposed method exceeds that of state-of-the-art approaches and that the identifiability of different reconstructed faces is greatly improved. In particular, we achieve feature editing for several facial attributes in the fMRI domain based on the multiview (i.e., visual stimuli and evoked fMRI) latent representations.
Affiliation(s)
- Ziqi Ren
  - School of Electronic Engineering, Xidian University, Xi'an 710071, China
- Jie Li
  - School of Electronic Engineering, Xidian University, Xi'an 710071, China
- Xuetong Xue
  - School of Electronic Engineering, Xidian University, Xi'an 710071, China
- Xin Li
  - Group 42 (G42), Abu Dhabi, United Arab Emirates
- Fan Yang
  - Group 42 (G42), Abu Dhabi, United Arab Emirates
- Zhicheng Jiao
  - The Warren Alpert Medical School, Brown University, RI, USA
  - Department of Diagnostic Imaging, Rhode Island Hospital, RI, USA
- Xinbo Gao
  - School of Electronic Engineering, Xidian University, Xi'an 710071, China
5.
Gong C, Jing C, Chen X, Pun CM, Huang G, Saha A, Nieuwoudt M, Li HX, Hu Y, Wang S. Generative AI for brain image computing and brain network computing: a review. Front Neurosci 2023; 17:1203104. [PMID: 37383107 PMCID: PMC10293625 DOI: 10.3389/fnins.2023.1203104] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/10/2023] [Accepted: 05/22/2023] [Indexed: 06/30/2023] Open
Abstract
Recent years have witnessed a significant advancement in brain imaging techniques that offer a non-invasive approach to mapping the structure and function of the brain. Concurrently, generative artificial intelligence (AI) has experienced substantial growth, involving using existing data to create new content with a similar underlying pattern to real-world data. The integration of these two domains, generative AI in neuroimaging, presents a promising avenue for exploring various fields of brain imaging and brain network computing, particularly in the areas of extracting spatiotemporal brain features and reconstructing the topological connectivity of brain networks. Therefore, this study reviewed the advanced models, tasks, challenges, and prospects of brain imaging and brain network computing techniques and intends to provide a comprehensive picture of current generative AI techniques in brain imaging. This review is focused on novel methodological approaches and applications of related new methods. It discussed fundamental theories and algorithms of four classic generative models and provided a systematic survey and categorization of tasks, including co-registration, super-resolution, enhancement, classification, segmentation, cross-modality, brain network analysis, and brain decoding. This paper also highlighted the challenges and future directions of the latest work with the expectation that future research can be beneficial.
Affiliation(s)
- Changwei Gong
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Changhong Jing
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
- Xuhang Chen
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - Department of Computer and Information Science, University of Macau, Macau, China
- Chi Man Pun
  - Department of Computer and Information Science, University of Macau, Macau, China
- Guoli Huang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ashirbani Saha
  - Department of Oncology and School of Biomedical Engineering, McMaster University, Hamilton, ON, Canada
- Martin Nieuwoudt
  - Institute for Biomedical Engineering, Stellenbosch University, Stellenbosch, South Africa
- Han-Xiong Li
  - Department of Systems Engineering, City University of Hong Kong, Hong Kong, China
- Yong Hu
  - Department of Orthopaedics and Traumatology, The University of Hong Kong, Hong Kong, China
- Shuqiang Wang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - Department of Computer Science, University of Chinese Academy of Sciences, Beijing, China
6.
Marcus Lionel Brown C. Extended Mind Over Matter: Privacy Protection Is the Sine Qua Non. AJOB Neurosci 2023; 14:97-99. [PMID: 37097877 DOI: 10.1080/21507740.2023.2188283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2023]
7.
Asadi N, Olson IR, Obradovic Z. A transformer model for learning spatiotemporal contextual representation in fMRI data. Netw Neurosci 2023; 7:22-47. [PMID: 37334006 PMCID: PMC10270708 DOI: 10.1162/netn_a_00281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Accepted: 09/26/2022] [Indexed: 09/24/2023] Open
Abstract
Representation learning is a core component in data-driven modeling of various complex phenomena. Learning a contextually informative representation can especially benefit the analysis of fMRI data because of the complexities and dynamic dependencies present in such datasets. In this work, we propose a framework based on transformer models to learn an embedding of the fMRI data by taking the spatiotemporal contextual information in the data into account. This approach takes the multivariate BOLD time series of the regions of the brain as well as their functional connectivity network simultaneously as the input to create a set of meaningful features that can in turn be used in various downstream tasks such as classification, feature extraction, and statistical analysis. The proposed spatiotemporal framework uses the attention mechanism as well as the graph convolution neural network to jointly inject the contextual information regarding the dynamics in time series data and their connectivity into the representation. We demonstrate the benefits of this framework by applying it to two resting-state fMRI datasets, and provide further discussion on various aspects and advantages of it over a number of other commonly adopted architectures.
Affiliation(s)
- Nima Asadi
  - Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
- Ingrid R. Olson
  - Department of Psychology and Neuroscience, College of Liberal Arts, Temple University, Philadelphia, PA, USA
  - Decision Neuroscience, College of Liberal Arts, Temple University, Philadelphia, PA, USA
- Zoran Obradovic
  - Department of Computer and Information Sciences, College of Science and Technology, Temple University, Philadelphia, PA, USA
8.
Hou X, Zhao J, Zhang H. Reconstruction of perceived face images from brain activities based on multi-attribute constraints. Front Neurosci 2022; 16:1015752. [PMID: 36389231 PMCID: PMC9643433 DOI: 10.3389/fnins.2022.1015752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 10/10/2022] [Indexed: 11/24/2022] Open
Abstract
Reconstruction of perceived faces from brain signals is a hot topic in brain decoding and an important application in the field of brain-computer interfaces. Existing methods do not fully consider the multiple facial attributes represented in face images, and the different activity patterns these attributes evoke in multiple brain regions are often ignored, which leads to very poor reconstruction performance. In the current study, we propose an algorithmic framework that efficiently combines multiple face-selective brain regions for precise multi-attribute perceived face reconstruction. Our framework consists of three modules: a multi-task deep learning network (MTDLN), which is developed to simultaneously extract the multi-dimensional face features attributed to facial expression, identity and gender from a single face image; a set of linear regressions (LR), which is built to map the relationship between the multi-dimensional face features and the brain signals from multiple brain regions; and a multi-conditional generative adversarial network (mcGAN), which is used to generate the perceived face images constrained by the predicted multi-dimensional face features. We conduct extensive fMRI experiments to evaluate the reconstruction performance of our framework both subjectively and objectively. The results show that, compared with traditional methods, our proposed framework better characterizes the multi-attribute face features in a face image, better predicts the face features from brain signals, and achieves better reconstruction performance for both seen and unseen face images in both visual quality and quantitative assessment. Moreover, besides state-of-the-art intra-subject reconstruction performance, our proposed framework can also realize inter-subject face reconstruction to a certain extent.
Affiliation(s)
- Xiaoyuan Hou
  - School of Engineering Medicine, Beihang University, Beijing, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Jing Zhao
  - School of Engineering Medicine, Beihang University, Beijing, China
  - School of Biological Science and Medical Engineering, Beihang University, Beijing, China
- Hui Zhang
  - School of Engineering Medicine, Beihang University, Beijing, China
  - Key Laboratory of Biomechanics and Mechanobiology, Ministry of Education, Beihang University, Beijing, China
  - Key Laboratory of Big Data-Based Precision Medicine, Ministry of Industry and Information Technology of the People’s Republic of China, Beihang University, Beijing, China
9.
Kim SG. On the encoding of natural music in computational models and human brains. Front Neurosci 2022; 16:928841. [PMID: 36203808 PMCID: PMC9531138 DOI: 10.3389/fnins.2022.928841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022] Open
Abstract
This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.
10.
Berezutskaya J, Ambrogioni L, Ramsey NF, van Gerven MAJ. Towards Naturalistic Speech Decoding from Intracranial Brain Data. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:3100-3104. [PMID: 36085779 DOI: 10.1109/embc48229.2022.9871301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Speech decoding from brain activity can enable development of brain-computer interfaces (BCIs) to restore naturalistic communication in paralyzed patients. Previous work has focused on development of decoding models from isolated speech data with a clean background and multiple repetitions of the material. In this study, we describe a novel approach to speech decoding that relies on a generative adversarial neural network (GAN) to reconstruct speech from brain data recorded during a naturalistic speech listening task (watching a movie). We compared the GAN-based approach, where reconstruction was done from the compressed latent representation of sound decoded from the brain, with several baseline models that reconstructed sound spectrogram directly. We show that the novel approach provides more accurate reconstructions compared to the baselines. These results underscore the potential of GAN models for speech decoding in naturalistic noisy environments and further advancing of BCIs for naturalistic communication. Clinical Relevance - This study presents a novel speech decoding paradigm that combines advances in deep learning, speech synthesis and neural engineering, and has the potential to advance the field of BCI for severely paralyzed individuals.