1
Physics-informed and data-driven discovery of governing equations for complex phenomena in heterogeneous media. Phys Rev E 2024; 109:041001. [PMID: 38755895 DOI: 10.1103/physreve.109.041001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Indexed: 05/18/2024]
Abstract
Rapid evolution of sensor technology, advances in instrumentation, and progress in devising data-acquisition software and hardware are providing vast amounts of data for various complex phenomena that occur in heterogeneous media, ranging from those in atmospheric environment, to large-scale porous formations, and biological systems. The tremendous increase in the speed of scientific computing has also made it possible to emulate diverse multiscale and multiphysics phenomena that contain elements of stochasticity or heterogeneity, and to generate large volumes of numerical data for them. Thus, given a heterogeneous system with annealed or quenched disorder in which a complex phenomenon occurs, how should one analyze and model the system and phenomenon, explain the data, and make predictions for length and time scales much larger than those over which the data were collected? We divide such systems into three distinct classes. (i) Those for which the governing equations for the physical phenomena of interest, as well as data, are known, but solving the equations over large length scales and long times is very difficult. (ii) Those for which data are available, but the governing equations are only partially known, in the sense that they either contain various coefficients that must be evaluated based on the data, or that the number of degrees of freedom of the system is so large that deriving the complete equations is very difficult, if not impossible, as a result of which one must develop the governing equations with reduced dimensionality. (iii) In the third class are systems for which large amounts of data are available, but the governing equations for the phenomena of interest are not known. 
Several classes of physics-informed and data-driven approaches for analyzing and modeling of the three classes of systems have been emerging, which are based on machine learning, symbolic regression, the Koopman operator, the Mori-Zwanzig projection operator formulation, sparse identification of nonlinear dynamics, data assimilation combined with a neural network, and stochastic optimization and analysis. This perspective describes such methods and the latest developments in this highly important and rapidly expanding area and discusses possible future directions.
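One of the methods named in this abstract, sparse identification of nonlinear dynamics, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the cited perspective: the toy system dx/dt = -2x, the candidate library [1, x, x^2], the threshold value, and the helper name `stlsq`. The idea is to regress the measured time derivative onto a library of candidate terms and repeatedly zero out small coefficients so that only a few terms survive.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares (illustrative sketch)."""
    # Initial dense least-squares fit of dx/dt onto all candidate terms.
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune weak candidate terms
        if small.all():
            break
        # Refit using only the terms that survived the threshold.
        xi[~small] = np.linalg.lstsq(theta[:, ~small], dxdt, rcond=None)[0]
    return xi

# Toy data (assumed): samples of x(t) = exp(-2 t), which obeys dx/dt = -2 x.
t = np.linspace(0.0, 2.0, 201)
x = np.exp(-2.0 * t)
dxdt = np.gradient(x, t, edge_order=2)       # finite-difference derivative

# Candidate library (assumed): constant, linear, and quadratic terms.
theta = np.column_stack([np.ones_like(x), x, x**2])
xi = stlsq(theta, dxdt)
print(dict(zip(["1", "x", "x^2"], np.round(xi, 3))))
```

On this noise-free toy problem the fit should keep only the linear term, with a coefficient close to -2. Real data would need a noise-robust derivative estimate and a tuned threshold.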
2
LMD: Multiscale Marker Identification in Single-cell RNA-seq Data. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.12.566780. [PMID: 38014159 PMCID: PMC10680591 DOI: 10.1101/2023.11.12.566780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Accurate cell marker identification in single-cell RNA-seq data is crucial for understanding cellular diversity and function. An ideal marker is highly specific in identifying cells that are similar in terms of function and state. Current marker identification methods, commonly based on clustering and differential expression, capture general cell-type markers but often miss markers for subtypes or functional cell subsets, with their performance largely dependent on clustering quality. Moreover, cluster-independent approaches tend to favor genes that lack the specificity required to characterize regions within the transcriptomic space at multiple scales. Here we introduce Localized Marker Detector (LMD), a novel tool to identify "localized genes" - genes with expression profiles specific to certain groups of highly similar cells - thereby characterizing cellular diversity in a multi-resolution and fine-grained manner. LMD's strategy involves building a cell-cell affinity graph, diffusing the gene expression value across the cell graph, and assigning a score to each gene based on its diffusion dynamics. We show that LMD exhibits superior accuracy in recovering known cell-type markers in the Tabula Muris bone marrow dataset relative to other methods for marker identification. Notably, markers favored by LMD exhibit localized expression, whereas markers prioritized by other clustering-free algorithms are often dispersed in the transcriptomic space. We further group the markers suggested by LMD into functional gene modules to improve the separation of cell types and subtypes in a more fine-grained manner. These modules also identify other sources of variation, such as cell cycle status. In conclusion, LMD is a novel algorithm that can identify fine-grained markers for cell subtypes or functional states without relying on clustering or differential expression analysis. LMD exploits the complex interactions among cells and reveals cellular diversity at high resolution.
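The three steps described in this abstract (build a cell-cell graph, diffuse a gene's expression over it, score the gene from its diffusion dynamics) can be sketched on toy data. This is not the LMD implementation or its scoring function. The path-like graph of 40 hypothetical cells, the two synthetic genes, and the score (KL divergence from the walk's stationary distribution, summed over steps) are all assumptions chosen only to show why a localized gene diffuses more slowly than a dispersed one.

```python
import numpy as np

def diffusion_score(expr, P, pi, n_steps=20):
    """Sum of KL(rho_t || pi) over diffusion steps (illustrative score only)."""
    rho = expr / expr.sum()                  # expression as a distribution over cells
    total = 0.0
    for _ in range(n_steps + 1):
        mask = rho > 0
        total += float(np.sum(rho[mask] * np.log(rho[mask] / pi[mask])))
        rho = rho @ P                        # one random-walk step on the cell graph
    return total

# Toy cell graph (assumed): 40 cells on a line, linked to cells within distance 2.
n_cells = 40
pos = np.arange(n_cells)
dist = np.abs(pos[:, None] - pos[None, :])
A = (dist <= 2).astype(float)                # symmetric affinity, self-loops included
P = A / A.sum(axis=1, keepdims=True)         # row-stochastic transition matrix
pi = A.sum(axis=1) / A.sum()                 # stationary distribution (degree-proportional)

# Two synthetic genes, each expressed in exactly five cells.
localized = np.zeros(n_cells)
localized[:5] = 1.0                          # five neighboring cells
dispersed = np.zeros(n_cells)
dispersed[::8] = 1.0                         # five cells spread across the graph

score_localized = diffusion_score(localized, P, pi)
score_dispersed = diffusion_score(dispersed, P, pi)
print(score_localized > score_dispersed)
```

Under this toy score, the gene confined to neighboring cells stays far from equilibrium for longer, so it accumulates a larger value than the scattered gene. How LMD itself maps diffusion dynamics to a ranking is defined in the cited preprint.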
3
White matter functional gradients and their formation in adolescence. Cereb Cortex 2023; 33:10770-10783. [PMID: 37727985 DOI: 10.1093/cercor/bhad319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Revised: 08/07/2023] [Accepted: 08/08/2023] [Indexed: 09/21/2023]
Abstract
It is well known that functional magnetic resonance imaging (fMRI) is a widely used tool for studying brain activity. Recent research has shown that fluctuations in fMRI data can reflect functionally meaningful patterns of brain activity within the white matter. We leveraged resting-state fMRI from an adolescent population to characterize large-scale white matter functional gradients and their formation during adolescence. The white matter showed gray-matter-like unimodal-to-transmodal and sensorimotor-to-visual gradients with specific cognitive associations and a unique superficial-to-deep gradient with nonspecific cognitive associations. We propose two mechanisms for their formation in adolescence. One is a "function-molded" mechanism that may mediate the maturation of the transmodal white matter via the transmodal gray matter. The other is a "structure-root" mechanism that may support the mutual mediation roles of the unimodal and transmodal white matter maturation during adolescence. Thus, the spatial layout of the white matter functional gradients is in concert with the gray matter functional organization. The formation of the white matter functional gradients may be driven by brain anatomical wiring and functional needs.
4
A Tutorial on the Spectral Theory of Markov Chains. Neural Comput 2023; 35:1713-1796. [PMID: 37725706 DOI: 10.1162/neco_a_01611] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Accepted: 06/06/2023] [Indexed: 09/21/2023]
Abstract
Markov chains are a class of probabilistic models that have achieved widespread application in the quantitative sciences. This is in part due to their versatility, but is compounded by the ease with which they can be probed analytically. This tutorial provides an in-depth introduction to Markov chains and explores their connection to graphs and random walks. We use tools from linear algebra and graph theory to describe the transition matrices of different types of Markov chains, with a particular focus on exploring properties of the eigenvalues and eigenvectors corresponding to these matrices. The results presented are relevant to a number of methods in machine learning and data mining, which we describe at various stages. Rather than being a novel academic study in its own right, this text presents a collection of known results, together with some new concepts. Moreover, the tutorial focuses on offering intuition to readers rather than formal understanding and only assumes basic exposure to concepts from linear algebra and probability theory. It is therefore accessible to students and researchers from a wide variety of disciplines.
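A small worked example may make the link between a transition matrix and its eigenvalues and eigenvectors concrete. The 3-state matrix below is a hypothetical chain chosen for illustration and does not come from the tutorial. The sketch uses two standard facts: the stationary distribution is the left eigenvector for eigenvalue 1, and the gap between 1 and the second-largest eigenvalue modulus governs how fast the chain converges to it.

```python
import numpy as np

# Hypothetical 3-state chain; row i holds the transition probabilities out of state i.
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])

# Left eigenvectors of P are the right eigenvectors of its transpose.
eigvals, eigvecs = np.linalg.eig(P.T)

# Sort by modulus, largest first; a stochastic matrix has leading eigenvalue 1.
order = np.argsort(-np.abs(eigvals))
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]

# Normalizing the leading left eigenvector to sum to 1 gives the stationary distribution.
pi = np.real(eigvecs[:, 0])
pi = pi / pi.sum()

# Spectral gap: distance between 1 and the second-largest eigenvalue modulus.
gap = 1.0 - np.abs(eigvals[1])

print(np.round(pi, 3), round(float(gap), 3))
```

For this matrix the eigenvalues work out by hand to 1, 0.8, and 0.5, so the stationary distribution is (0.6, 0.3, 0.1) and the spectral gap is 0.2.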
5
Functional connectivity gradients of the cingulate cortex. Commun Biol 2023; 6:650. [PMID: 37337086 DOI: 10.1038/s42003-023-05029-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 06/08/2023] [Indexed: 06/21/2023]
Abstract
Heterogeneity of the cingulate cortex is evident in multiple dimensions including anatomy, function, connectivity, and involvement in networks and diseases. Using the recently developed functional connectivity gradient approach and resting-state functional MRI data, we found three functional connectivity gradients that captured distinct dimensions of cingulate hierarchical organization. The principal gradient exhibited a radiating organization with transitions from the middle toward both anterior and posterior parts of the cingulate cortex and was related to canonical functional networks and corresponding behavioral domains. The second gradient showed an anterior-posterior axis across the cingulate cortex and had prominent geometric distance dependence. The third gradient displayed a marked differentiation of subgenual and caudal middle with other parts of the cingulate cortex and was associated with cortical morphology. Aside from providing an updated framework for understanding the multifaceted nature of cingulate heterogeneity, the observed hierarchical organization of the cingulate cortex may constitute a novel research agenda with potential applications in basic and clinical neuroscience.
6
Long-range functional connections mirror and link microarchitectural and cognitive hierarchies in the human brain. Cereb Cortex 2023; 33:1782-1798. [PMID: 35596951 PMCID: PMC9977370 DOI: 10.1093/cercor/bhac172] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 11/16/2021] [Revised: 03/30/2022] [Accepted: 04/01/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND Higher-order cognition is hypothesized to be implemented via distributed cortical networks that are linked via long-range connections. However, it is unknown how computational advantages of long-range connections reflect cortical microstructure and microcircuitry. METHODS We investigated this question by (i) profiling long-range cortical connectivity using resting-state functional magnetic resonance imaging (MRI) and cortico-cortical geodesic distance mapping, (ii) assessing how long-range connections reflect local brain microarchitecture, and (iii) examining the microarchitectural similarity of regions connected through long-range connections. RESULTS Analysis of 2 independent datasets indicated that sensory/motor areas had more clustered short-range connections, while transmodal association systems hosted distributed, long-range connections. Meta-analytical decoding suggested that this topographical difference mirrored shifts in cognitive function, from perception/action towards emotional/social processing. Analysis of myelin-sensitive in vivo MRI as well as postmortem histology and transcriptomics datasets established that gradients in functional connectivity distance are paralleled by those present in cortical microarchitecture. Notably, long-range connections were found to link spatially remote regions of association cortex with an unexpectedly similar microarchitecture. CONCLUSIONS By mapping covarying topographies of long-range functional connections and cortical microcircuits, the current work provides insights into structure-function relations in human neocortex.
7
Unsupervised Learning of Interacting Topological Phases from Experimental Observables. FUNDAMENTAL RESEARCH 2023. [DOI: 10.1016/j.fmre.2022.12.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
8
Functional Gradients of the Cerebellum: a Review of Practical Applications. CEREBELLUM (LONDON, ENGLAND) 2022; 21:1061-1072. [PMID: 34741753 PMCID: PMC9072599 DOI: 10.1007/s12311-021-01342-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 10/25/2021] [Indexed: 11/29/2022]
Abstract
Gradient-based analyses have contributed to the description of cerebellar functional neuroanatomy. More recently, functional gradients of the cerebellum have been used as a multi-purpose tool for neuroimaging research. Here, we provide an overview of the many practical applications of cerebellar functional gradient analyses. These practical applications include examination of intra-cerebellar and cerebellar-extracerebellar organization; transformation of functional gradients into parcellations with discrete borders; projection of functional gradients calculated within cerebellar structures to other extracerebellar structures; interpretation of cerebellar neuroimaging findings using qualitative and quantitative methods; detection of differences in patient populations; and other more complex practical applications of cerebellar gradient-based analyses. This review may serve as an introduction and catalog of options for neuroscientists who wish to design and analyze imaging studies using functional gradients of the cerebellum.
9
GeodesicEmbedding (GE): A High-Dimensional Embedding Approach for Fast Geodesic Distance Queries. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:4930-4939. [PMID: 34478373 DOI: 10.1109/tvcg.2021.3109975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
In this article, we develop a novel method for fast geodesic distance queries. The key idea is to embed the mesh into a high-dimensional space, such that the Euclidean distance in the high-dimensional space can induce the geodesic distance in the original manifold surface. However, directly solving the high-dimensional embedding problem is not feasible due to the large number of variables and the fact that the embedding problem is highly nonlinear. We overcome the challenges with two novel ideas. First, instead of taking all vertices as variables, we embed only the saddle vertices, which greatly reduces the problem complexity. We then compute a local embedding for each non-saddle vertex. Second, to reduce the large approximation error resulting from the purely Euclidean embedding, we propose a cascaded optimization approach that repeatedly introduces additional embedding coordinates with a non-Euclidean function to reduce the approximation residual. Using the precomputation data, our approach can determine the geodesic distance between any two vertices in near-constant time. Computational testing results show that our method is more desirable than previous geodesic distance query methods.
10
Functional connectivity gradients of the insula to different cerebral systems. Hum Brain Mapp 2022; 44:790-800. [PMID: 36206289 PMCID: PMC9842882 DOI: 10.1002/hbm.26099] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/18/2022] [Revised: 09/16/2022] [Accepted: 09/24/2022] [Indexed: 01/25/2023]
Abstract
The diverse functional roles of the insula may emerge from its heavy connectivity to an extensive network of cortical and subcortical areas. Despite several previous attempts to investigate the hierarchical organization of the insula by applying the recently developed gradient approach to insula-to-whole brain connectivity data, little is known about whether and how there is variability across connectivity gradients of the insula to different cerebral systems. Resting-state functional MRI data from 793 healthy subjects were used to discover and validate functional connectivity gradients of the insula, which were computed based on its voxel-wise functional connectivity profiles to distinct cerebral systems. We identified three primary patterns of functional connectivity gradients of the insula to distinct cerebral systems. The connectivity gradients to the higher-order transmodal associative systems, including the prefrontal, posterior parietal, temporal cortices, and limbic lobule, showed a ventroanterior-dorsal axis across the insula; those to the lower-order unimodal primary systems, including the motor, somatosensory, and occipital cortices, displayed radiating transitions from dorsoanterior toward both ventroanterior and dorsoposterior parts of the insula; the connectivity gradient to the subcortical nuclei exhibited an organization along the anterior-posterior axis of the insula. Apart from complementing and extending previous literature on the heterogeneous connectivity patterns of insula subregions, the presented framework may offer ample opportunities to refine our understanding of the role of the insula in many brain disorders.
11
Functional Parcellation of Human Brain Using Localized Topo-Connectivity Mapping. IEEE TRANSACTIONS ON MEDICAL IMAGING 2022; 41:2670-2680. [PMID: 35442885 PMCID: PMC9844109 DOI: 10.1109/tmi.2022.3168888] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/12/2023]
Abstract
The analysis of connectivity between parcellated regions of cortex provides insights into the functional architecture of the brain at a systems level. However, the derivation of functional structures from voxel-wise analyses at finer scales remains a challenge. We propose a novel method, called localized topo-connectivity mapping with singular-value-decomposition-informed filtering (or filtered LTM), to identify and characterize voxel-wise functional structures in the human brain from resting-state fMRI data. Here we describe its mathematical formulation and provide a proof-of-concept using simulated data that allow an intuitive interpretation of the results of filtered LTM. The algorithm has also been applied to 7T fMRI data acquired as part of the Human Connectome Project to generate group-average LTM images. Generally, most of the functional structures revealed by LTM images agree in the boundaries with anatomical structures identified by T1-weighted images and fractional anisotropy maps derived from diffusion MRI. In addition, the LTM images also reveal subtle functional variations that are not apparent in the anatomical structures. To assess the performance of LTM images, the subcortical region and occipital white matter were separately parcellated. Statistical tests were performed to demonstrate that the synchronies of fMRI signals in LTM-derived functional parcels are significantly larger than those with geometric perturbations. Overall, the filtered LTM approach can serve as a tool to investigate the functional organization of the brain at the scale of individual voxels as measured in fMRI.
12
Abstract
Multimodal neuroimaging grants a powerful window into the structure and function of the human brain at multiple scales. Recent methodological and conceptual advances have enabled investigations of the interplay between large-scale spatial trends (also referred to as gradients) in brain microstructure and connectivity, offering an integrative framework to study multiscale brain organization. Here, we share a multimodal MRI dataset for Microstructure-Informed Connectomics (MICA-MICs) acquired in 50 healthy adults (23 women; 29.54 ± 5.62 years) who underwent high-resolution T1-weighted MRI, myelin-sensitive quantitative T1 relaxometry, diffusion-weighted MRI, and resting-state functional MRI at 3 Tesla. In addition to raw anonymized MRI data, this release includes brain-wide connectomes derived from (i) resting-state functional imaging, (ii) diffusion tractography, (iii) microstructure covariance analysis, and (iv) geodesic cortical distance, gathered across multiple parcellation scales. Alongside, we share large-scale gradients estimated from each modality and parcellation scale. Our dataset will facilitate future research examining the coupling between brain microstructure, connectivity, and function. MICA-MICs is available on the Canadian Open Neuroscience Platform data portal (https://portal.conp.ca) and the Open Science Framework (https://osf.io/j532r/).
13
Recovery of Conformational Continuum From Single-Particle Cryo-EM Images: Optimization of ManifoldEM Informed by Ground Truth. IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING 2022; 8:462-478. [PMID: 36258699 PMCID: PMC9575687 DOI: 10.1109/tci.2022.3174801] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 05/28/2023]
Abstract
This work is based on the manifold-embedding approach to study biological molecules exhibiting continuous conformational changes. Previous work established a method, now termed ManifoldEM, capable of reconstructing 3D movies and accompanying free-energy landscapes from single-particle cryo-EM images of macromolecules exercising multiple conformational degrees of freedom. While ManifoldEM has proven its viability in several experimental studies, critical limitations and uncertainties have been found throughout its extended development and use. Guided by insights from studies with cryo-EM ground-truth data, simulated from atomic structures undergoing conformational changes, we have built a novel framework, ESPER, able to retrieve the free-energy landscape and respective 3D Coulomb potential maps for all states simulated. As shown by a direct comparison of ground truth vs. recovered maps, and analysis of experimental data from the 80S ribosome and ryanodine receptor, ESPER offers substantial improvements relative to the previous work.
14
Elastic network modeling of cellular networks unveils sensor and effector genes that control information flow. PLoS Comput Biol 2022; 18:e1010181. [PMID: 35639793 PMCID: PMC9216591 DOI: 10.1371/journal.pcbi.1010181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 06/22/2022] [Accepted: 05/07/2022] [Indexed: 12/03/2022]
Abstract
The high-level organization of the cell is embedded in indirect relationships that connect distinct cellular processes. Existing computational approaches for detecting indirect relationships between genes typically consist of propagating abstract information through network representations of the cell. However, the selection of genes to serve as the source of propagation is inherently biased by prior knowledge. Here, we sought to derive an unbiased view of the high-level organization of the cell by identifying the genes that propagate and receive information most effectively in the cell, and the indirect relationships between these genes. To this aim, we adapted a perturbation-response scanning strategy initially developed for identifying allosteric interactions within proteins. We deployed this strategy onto an elastic network model of the yeast genetic interaction profile similarity network. This network revealed a superior propensity for information propagation relative to simulated networks with similar topology. Perturbation-response scanning identified the major distributors and receivers of information in the network, named effector and sensor genes, respectively. Effectors formed dense clusters centrally integrated into the network, whereas sensors formed loosely connected antenna-shaped clusters and contained genes with previously characterized involvement in signal transduction. We propose that indirect relationships between effector and sensor clusters represent major paths of information flow between distinct cellular processes. Genetic similarity networks for fission yeast and human displayed similarly strong propensities for information propagation and clusters of effector and sensor genes, suggesting that the global architecture enabling indirect relationships is evolutionarily conserved across species. 
Our results demonstrate that elastic network modeling of cellular networks constitutes a promising strategy to probe the high-level organization and cooperativity in the cell.
15
Discovering causal structure with reproducing-kernel Hilbert space ε-machines. CHAOS (WOODBURY, N.Y.) 2022; 32:023103. [PMID: 35232043 DOI: 10.1063/5.0062829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 01/10/2022] [Indexed: 06/14/2023]
Abstract
We merge computational mechanics' definition of causal states (predictively equivalent histories) with reproducing-kernel Hilbert space (RKHS) representation inference. The result is a widely applicable method that infers causal structure directly from observations of a system's behaviors whether they are over discrete or continuous events or time. A structural representation, a finite- or infinite-state kernel ϵ-machine, is extracted by a reduced-dimension transform that gives an efficient representation of causal states and their topology. In this way, the system dynamics are represented by a stochastic (ordinary or partial) differential equation that acts on causal states. We introduce an algorithm to estimate the associated evolution operator. Paralleling the Fokker-Planck equation, it efficiently evolves causal-state distributions and makes predictions in the original data space via an RKHS functional mapping. We demonstrate these techniques, together with their predictive abilities, on discrete-time, discrete-value infinite Markov-order processes generated by finite-state hidden Markov models with (i) finite or (ii) uncountably infinite causal states and (iii) continuous-time, continuous-value processes generated by thermally driven chaotic flows. The method robustly estimates causal structure in the presence of varying external and measurement noise levels and for very high-dimensional data.
16
Machine Learning for Structure Determination in Single-Particle Cryo-Electron Microscopy: A Systematic Review. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:452-472. [PMID: 34932487 DOI: 10.1109/tnnls.2021.3131325] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/14/2023]
Abstract
Recently, single-particle cryo-electron microscopy (cryo-EM) has become an indispensable method for determining macromolecular structures at high resolution to deeply explore the relevant molecular mechanism. Its recent breakthrough is mainly because of the rapid advances in hardware and image processing algorithms, especially machine learning. As an essential support of single-particle cryo-EM, machine learning has powered many aspects of structure determination and greatly promoted its development. In this article, we provide a systematic review of the applications of machine learning in this field. Our review begins with a brief introduction of single-particle cryo-EM, followed by the specific tasks and challenges of its image processing. Then, focusing on the workflow of structure determination, we describe relevant machine learning algorithms and applications at different steps, including particle picking, 2-D clustering, 3-D reconstruction, and other steps. As different tasks exhibit distinct characteristics, we introduce the evaluation metrics for each task and summarize their dynamics of technology development. Finally, we discuss the open issues and potential trends in this promising field.
17
Unsupervised Learning Framework With Multidimensional Scaling in Predicting Epithelial-Mesenchymal Transitions. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2714-2723. [PMID: 32386162 DOI: 10.1109/tcbb.2020.2992605] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/11/2023]
Abstract
Clustering tumor metastasis samples from gene expression data at the whole-genome level remains an arduous challenge, particularly when the number of experimental samples is small and the number of genes is huge. To avoid the confounding effects of uncertainties derived from various factors, we focus here on predicting the epithelial-mesenchymal transition (EMT), an underlying mechanism of tumor metastasis, rather than tumor metastasis itself. In this paper, we propose a novel model for predicting EMT based on multidimensional scaling (MDS) strategies, integrating entropy and random matrix detection strategies to determine the optimal number of dimensions in the low-dimensional space. We verified our proposed model with gene expression data for EMT samples of breast cancer, and the experimental results demonstrated its superiority over state-of-the-art clustering methods. Furthermore, we developed a novel feature extraction method for selecting the significant genes and predicting tumor metastasis. The source code is available at "https://github.com/yushanqiu/yushan.qiu-szu.edu.cn".
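For readers unfamiliar with multidimensional scaling, the classical variant can be sketched in a few lines. This is a generic textbook procedure, not the authors' model: it omits their entropy and random-matrix criteria for choosing the dimension, and the helper name `classical_mds` and the four toy points are assumptions for illustration. The procedure double-centers the squared distance matrix and keeps the leading eigenvectors, scaled by the square roots of their eigenvalues.

```python
import numpy as np

def classical_mds(D, k):
    """Embed n points in k dimensions from an n-by-n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # Gram matrix of the centered points
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    lam = np.maximum(eigvals[top], 0.0)      # clip tiny negative values from round-off
    return eigvecs[:, top] * np.sqrt(lam)

# Toy samples (assumed): corners of a 3-by-4 rectangle.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

Y = classical_mds(D, k=2)
D_rec = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.allclose(D_rec, D))
```

Because the toy points are exactly two-dimensional, a 2-D embedding reproduces every pairwise distance, up to rotation and reflection of the coordinates. With high-dimensional expression data the choice of `k` is the hard part, which is the problem the cited paper addresses.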
18
Functional Territories of Human Dentate Nucleus. Cereb Cortex 2021; 30:2401-2417. [PMID: 31701117 DOI: 10.1093/cercor/bhz247] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 09/16/2019] [Accepted: 09/17/2019] [Indexed: 12/27/2022]
Abstract
Anatomical connections link the cerebellar cortex with multiple sensory, motor, association, and paralimbic cerebral areas. The majority of fibers that exit cerebellar cortex synapse in dentate nuclei (DN) before reaching extracerebellar structures such as cerebral cortex, but the functional neuroanatomy of human DN remains largely unmapped. Neuroimaging research has redefined broad categories of functional division in the human brain showing that primary processing, attentional (task positive) processing, and default-mode (task negative) processing are three central poles of neural macroscale functional organization. This broad spectrum of human neural processing categories is represented not only in the cerebral cortex, but also in the thalamus, striatum, and cerebellar cortex. Whether functional organization in DN obeys a similar set of macroscale divisions, and whether DN are yet another compartment of representation of a broad spectrum of human neural processing categories, remains unknown. Here, we show for the first time that human DN are optimally divided into three functional territories as indexed by high spatio-temporal resolution resting-state MRI in 77 healthy humans, and that these three distinct territories contribute uniquely to default-mode, salience-motor, and visual cerebral cortical networks. Our findings provide a systems neuroscience substrate for cerebellar output to influence multiple broad categories of neural control.
|
19
|
Tetrahedral spectral feature-based Bayesian manifold learning for grey matter morphometry: Findings from the Alzheimer's disease neuroimaging initiative. Med Image Anal 2021; 72:102123. [PMID: 34214958 PMCID: PMC8316398 DOI: 10.1016/j.media.2021.102123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2020] [Revised: 03/30/2021] [Accepted: 05/26/2021] [Indexed: 11/17/2022]
Abstract
Structural and anatomical analyses of magnetic resonance imaging (MRI) data often require a reconstruction of the three-dimensional anatomy to a statistical shape model. Our prior work demonstrated the usefulness of tetrahedral spectral features for grey matter morphometry. However, most current methods provide a large number of descriptive shape features but lack an unsupervised scheme to automatically extract a concise set of features that have clear biological interpretations and also carry strong statistical power. Here we introduce a new tetrahedral spectral feature-based Bayesian manifold learning framework for effective statistical analysis of grey matter morphology. We start by solving the technical issue of generating tetrahedral meshes that preserve the details of the grey matter geometry. We then derive explicit weak-form tetrahedral discretizations of the Hamiltonian operator (HO) and the Laplace-Beltrami operator (LBO). Next, Schrödinger's equation is solved to construct the scale-invariant wave kernel signature (SIWKS) as the shape descriptor. By solving the heat equation and utilizing the SIWKS, we design a morphometric Gaussian process (M-GP) regression framework and an active learning strategy to select landmarks as concrete shape descriptors. We evaluate the proposed system on publicly available data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), using subjects' structural MRI covering the range from cognitively unimpaired (CU) to full-blown Alzheimer's disease (AD). Our analyses suggest that the SIWKS and M-GP compare favorably with seven other baseline algorithms in obtaining grey matter morphometry-based diagnoses. Our work may inspire more tetrahedral spectral feature-based Bayesian learning research in medical image analysis.
|
20
|
Interaction Between Cerebellum and Cerebral Cortex, Evidence from Dynamic Causal Modeling. The Cerebellum 2021; 21:225-233. [PMID: 34146220 DOI: 10.1007/s12311-021-01284-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 05/23/2021] [Indexed: 01/05/2023]
Abstract
The interaction of the cerebellum with cerebral cortical dynamics is still poorly understood. In this paper, dynamic causal modeling is used to examine the interaction between cerebellum and cerebral cortex as indexed by MRI resting-state functional connectivity in three large-scale networks in healthy young adults (N = 200; Human Connectome Project dataset). These networks correspond roughly to default mode, task positive, and motor as determined by prior cerebellar functional gradient analyses. We find uniform interactions within all considered networks from cerebellum to cerebral cortex, providing support for the notion of a universal cerebellar transform. Our results provide a foundation for future analyses to quantify and further investigate whether this is a property that is unique to the interactions from cerebellum to cerebral cortex.
|
21
|
Unsupervised Learning of Non-Hermitian Topological Phases. Physical Review Letters 2021; 126:240402. [PMID: 34213933 DOI: 10.1103/physrevlett.126.240402] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/03/2020] [Revised: 04/10/2021] [Accepted: 05/18/2021] [Indexed: 06/13/2023]
Abstract
Non-Hermitian topological phases bear a number of exotic properties, such as the non-Hermitian skin effect and the breakdown of conventional bulk-boundary correspondence. In this Letter, we introduce an unsupervised machine learning approach to classify non-Hermitian topological phases based on diffusion maps, which are widely used in manifold learning. We find that the non-Hermitian skin effect poses a notable obstacle, rendering the straightforward extension of unsupervised learning approaches for topological phases of Hermitian systems ineffective in clustering non-Hermitian topological phases. Through theoretical analysis and numerical simulations of two prototypical models, we show that this difficulty can be circumvented by choosing the "on-site" elements of the projective matrix as the input data. Our results provide valuable guidance for future studies on learning non-Hermitian topological phases in an unsupervised fashion, both in theory and experiment.
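For readers unfamiliar with diffusion maps, the following is a minimal generic sketch in Python with NumPy, not the authors' implementation. The Gaussian kernel, the bandwidth `epsilon`, the function name `diffusion_map`, and the toy two-cluster data are all assumptions made for illustration; the Letter's actual input (on-site elements of the projective matrix) is not reproduced here.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, t=1):
    """Basic diffusion map of the rows of X (Gaussian kernel, no density correction).

    Hypothetical helper for illustration only.
    """
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off

    # Gaussian affinities and their row sums (degrees).
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)

    # Symmetric matrix S = D^{-1/2} K D^{-1/2}; it shares its eigenvalues
    # with the random-walk matrix P = D^{-1} K.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # sort eigenpairs in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Right eigenvectors of P are psi = D^{-1/2} v.
    psi = eigvecs / np.sqrt(d)[:, None]

    # Skip the trivial constant eigenvector (eigenvalue 1) and scale by eigenvalue^t.
    idx = slice(1, n_components + 1)
    coords = psi[:, idx] * eigvals[idx] ** t
    return eigvals, coords

# Toy data (an assumption): two well-separated 2D clusters of 20 points each.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(20, 2))
cluster_b = rng.normal(loc=[3.0, 0.0], scale=0.2, size=(20, 2))
X = np.vstack([cluster_a, cluster_b])
eigvals, coords = diffusion_map(X, epsilon=1.0, n_components=2)
```

In this toy setting the sign of the first nontrivial diffusion coordinate separates the two clusters, which is the property that makes the embedding usable for unsupervised clustering.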
|
22
|
"Dividing and Conquering" and "Caching" in Molecular Modeling. Int J Mol Sci 2021; 22:5053. [PMID: 34068835 PMCID: PMC8126232 DOI: 10.3390/ijms22095053] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/03/2021] [Revised: 04/26/2021] [Accepted: 04/27/2021] [Indexed: 11/17/2022] Open
Abstract
Molecular modeling is widely utilized in subjects including but not limited to physics, chemistry, biology, materials science and engineering. Impressive progress has been made in the development of theories, algorithms and software packages. Dividing and conquering, and caching intermediate results, have been long-standing principles in the development of algorithms. Not surprisingly, the most important methodological advancements in more than half a century of molecular modeling are various implementations of these two fundamental principles. In mainstream classical computational molecular science, tremendous efforts have been invested in two lines of algorithm development. The first is coarse graining, which represents multiple basic particles of a higher-resolution model as a single larger and softer particle in a lower-resolution counterpart, yielding force fields of partial transferability at the expense of some information loss. The second is enhanced sampling, which realizes "dividing and conquering" and/or "caching" in configurational space, with a focus either on reaction coordinates and collective variables, as in metadynamics and related algorithms, or on the transition matrix and state discretization, as in Markov state models. For this line of algorithms, spatial resolution is maintained but results are not transferable. Deep learning has been utilized to realize more efficient and accurate ways of "dividing and conquering" and "caching" along these two lines of algorithmic research. We proposed and demonstrated the local free energy landscape approach, a new framework for classical computational molecular science. This framework is based on a third class of algorithm that facilitates molecular modeling through partially transferable in resolution "caching" of distributions for local clusters of molecular degrees of freedom. Differences, connections and potential interactions among these three algorithmic directions are discussed, in the hope of stimulating the development of more elegant, efficient and reliable formulations and algorithms for "dividing and conquering" and "caching" in complex molecular systems.
|
23
|
Convergence of cortical types and functional motifs in the human mesiotemporal lobe. eLife 2020; 9:e60673. [PMID: 33146610 PMCID: PMC7671688 DOI: 10.7554/elife.60673] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 07/03/2020] [Accepted: 11/03/2020] [Indexed: 01/24/2023] Open
Abstract
The mesiotemporal lobe (MTL) is implicated in many cognitive processes, is compromised in numerous brain disorders, and exhibits a gradual cytoarchitectural transition from six-layered parahippocampal isocortex to three-layered hippocampal allocortex. Leveraging an ultra-high-resolution histological reconstruction of a human brain, our study showed that the dominant axis of MTL cytoarchitectural differentiation follows the iso-to-allocortical transition and depth-specific variations in neuronal density. Projecting the histology-derived MTL model to in-vivo functional MRI, we furthermore determined how its cytoarchitecture underpins its intrinsic effective connectivity and association with large-scale networks. Here, the cytoarchitectural gradient was found to underpin intrinsic effective connectivity of the MTL, but patterns differed along the anterior-posterior axis. Moreover, while the iso-to-allocortical gradient parametrically represented the multiple-demand relative to task-negative networks, anterior-posterior gradients represented transmodal versus unimodal networks. Our findings establish that the combination of micro- and macrostructural features allows the MTL to represent dominant motifs of whole-brain functional organisation.
|
24
|
A multi-scale cortical wiring space links cellular architecture and functional dynamics in the human brain. PLoS Biol 2020; 18:e3000979. [PMID: 33253185 PMCID: PMC7728398 DOI: 10.1371/journal.pbio.3000979] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 03/26/2020] [Revised: 12/10/2020] [Accepted: 11/02/2020] [Indexed: 12/11/2022] Open
Abstract
The vast net of fibres within and underneath the cortex is optimised to support the convergence of different levels of brain organisation. Here, we propose a novel coordinate system of the human cortex based on an advanced model of its connectivity. Our approach is inspired by seminal, but so far largely neglected models of cortico-cortical wiring established by postmortem anatomical studies and capitalises on cutting-edge in vivo neuroimaging and machine learning. The new model expands the currently prevailing diffusion magnetic resonance imaging (MRI) tractography approach by incorporation of additional features of cortical microstructure and cortico-cortical proximity. Studying several datasets and different parcellation schemes, we could show that our coordinate system robustly recapitulates established sensory-limbic and anterior-posterior dimensions of brain organisation. A series of validation experiments showed that the new wiring space reflects cortical microcircuit features (including pyramidal neuron depth and glial expression) and allowed for competitive simulations of functional connectivity and dynamics based on resting-state functional magnetic resonance imaging (rs-fMRI) and human intracranial electroencephalography (EEG) coherence. Our results advance our understanding of how cell-specific neurobiological gradients produce a hierarchical cortical wiring scheme that is concordant with increasing functional sophistication of human brain organisation. Our evaluations demonstrate the cortical wiring space bridges across scales of neural organisation and can be easily translated to single individuals.
|
25
|
|
26
|
Abstract
Our understanding of cerebellar involvement in brain disorders has evolved from motor processing to high-level cognitive and affective processing. Recent neuroscience progress has highlighted hierarchy as a fundamental principle of brain organization. Despite substantial research on cerebellar dysfunction in schizophrenia, there is a need to establish a neurobiological framework to better understand the co-occurrence and interaction of low- and high-level functional abnormalities of the cerebellum in schizophrenia. To help establish such a framework, we investigated the abnormalities in the distribution of sensorimotor-supramodal hierarchical processing topography in the cerebellum and cerebellar-cerebral circuits in schizophrenia using a novel gradient-based resting-state functional connectivity (FC) analysis (96 patients with schizophrenia vs 120 healthy controls). We found that schizophrenia patients showed a compression of the principal motor-to-supramodal gradient. Specifically, there were increased gradient values in sensorimotor regions and decreased gradient values in supramodal regions, resulting in a shorter distance (compression) between the sensorimotor and supramodal poles of this gradient. This pattern was observed in intra-cerebellar, cerebellar-cerebral, and cerebral-cerebellar FC. Further investigation revealed hyper-connectivity between sensorimotor and cognition areas within the cerebellum, between cerebellar sensorimotor and cerebral cognition areas, and between cerebellar cognition and cerebral sensorimotor areas, possibly contributing to the observed compressed pattern. These findings present a novel mechanism that may underlie the co-occurrence and interaction of low- and high-level functional abnormalities of cerebellar and cerebro-cerebellar circuits in schizophrenia. Within this framework of abnormal motor-to-supramodal organization, a cascade of impairments stemming from a disrupted low-level sensorimotor system may in part account for high-level cognitive cerebellar dysfunction in schizophrenia.
|
27
|
Two-sample statistics based on anisotropic kernels. Information and Inference: A Journal of the IMA 2020; 9:677-719. [PMID: 32929389 DOI: 10.1093/imaiai/iaz018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/30/2018] [Revised: 02/14/2019] [Accepted: 06/13/2019] [Indexed: 11/13/2022]
Abstract
The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between [Formula: see text] data points and a set of [Formula: see text] reference points, where [Formula: see text] can be drastically smaller than [Formula: see text]. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions of the kernel, as long as [Formula: see text], and a finite-sample lower bound of the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions.
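As background for the statistic described above, here is a minimal sketch of the standard biased (V-statistic) estimate of squared MMD with an isotropic Gaussian kernel. It is explicitly not the anisotropic, reference-point kernel the paper proposes; the function names, the `bandwidth` parameter, and the toy samples are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    """Isotropic Gaussian kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased estimate of squared MMD: mean k(X,X) + mean k(Y,Y) - 2 mean k(X,Y)."""
    k_xx = gaussian_kernel(X, X, bandwidth)
    k_yy = gaussian_kernel(Y, Y, bandwidth)
    k_xy = gaussian_kernel(X, Y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Toy samples (an assumption): two draws from the same Gaussian and one shifted draw.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X2 = rng.normal(size=(200, 2))
Y = rng.normal(loc=2.0, size=(200, 2))

same = mmd2_biased(X, X2)      # small: both samples come from one distribution
shifted = mmd2_biased(X, Y)    # larger: the distributions differ by a mean shift
```

A two-sample test would compare such a statistic against a null distribution, for example one obtained by permuting sample labels; that step is omitted here.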
|
28
|
Myeloarchitecture gradients in the human insula: Histological underpinnings and association to intrinsic functional connectivity. Neuroimage 2020; 216:116859. [DOI: 10.1016/j.neuroimage.2020.116859] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 11/25/2019] [Revised: 03/13/2020] [Accepted: 04/13/2020] [Indexed: 12/11/2022] Open
|
29
|
MYC Drives Temporal Evolution of Small Cell Lung Cancer Subtypes by Reprogramming Neuroendocrine Fate. Cancer Cell 2020; 38:60-78.e12. [PMID: 32473656 PMCID: PMC7393942 DOI: 10.1016/j.ccell.2020.05.001] [Citation(s) in RCA: 216] [Impact Index Per Article: 54.0] [Received: 12/22/2019] [Revised: 03/23/2020] [Accepted: 04/30/2020] [Indexed: 02/06/2023]
Abstract
Small cell lung cancer (SCLC) is a neuroendocrine tumor treated clinically as a single disease with poor outcomes. Distinct SCLC molecular subtypes have been defined based on expression of ASCL1, NEUROD1, POU2F3, or YAP1. Here, we use mouse and human models with a time-series single-cell transcriptome analysis to reveal that MYC drives dynamic evolution of SCLC subtypes. In neuroendocrine cells, MYC activates Notch to dedifferentiate tumor cells, promoting a temporal shift in SCLC from ASCL1+ to NEUROD1+ to YAP1+ states. MYC alternatively promotes POU2F3+ tumors from a distinct cell type. Human SCLC exhibits intratumoral subtype heterogeneity, suggesting that this dynamic evolution occurs in patient tumors. These findings suggest that genetics, cell of origin, and tumor cell plasticity determine SCLC subtype.
|
30
|
Transition from Tracy–Widom to Gaussian fluctuations of extremal eigenvalues of sparse Erdős–Rényi graphs. Ann Probab 2020. [DOI: 10.1214/19-aop1378] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/19/2022]
|
31
|
A Branch Point on Differentiation Trajectory is the Bifurcating Event Revealed by Dynamical Network Biomarker Analysis of Single-Cell Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:366-375. [PMID: 29994127 DOI: 10.1109/tcbb.2018.2847690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Advances in single-cell profiling technologies and the development of computational algorithms provide the opportunity to reconstruct pseudotemporal trajectories of cellular development with branch points. On the other hand, theories such as dynamical network biomarker (DNB) theory have recently been proposed to characterize the pre-transition state in biological systems. Few studies have validated whether the branch point identified in pseudotime is the critical point of a dynamical system. In this study, the dynamical behavior of the branch point on the pseudo trajectory is investigated. We study the pseudotemporal trajectories reconstructed by the Wishbone and diffusion pseudotime analysis (DPT) algorithms, as well as a simulated trajectory. DNB theory is applied to justify the bifurcating event on the pseudo trajectories. Our results demonstrate that the branch point recovered by the Wishbone and DPT algorithms is confirmed by DNB theory as a transition state in the cell differentiation process. Furthermore, we show that an appropriate DNB group amplifies the comprehensive index of the critical event as defined in DNB theory. Our study provides biological insights into pseudo trajectories with branch points from a dynamical viewpoint and also indicates that DNB theory may serve as a benchmark to check the validity of a branch point.
|
32
|
Diffusion Mapping of Eosinophil-Activation State. Cytometry A 2020; 97:253-258. [PMID: 31472007 PMCID: PMC7079009 DOI: 10.1002/cyto.a.23884] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/18/2019] [Revised: 06/26/2019] [Accepted: 08/19/2019] [Indexed: 12/13/2022]
Abstract
Eosinophils are granular leukocytes that play a role in mediating inflammatory responses linked to infection and allergic disease. Their activation during an immune response triggers spatial reorganization and eventual cargo release from intracellular granules. Understanding this process is important in diagnosing eosinophilic disorders and in assessing treatment efficacy; however, current protocols are limited to simply quantifying the number of eosinophils within a blood sample. Given that high optical absorption and scattering by the granular structure of these cells lead to marked image features, the physical changes that occur during activation should be trackable using image analysis. Here, we present a study in which imaging flow cytometry is used to quantify eosinophil activation state, based on the extraction of 85 distinct spatial features from dark-field images formed by light scattered orthogonally to the illuminating beam. We apply diffusion mapping, a time inference method that orders cells on a trajectory based on similar image features. Analysis of exogenous cell activation using eotaxin and endogenous activation in donor samples with elevated eosinophil counts shows that cell position along the diffusion-path line correlates with activation level (99% confidence level). Thus, the diffusion mapping provides an activation metric for each cell. Assessment of activated and control populations using both this spatial image-based, activation score and the integrated side-scatter intensity shows an improved Fisher discriminant ratio rd = 0.7 for the multivariate technique compared with an rd = 0.47 for the traditional whole-cell scatter metric. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
|
33
|
Using single-cell RNA sequencing to unravel cell lineage relationships in the respiratory tract. Biochem Soc Trans 2020; 48:327-336. [DOI: 10.1042/bst20191010] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 10/30/2019] [Revised: 12/18/2019] [Accepted: 12/19/2019] [Indexed: 01/07/2023]
Abstract
The respiratory tract is lined by a pseudo-stratified epithelium from the nose to the terminal bronchioles. This first line of defense of the lung against external stress includes five main cell types: basal, suprabasal, club, goblet and multiciliated cells, as well as rare cells such as ionocytes, neuroendocrine and tuft/brush cells. At homeostasis, this epithelium self-renews at a low rate but is capable of fast regeneration upon damage. Airway epithelial cell lineages during regeneration have been investigated in the mouse by genetic labeling, mainly after injuring the epithelium with noxious agents. From these approaches, basal cells have been identified as progenitors of club, goblet and multiciliated cells, but also of ionocytes and neuroendocrine cells. Single-cell RNA sequencing, coupled to lineage inference algorithms, has independently allowed the establishment of comprehensive pictures of cell lineage relationships in both mouse and human. In line with genetic tracing experiments in mouse trachea, studies using single-cell RNA sequencing (RNAseq) have shown that basal cells first differentiate into club cells, which in turn mature into goblet cells or differentiate into multiciliated cells. In the human airway epithelium, single-cell RNAseq has identified novel intermediate populations such as deuterosomal cells, ‘hybrid’ mucous-multiciliated cells and progenitors of rare cells. Novel differentiation dynamics, such as a transition from goblet to multiciliated cells, have also been discovered. The future of cell lineage research in the respiratory tract now resides in the combination of genetic labeling approaches with single-cell RNAseq to establish, in a definitive manner, the hallmarks of cellular lineages in normal and pathological situations.
|
34
|
Clustering-independent analysis of genomic data using spectral simplicial theory. PLoS Comput Biol 2019; 15:e1007509. [PMID: 31756191 PMCID: PMC6897424 DOI: 10.1371/journal.pcbi.1007509] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/11/2019] [Revised: 12/06/2019] [Accepted: 10/25/2019] [Indexed: 12/20/2022] Open
Abstract
The prevailing paradigm for the analysis of biological data involves comparing groups of replicates from different conditions (e.g. control and treatment) to statistically infer features that discriminate them (e.g. differentially expressed genes). However, many situations in modern genomics such as single-cell omics experiments do not fit well into this paradigm because they lack true replicates. In such instances, spectral techniques could be used to rank features according to their degree of consistency with an underlying metric structure without the need to cluster samples. Here, we extend spectral methods for feature selection to abstract simplicial complexes and present a general framework for clustering-independent analysis. Combinatorial Laplacian scores take into account the topology spanned by the data and reduce to the ordinary Laplacian score when restricted to graphs. We demonstrate the utility of this framework with several applications to the analysis of gene expression and multi-modal genomic data. Specifically, we perform differential expression analysis in situations where samples cannot be grouped into distinct classes, and we disaggregate differentially expressed genes according to the topology of the expression space (e.g. alternative paths of differentiation). We also apply this formalism to identify genes with spatial patterns of expression using fluorescence in-situ hybridization data and to establish associations between genetic alterations and global expression patterns in large cross-sectional studies. Our results provide a unifying perspective on topological data analysis and manifold learning approaches to the analysis of large-scale biological datasets.
Manifold learning methods have emerged as a way of analyzing the large high-dimensional data sets that are currently generated in many areas of science. They assume the data has been sampled from an unknown manifold which is approximated with a graph and utilize spectral graph techniques to perform unsupervised feature selection and dimensionality reduction. However, graphs provide only partial approximations to manifolds, precluding the application to features with a complex combinatorial structure. Relatedly, these methods cannot take into account the topology of the manifold. In this work, we extend spectral methods for feature selection to topological spaces built from data and present a general framework for feature selection. We present specific applications of this framework to clustering-independent analysis of gene expression and multi-modal genomic data. In particular, using these methods, we perform differential expression analysis in situations where samples cannot be grouped into distinct classes, and we disaggregate the results according to topological features of the expression space. In addition, we identify genes with spatial patterns of expression using spatially-resolved transcriptomic data and establish associations between genetic alterations and global expression patterns in large cross-sectional cancer studies.
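The abstract notes that combinatorial Laplacian scores reduce to the ordinary Laplacian score on graphs. The sketch below illustrates only that ordinary graph case, under the usual definition (degree-weighted centering, then the ratio of the Laplacian quadratic form to the degree-weighted variance); it does not implement the simplicial extension. The function name and the toy path graph are assumptions for illustration.

```python
import numpy as np

def laplacian_score(W, F):
    """Ordinary Laplacian score of each feature column of F on a weighted graph.

    W : (n, n) symmetric, non-negative edge-weight matrix.
    F : (n, m) matrix whose columns are features measured on the n nodes.
    Lower scores indicate features that vary smoothly over the graph.
    Constant features give a zero denominator and are not handled here.
    """
    d = W.sum(axis=1)                 # node degrees
    L = np.diag(d) - W                # graph Laplacian
    means = (d @ F) / d.sum()         # degree-weighted mean of each feature
    Fc = F - means                    # centered features
    numerator = np.einsum("ij,ij->j", Fc, L @ Fc)              # f^T L f
    denominator = np.einsum("ij,ij->j", Fc, d[:, None] * Fc)   # f^T D f
    return numerator / denominator

# Toy example (an assumption): a path graph on six nodes.
n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0

smooth = np.arange(n, dtype=float)                        # changes slowly along the path
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # flips sign on every edge
scores = laplacian_score(W, np.column_stack([smooth, alternating]))
```

Here the smooth feature receives the lower score, so ranking features by ascending score favors those consistent with the graph structure without any clustering of the samples.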
|
35
|
Low dimensional representations along intrinsic reaction coordinates and molecular dynamics trajectories using interatomic distance matrices. Chem Sci 2019; 10:9954-9968. [PMID: 32055352 PMCID: PMC6991188 DOI: 10.1039/c9sc02742d] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 06/05/2019] [Accepted: 08/23/2019] [Indexed: 01/22/2023] Open
Abstract
Most chemical transformations (reactions or conformational changes) that are of interest to researchers have many degrees of freedom, usually too many to visualize without reducing the dimensionality of the system to include only the most important atomic motions. In this article, we describe a method of using Principal Component Analysis (PCA) for analyzing a series of molecular geometries (e.g., a reaction pathway or molecular dynamics trajectory) and determining the reduced dimensional space that captures the most structural variance in the fewest dimensions. The software written to carry out this method is called PathReducer, which permits (1) visualizing the geometries in a reduced dimensional space, (2) determining the axes that make up the reduced dimensional space, and (3) projecting the series of geometries into the low-dimensional space for visualization. We investigated two options to represent molecular structures within PathReducer: aligned Cartesian coordinates and matrices of interatomic distances. We found that interatomic distance matrices better captured non-linear motions in a smaller number of dimensions. To demonstrate the utility of PathReducer, we have carried out a number of applications where we have projected molecular dynamics trajectories into a reduced dimensional space defined by an intrinsic reaction coordinate. The visualizations provided by this analysis show that dynamic paths can differ greatly from the minimum energy pathway on a potential energy surface. Viewing intrinsic reaction coordinates and trajectories in this way provides a quick way to gather qualitative information about the pathways trajectories take relative to a minimum energy path. 
Given that the outputs from PCA are linear combinations of the input molecular structure coordinates (i.e., Cartesian coordinates or interatomic distances), they can be easily transferred to other types of calculations that require the definition of a reduced dimensional space (e.g., biased molecular dynamics simulations).
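The idea of applying PCA to interatomic distance matrices can be sketched as follows. This is a generic NumPy illustration, not the PathReducer code or its API; the function names and the three-atom toy trajectory are assumptions made for the example.

```python
import numpy as np

def distance_features(coords):
    """Flatten each frame's interatomic distance matrix into a feature vector.

    coords : (n_frames, n_atoms, 3) Cartesian coordinates.
    Returns an (n_frames, n_atoms * (n_atoms - 1) / 2) array of unique distances.
    """
    diffs = coords[:, :, None, :] - coords[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    rows, cols = np.triu_indices(coords.shape[1], k=1)  # upper triangle, no diagonal
    return dists[:, rows, cols]

def pca(features, n_components):
    """PCA via SVD of the mean-centered feature matrix."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)        # fraction of variance per component
    components = vt[:n_components]         # principal axes (linear combinations of distances)
    projected = centered @ components.T    # frames in the reduced space
    return projected, components, explained[:n_components]

# Toy trajectory (an assumption): three atoms, with the third moving away along y.
r = np.linspace(1.0, 2.0, 25)
coords = np.zeros((25, 3, 3))
coords[:, 1, 0] = 1.0   # atom 1 fixed at (1, 0, 0)
coords[:, 2, 1] = r     # atom 2 moves from (0, 1, 0) to (0, 2, 0)

features = distance_features(coords)
projected, components, explained = pca(features, n_components=2)

# Distance features are unchanged by a rigid rotation of the whole trajectory.
theta = 0.7
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
rotated_features = distance_features(coords @ R.T)
```

One practical reason to prefer distance features is visible in the last lines: they are invariant to rigid rotation and translation, so no structural alignment is needed before PCA, whereas Cartesian coordinates must first be aligned.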
|
36
|
Human Cortical Organoids Expose a Differential Function of GSK3 on Cortical Neurogenesis. Stem Cell Reports 2019; 13:847-861. [PMID: 31607568 PMCID: PMC6893153 DOI: 10.1016/j.stemcr.2019.09.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/04/2019] [Revised: 09/12/2019] [Accepted: 09/13/2019] [Indexed: 01/08/2023] Open
Abstract
The regulation of the proliferation and polarity of neural progenitors is crucial for the development of the brain cortex. Animal studies have implicated glycogen synthase kinase 3 (GSK3) as a pivotal regulator of both proliferation and polarity, yet the functional relevance of its signaling for the unique features of human corticogenesis remains to be elucidated. We harnessed human cortical brain organoids to probe the longitudinal impact of GSK3 inhibition through multiple developmental stages. Chronic GSK3 inhibition increased the proliferation of neural progenitors and caused massive derangement of cortical tissue architecture. Single-cell transcriptome profiling revealed a direct impact on early neurogenesis and uncovered a selective role of GSK3 in the regulation of glutamatergic lineages and outer radial glia output. Our dissection of the GSK3-dependent transcriptional network in human corticogenesis underscores the robustness of the programs determining neuronal identity independent of tissue architecture. Cortical organoids recapitulate stereotypical neurogenic trajectories. GSK3 inhibition disrupts neuroepithelium polarity and cortical tissue organization. GSK3 activity controls oRG production and neurogenesis.
|
37
|
Abstract
Clustering large and complex data sets whose partitions may adopt arbitrary shapes remains a difficult challenge. Part of this challenge comes from the difficulty of defining a similarity measure between the data points that captures their underlying geometry. In this paper, we propose an algorithm, DCG++, that generates such a similarity measure, one that is data-driven and ultrametric. DCG++ uses Markov chain random walks to capture the intrinsic geometry of the data, scans possible scales, and combines all this information using a simple procedure that is shown to generate an ultrametric. We validate the effectiveness of this similarity measure within the context of clustering on synthetic data with complex geometry, on a real-world data set containing segmented audio records of frog calls described by mel-frequency cepstral coefficients, as well as on an image segmentation problem. The experimental results show a significant improvement in performance with the DCG-based ultrametric compared to using an empirical distance measure.
|
38
|
Microstructural and functional gradients are increasingly dissociated in transmodal cortices. PLoS Biol 2019; 17:e3000284. [PMID: 31107870 PMCID: PMC6544318 DOI: 10.1371/journal.pbio.3000284] [Citation(s) in RCA: 210] [Impact Index Per Article: 42.0] [Received: 12/05/2018] [Revised: 05/31/2019] [Accepted: 05/08/2019] [Indexed: 01/10/2023] Open
Abstract
While the role of cortical microstructure in organising neural function is well established, it remains unclear how structural constraints can give rise to more flexible elements of cognition. While nonhuman primate research has demonstrated a close structure-function correspondence, the relationship between microstructure and function remains poorly understood in humans, in part because of the reliance on post mortem analyses, which cannot be directly related to functional data. To overcome this barrier, we developed a novel approach to model the similarity of microstructural profiles sampled in the direction of cortical columns. Our approach was initially formulated based on an ultra-high-resolution 3D histological reconstruction of an entire human brain and then translated to myelin-sensitive magnetic resonance imaging (MRI) data in a large cohort of healthy adults. This novel method identified a system-level gradient of microstructural differentiation traversing from primary sensory to limbic regions that followed shifts in laminar differentiation and cytoarchitectural complexity. Importantly, while microstructural and functional gradients described a similar hierarchy, they became increasingly dissociated in transmodal default mode and fronto-parietal networks. Meta-analytic decoding of these topographic dissociations highlighted involvement in higher-level aspects of cognition, such as cognitive control and social cognition. Our findings demonstrate a relative decoupling of macroscale functional from microstructural gradients in transmodal regions, which likely contributes to the flexible role these regions play in human cognition.
|
39
|
Extracting collective motions underlying nucleosome dynamics via nonlinear manifold learning. J Chem Phys 2019; 150:054902. [PMID: 30736679 DOI: 10.1063/1.5063851] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023] Open
Abstract
The identification of effective collective variables remains a challenge in molecular simulations of complex systems. Here, we use a nonlinear manifold learning technique known as the diffusion map to extract key dynamical motions from a complex biomolecular system known as the nucleosome: a DNA-protein complex consisting of a DNA segment wrapped around a disc-shaped group of eight histone proteins. We show that without any a priori information, diffusion maps can identify and extract meaningful collective variables that characterize the motion of the nucleosome complex. We find excellent agreement between the collective variables identified by the diffusion map and those obtained manually using a free energy-based analysis. Notably, diffusion maps are shown to also identify subtle features of nucleosome dynamics that did not appear in those manually specified collective variables. For example, diffusion maps identify the importance of looped conformations in which DNA bulges away from the histone complex that are important for the motion of DNA around the nucleosome. This work demonstrates that diffusion maps can be a promising tool for analyzing very large molecular systems and for identifying their characteristic slow modes.
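To make the diffusion map idea in this abstract concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the kernel, the distance metric between nucleosome configurations, or the bandwidth, so the Gaussian kernel on Euclidean distances, the parameter `epsilon`, the function name `diffusion_map`, and the toy half-circle data set are all illustrative choices rather than the authors' setup.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, t=1):
    """Illustrative diffusion map: returns nontrivial eigenvalues and coordinates."""
    # Pairwise squared Euclidean distances between samples (rows of X).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel; epsilon (an assumed parameter) sets the neighbourhood scale.
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)
    # Symmetric matrix similar to the row-stochastic Markov matrix P = D^{-1} K.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Convert eigenvectors of S into right eigenvectors of P.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1).
    lam = eigvals[1:n_components + 1]
    return lam, psi[:, 1:n_components + 1] * lam ** t

# Toy data: points on a half circle, parametrised by the arc length s.
s = np.linspace(0.0, np.pi, 200)
X = np.column_stack([np.cos(s), np.sin(s)])
eigvals, coords = diffusion_map(X, epsilon=0.05)
```

In this toy case the first diffusion coordinate tracks the position along the arc, which is the sense in which a slow collective variable can be recovered without specifying it in advance.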
|
40
|
LittleBrain: A gradient-based tool for the topographical interpretation of cerebellar neuroimaging findings. PLoS One 2019; 14:e0210028. [PMID: 30650101 PMCID: PMC6334893 DOI: 10.1371/journal.pone.0210028] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 09/17/2018] [Accepted: 12/14/2018] [Indexed: 11/19/2022] Open
Abstract
Gradient-based approaches to brain function have recently unmasked fundamental properties of brain organization. Diffusion map embedding analysis of resting-state fMRI data revealed a primary-to-transmodal axis of cerebral cortical macroscale functional organization. The same method was recently used to analyze resting-state data within the cerebellum, revealing for the first time a sensorimotor-fugal macroscale organization principle of cerebellar function. Cerebellar gradient 1 extended from motor to non-motor task-unfocused (default-mode network) areas, and cerebellar gradient 2 isolated task-focused processing regions. Here we present a freely available and easily accessible tool that applies this new knowledge to the topographical interpretation of cerebellar neuroimaging findings. LittleBrain illustrates the relationship between cerebellar data (e.g., volumetric patient study clusters, task activation maps, etc.) and cerebellar gradients 1 and 2. Specifically, LittleBrain plots all voxels of the cerebellum in a two-dimensional scatterplot, with each axis corresponding to one of the two principal functional gradients of the cerebellum, and indicates the position of cerebellar neuroimaging data within these two dimensions. This novel method of data mapping provides alternative, gradual visualizations that complement discrete parcellation maps of cerebellar functional neuroanatomy. We present application examples to show that LittleBrain can also capture subtle, progressive aspects of cerebellar functional neuroanatomy that would be difficult to visualize using conventional mapping techniques. Download and use instructions can be found at https://xaviergp.github.io/littlebrain.
|
41
|
|
42
|
i6mA-Pred: identifying DNA N6-methyladenine sites in the rice genome. Bioinformatics 2019; 35:2796-2800. [DOI: 10.1093/bioinformatics/btz015] [Citation(s) in RCA: 156] [Impact Index Per Article: 31.2] [Received: 10/28/2018] [Revised: 12/12/2018] [Accepted: 01/05/2019] [Indexed: 01/10/2023] Open
Abstract
Motivation
DNA N6-methyladenine (6mA) is associated with a wide range of biological processes. Since the distribution of 6mA sites in the genome is non-random, accurate identification of 6mA sites is crucial for understanding their biological functions. Although experimental methods have been proposed for this purpose, they are still not cost-effective for detecting 6mA sites on a genome-wide scale. Therefore, it is desirable to develop computational methods to facilitate the identification of 6mA sites.
Results
In this study, a computational method called i6mA-Pred was developed to identify 6mA sites in the rice genome, in which the optimal nucleotide chemical properties obtained by using a feature selection technique were used to encode the DNA sequences. i6mA-Pred yielded an accuracy of 83.13% in the jackknife test. Meanwhile, the performance of i6mA-Pred was also superior to that of other methods.
Availability and implementation
A user-friendly web server, i6mA-Pred, is freely accessible at http://lin-group.cn/server/i6mA-Pred.
|
43
|
Abstract
This chapter discusses the way in which dimensionality reduction algorithms such as diffusion maps and sketch-map can be used to analyze molecular dynamics trajectories. The first part discusses how these various algorithms function, as well as practical issues such as landmark selection and how these algorithms can be used when the data to be analyzed come from enhanced sampling trajectories. In the latter part, the results obtained by applying various algorithms to two sets of sample data are compared and discussed. This section is followed by a summary of how one algorithm in particular, sketch-map, has been applied to a range of problems. The chapter concludes with a discussion of the directions in which we believe this field is currently moving.
|
44
|
An Emergent Space for Distributed Data with Hidden Internal Order through Manifold Learning. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2018; 6:77402-77413. [PMID: 31179198 PMCID: PMC6553659 DOI: 10.1109/access.2018.2882777] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 06/07/2023]
Abstract
Manifold-learning techniques are routinely used in mining complex spatiotemporal data to extract useful, parsimonious data representations/parametrizations; these are, in turn, useful in nonlinear model identification tasks. We focus here on the case of time series data that can ultimately be modelled as a spatially distributed system (e.g. a partial differential equation, PDE), but where we do not know the space in which this PDE should be formulated. Hence, even the spatial coordinates for the distributed system themselves need to be identified, that is, to "emerge from" the data mining process. We will first validate this "emergent space" reconstruction for time series sampled without space labels in known PDEs; this brings up the issue of observability of physical space from temporal observation data, and the transition from spatially resolved to lumped (order-parameter-based) representations by tuning the scale of the data mining kernels. We will then present actual emergent space "discovery" illustrations. Our illustrative examples include chimera states (states of coexisting coherent and incoherent dynamics), and chaotic as well as quasiperiodic spatiotemporal dynamics, arising in partial differential equations and/or in heterogeneous networks. We also discuss how data-driven "spatial" coordinates can be extracted in ways invariant to the nature of the measuring instrument. Such gauge-invariant data mining can go beyond the fusion of heterogeneous observations of the same system, to the possible matching of apparently different systems. For an older version of this article, including other examples, see https://arxiv.org/abs/1708.05406.
|
45
|
Single-Cell Transcriptomic Analysis of Cardiac Differentiation from Human PSCs Reveals HOPX-Dependent Cardiomyocyte Maturation. Cell Stem Cell 2018; 23:586-598.e8. [PMID: 30290179 PMCID: PMC6220122 DOI: 10.1016/j.stem.2018.09.009] [Citation(s) in RCA: 160] [Impact Index Per Article: 26.7] [Received: 10/28/2017] [Revised: 05/30/2018] [Accepted: 09/13/2018] [Indexed: 11/25/2022]
Abstract
Cardiac differentiation of human pluripotent stem cells (hPSCs) requires orchestration of dynamic gene regulatory networks during stepwise fate transitions but often generates immature cell types that do not fully recapitulate properties of their adult counterparts, suggesting incomplete activation of key transcriptional networks. We performed extensive single-cell transcriptomic analyses to map fate choices and gene expression programs during cardiac differentiation of hPSCs and identified strategies to improve in vitro cardiomyocyte differentiation. Utilizing genetic gain- and loss-of-function approaches, we found that hypertrophic signaling is not effectively activated during monolayer-based cardiac differentiation, thereby preventing expression of HOPX and its activation of downstream genes that govern late stages of cardiomyocyte maturation. This study therefore provides a key transcriptional roadmap of in vitro cardiac differentiation at single-cell resolution, revealing fundamental mechanisms underlying heart development and differentiation of hPSC-derived cardiomyocytes.
|
46
|
Abstract
A central principle for understanding the cerebral cortex is that macroscale anatomy reflects a functional hierarchy from primary to transmodal processing. In contrast, the central axis of motor and nonmotor macroscale organization in the cerebellum remains unknown. Here we applied diffusion map embedding to resting-state data from the Human Connectome Project dataset (n = 1003), and show for the first time that cerebellar functional regions follow a gradual organization which progresses from primary (motor) to transmodal (DMN, task-unfocused) regions. A secondary axis extends from task-unfocused to task-focused processing. Further, these two principal gradients revealed novel functional properties of the well-established cerebellar double motor representation (lobules I-VI and VIII), and its relationship with the recently described triple nonmotor representation (lobules VI/Crus I, Crus II/VIIB, IX/X). Functional differences exist not only between the two motor representations but also between the three nonmotor representations, and the second motor representation might share functional similarities with the third nonmotor representation.
|
47
|
Abstract
The airways of the lung are the primary sites of disease in asthma and cystic fibrosis. Here we study the cellular composition and hierarchy of the mouse tracheal epithelium by single-cell RNA-sequencing (scRNA-seq) and in vivo lineage tracing. We identify a rare cell type, the Foxi1+ pulmonary ionocyte; functional variations in club cells based on their location; a distinct cell type in high turnover squamous epithelial structures that we term 'hillocks'; and disease-relevant subsets of tuft and goblet cells. We developed 'pulse-seq', combining scRNA-seq and lineage tracing, to show that tuft, neuroendocrine and ionocyte cells are continually and directly replenished by basal progenitor cells. Ionocytes are the major source of transcripts of the cystic fibrosis transmembrane conductance regulator in both mouse (Cftr) and human (CFTR). Knockout of Foxi1 in mouse ionocytes causes loss of Cftr expression and disrupts airway fluid and mucus physiology, phenotypes that are characteristic of cystic fibrosis. By associating cell-type-specific expression programs with key disease genes, we establish a new cellular narrative for airways disease.
|
48
|
|
49
|
Mapping the human DC lineage through the integration of high-dimensional techniques. Science 2017; 356:science.aag3009. [PMID: 28473638 DOI: 10.1126/science.aag3009] [Citation(s) in RCA: 373] [Impact Index Per Article: 53.3] [Received: 06/08/2016] [Accepted: 04/25/2017] [Indexed: 12/16/2022]
Abstract
Dendritic cells (DC) are professional antigen-presenting cells that orchestrate immune responses. The human DC population comprises two main functionally specialized lineages, whose origins and differentiation pathways remain incompletely defined. Here, we combine two high-dimensional technologies, single-cell messenger RNA sequencing (scmRNAseq) and cytometry by time-of-flight (CyTOF), to identify human blood CD123+CD33+CD45RA+ DC precursors (pre-DC). Pre-DC share surface markers with plasmacytoid DC (pDC) but have distinct functional properties that were previously attributed to pDC. Tracing the differentiation of DC from the bone marrow to the peripheral blood revealed that the pre-DC compartment contains distinct lineage-committed subpopulations, including one early uncommitted CD123high pre-DC subset and two CD45RA+CD123low lineage-committed subsets exhibiting functional differences. The discovery of multiple committed pre-DC populations opens promising new avenues for the therapeutic exploitation of DC subset-specific targeting.
|
50
|
Towards a Holistic Cortical Thickness Descriptor: Heat Kernel-Based Grey Matter Morphology Signatures. Neuroimage 2017; 147:360-380. [PMID: 28033566 PMCID: PMC5303630 DOI: 10.1016/j.neuroimage.2016.12.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/20/2016] [Revised: 12/05/2016] [Accepted: 12/07/2016] [Indexed: 11/19/2022] Open
Abstract
In this paper, we propose a heat kernel based regional shape descriptor that may be capable of better exploiting volumetric morphological information than other available methods, thereby improving statistical power on brain magnetic resonance imaging (MRI) analysis. The mechanism of our analysis is driven by the graph spectrum and the heat kernel theory, to capture the volumetric geometry information in the constructed tetrahedral meshes. In order to capture profound brain grey matter shape changes, we first use the volumetric Laplace-Beltrami operator to determine the point pair correspondence between white-grey matter and CSF-grey matter boundary surfaces by computing the streamlines in a tetrahedral mesh. Secondly, we propose multi-scale grey matter morphology signatures to describe the transition probability by random walk between the point pairs, which reflects the inherent geometric characteristics. Thirdly, a point distribution model is applied to reduce the dimensionality of the grey matter morphology signatures and generate the internal structure features. With the sparse linear discriminant analysis, we select a concise morphology feature set with improved classification accuracies. In our experiments, the proposed work outperformed the cortical thickness features computed by FreeSurfer software in the classification of Alzheimer's disease and its prodromal stage, i.e., mild cognitive impairment, on publicly available data from the Alzheimer's Disease Neuroimaging Initiative. The multi-scale and physics based volumetric structure feature may bring stronger statistical power than some traditional methods for MRI-based grey matter morphology analysis.
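The multi-scale, random-walk reading of the heat kernel mentioned in this abstract can be illustrated with a small sketch. The abstract works with a volumetric Laplace-Beltrami operator on tetrahedral meshes; as a loudly simplified stand-in, the example below uses the combinatorial Laplacian of a five-node path graph. The function name `heat_kernel`, the graph, the chosen point pair, and the list of scales are assumptions made for illustration only.

```python
import numpy as np

def heat_kernel(lap, t):
    """Heat kernel exp(-t * lap) of a symmetric graph Laplacian via its spectrum."""
    eigvals, eigvecs = np.linalg.eigh(lap)
    # Scale each eigenvector by exp(-t * eigenvalue), then recombine.
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T

# Toy stand-in for a mesh: a path graph with n nodes (assumption).
n = 5
adj = np.zeros((n, n))
idx = np.arange(n - 1)
adj[idx, idx + 1] = 1.0
adj[idx + 1, idx] = 1.0
lap = np.diag(adj.sum(axis=1)) - adj

# Multi-scale signature for one point pair (node 0 and node n-1):
# the heat kernel entry is the probability that a continuous-time
# random walk started at node 0 sits at node n-1 at time t.
scales = [0.1, 1.0, 10.0]
signature = np.array([heat_kernel(lap, t)[0, n - 1] for t in scales])
```

Small values of `t` probe local geometry around the pair, while large values approach the uniform limit `1/n`, so stacking several scales gives a descriptor that mixes local and global information.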
|