26
|
Wen J, Sun H, Fei L, Li J, Zhang Z, Zhang B. Consensus guided incomplete multi-view spectral clustering. Neural Netw 2020; 133:207-219. [PMID: 33227665 DOI: 10.1016/j.neunet.2020.10.014] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 05/26/2020] [Revised: 10/25/2020] [Accepted: 10/29/2020] [Indexed: 10/23/2022]
Abstract
Incomplete multi-view clustering, which aims to solve the difficult clustering challenge on incomplete multi-view data collected from diverse domains with missing views, has drawn considerable attention in recent years. In this paper, we propose a novel method, called consensus guided incomplete multi-view spectral clustering (CGIMVSC), to address the incomplete clustering problem. Specifically, CGIMVSC seeks to explore the local information within every single view and the semantically consistent information shared by all views in a unified framework simultaneously, where the local structure is adaptively obtained from the incomplete data rather than pre-constructed via a k-nearest neighbor approach as in existing methods. Considering the semantic consistency of multiple views, CGIMVSC introduces a co-regularization constraint to minimize the disagreement between the common representation and the individual representations with respect to different views, such that all views will obtain a consensus clustering result. Experimental comparisons with some state-of-the-art methods on seven datasets validate the effectiveness of the proposed method on incomplete multi-view clustering.
|
27
|
He Q, Laurence DW, Lee CH, Chen JS. Manifold learning based data-driven modeling for soft biological tissues. J Biomech 2020; 117:110124. [PMID: 33515902 DOI: 10.1016/j.jbiomech.2020.110124] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/16/2020] [Revised: 07/16/2020] [Accepted: 11/03/2020] [Indexed: 02/08/2023]
Abstract
Data-driven modeling directly utilizes experimental data with machine learning techniques to predict a material's response without the necessity of using phenomenological constitutive models. Although data-driven modeling presents a promising new approach, it has yet to be extended to the modeling of large-deformation biological tissues. Herein, we extend our recent local convexity data-driven (LCDD) framework (He and Chen, 2020) to model the mechanical response of a porcine heart mitral valve posterior leaflet. The predictability of the LCDD framework using various combinations of biaxial and pure shear training protocols is investigated, and its effectiveness is compared with a full structural, phenomenological model modified from Zhang et al. (2016) and a continuum phenomenological Fung-type model (Tong and Fung, 1976). We show that the predictivity of the proposed LCDD nonlinear solver is generally less sensitive to the type of loading protocols (biaxial and pure shear) used in the data set, while more sensitive to the insufficient coverage of the experimental data when compared to the predictivity of the two selected phenomenological models. While no pre-defined functional form in the material model is necessary in LCDD, this study reinstates the importance of having sufficiently rich data coverage in the data-driven and machine learning type of approaches. It is also shown that the proposed LCDD method is an enhancement over the earlier distance-minimization data-driven (DMDD) against noisy data. This study demonstrates that when sufficient data is available, data-driven computing can be an alternative method for modeling complex biological materials.
|
28
|
Manifold learning for amyotrophic lateral sclerosis functional loss assessment: Development and validation of a prognosis model. J Neurol 2020; 268:825-850. [PMID: 32886252 DOI: 10.1007/s00415-020-10181-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/22/2020] [Revised: 08/05/2020] [Accepted: 08/06/2020] [Indexed: 12/11/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is an inexorably progressive neurodegenerative condition with no effective disease-modifying therapy at present. Given the striking clinical heterogeneity of the condition, the development and validation of reliable prognostic models is a recognised research priority. We present a prognostic model for functional decline in ALS where outcome uncertainty is taken into account. Patient data were reduced and projected onto a 2D space using Uniform Manifold Approximation and Projection (UMAP), a novel non-linear dimension reduction technique. Information from 3756 patients was included. Development data were sourced from past clinical trials. Real-world population data were used as validation data. Predictors included age, gender, region of onset, symptom duration, weight at baseline, functional impairment, and estimated rate of functional loss. UMAP projection of patients showed an informative 2D data distribution. As limited data availability precluded complex model designs, the projection was divided into three zones defined by a functional impairment range probability. Zone membership allowed individual patient prediction. Patients belonging to the first zone had a probability of [Formula: see text] (± [Formula: see text]) to have an ALSFRS score over 20 at 1-year follow-up. Patients within the second zone had a probability of [Formula: see text] (± [Formula: see text]) to have an ALSFRS score between 10 and 30 at 1 year follow-up. Finally, patients within the third zone had a probability of [Formula: see text] (± [Formula: see text]) to have an ALSFRS score lower than 20 at 1 year follow-up. This approach requires a limited set of features, is easily updated, improves with additional patient data, and accounts for results uncertainty. This method could therefore be used in a clinical setting for patient stratification and outcome projection.
|
29
|
Verma A, Engelhardt BE. A robust nonlinear low-dimensional manifold for single cell RNA-seq data. BMC Bioinformatics 2020; 21:324. [PMID: 32693778 PMCID: PMC7374962 DOI: 10.1186/s12859-020-03625-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/25/2019] [Accepted: 06/22/2020] [Indexed: 12/22/2022]
Abstract
BACKGROUND Modern developments in single-cell sequencing technologies enable broad insights into cellular state. Single-cell RNA sequencing (scRNA-seq) can be used to explore cell types, states, and developmental trajectories to broaden our understanding of cellular heterogeneity in tissues and organs. Analysis of these sparse, high-dimensional experimental results requires dimension reduction. Several methods have been developed to estimate low-dimensional embeddings for filtered and normalized single-cell data. However, methods have yet to be developed for unfiltered and unnormalized count data that estimate uncertainty in the low-dimensional space. We present a nonlinear latent variable model with robust, heavy-tailed error and adaptive kernel learning to estimate low-dimensional nonlinear structure in scRNA-seq data. RESULTS Gene expression in a single cell is modeled as a noisy draw from a Gaussian process in high dimensions from low-dimensional latent positions. This model is called the Gaussian process latent variable model (GPLVM). We model residual errors with a heavy-tailed Student's t-distribution to estimate a manifold that is robust to technical and biological noise found in normalized scRNA-seq data. We compare our approach to common dimension reduction tools across a diverse set of scRNA-seq data sets to highlight our model's ability to enable important downstream tasks such as clustering, inferring cell developmental trajectories, and visualizing high throughput experiments on available experimental data. CONCLUSION We show that our adaptive robust statistical approach to estimate a nonlinear manifold is well suited for raw, unfiltered gene counts from high-throughput sequencing technologies for visualization, exploration, and uncertainty estimation of cell states.
|
30
|
Watson JR, Gelbaum Z, Titus M, Zoch G, Wrathall D. Identifying multiscale spatio-temporal patterns in human mobility using manifold learning. PeerJ Comput Sci 2020; 6:e276. [PMID: 33816927 PMCID: PMC7924485 DOI: 10.7717/peerj-cs.276] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/23/2019] [Accepted: 04/22/2020] [Indexed: 06/12/2023]
Abstract
When, where and how people move is a fundamental part of how human societies organize around everyday needs as well as how people adapt to risks, such as economic scarcity or instability, and natural disasters. Our ability to characterize and predict the diversity of human mobility patterns has been greatly expanded by the availability of Call Detail Records (CDR) from mobile phone cellular networks. The size and richness of these datasets is at the same time a blessing and a curse: while there is great opportunity to extract useful information from these datasets, it remains a challenge to do so in a meaningful way. In particular, human mobility is multiscale, meaning a diversity of patterns of mobility occur simultaneously, which vary according to timing, magnitude and spatial extent. To identify and characterize the main spatio-temporal scales and patterns of human mobility we examined CDR data from the Orange mobile network in Senegal using a new form of spectral graph wavelets, an approach from manifold learning. This unsupervised analysis reduces the dimensionality of the data to reveal seasonal changes in human mobility, as well as mobility patterns associated with large-scale but short-term religious events. The novel insight into human mobility patterns afforded by manifold learning methods like spectral graph wavelets has clear applications for urban planning, infrastructure design as well as hazard risk management, especially as climate change alters the biophysical landscape on which people work and live, leading to new patterns of human migration around the world.
|
31
|
Suetani H, Kitajo K. A manifold learning approach to mapping individuality of human brain oscillations through beta-divergence. Neurosci Res 2020; 156:188-196. [PMID: 32084448 DOI: 10.1016/j.neures.2020.02.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/24/2019] [Revised: 12/25/2019] [Accepted: 01/25/2020] [Indexed: 11/18/2022]
Abstract
This paper proposes an approach for visualizing individuality and inter-individual variations of human brain oscillations measured as multichannel electroencephalographic (EEG) signals in a low-dimensional space based on manifold learning. Using a unified divergence measure between spectral densities termed the "beta-divergence", we introduce an appropriate dissimilarity measure between multichannel EEG signals. Then, t-distributed stochastic neighbor embedding (t-SNE; a state-of-the-art algorithm for manifold learning) together with the beta-divergence based distance was applied to resting state EEG signals recorded from 100 healthy subjects. We were able to obtain a fine low-dimensional visualization that enabled each subject to be identified as an isolated point cloud and that represented inter-individual variations as the relationships between such point clouds. Furthermore, we also discuss how the performance of the low-dimensional visualization depends on the beta-divergence parameter and the t-SNE hyper parameter. Finally, borrowing from the concept of locally linear embedding (LLE), we propose a method for projecting the test sample to the t-SNE space obtained from the training samples and investigate that availability.
|
32
|
Nguyen ND, Blaby IK, Wang D. ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks. BMC Genomics 2019; 20:1003. [PMID: 31888454 PMCID: PMC6936142 DOI: 10.1186/s12864-019-6329-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/18/2023]
Abstract
BACKGROUND The coordination of genomic functions is a critical and complex process across biological systems such as phenotypes or states (e.g., time, disease, organism, environmental perturbation). Understanding how the complexity of genomic function relates to these states remains a challenge. To address this, we have developed a novel computational method, ManiNetCluster, which simultaneously aligns and clusters gene networks (e.g., co-expression) to systematically reveal the links of genomic function between different conditions. Specifically, ManiNetCluster employs manifold learning to uncover and match local and non-linear structures among networks, and identifies cross-network functional links. RESULTS We demonstrated that ManiNetCluster better aligns the orthologous genes from their developmental expression profiles across model organisms than state-of-the-art methods (p-value < 2.2×10^-16). This indicates the potential non-linear interactions of evolutionarily conserved genes across species in development. Furthermore, we applied ManiNetCluster to time series transcriptome data measured in the green alga Chlamydomonas reinhardtii to discover the genomic functions linking various metabolic processes between the light and dark periods of a diurnally cycling culture. We identified a number of genes putatively regulating processes across each lighting regime. CONCLUSIONS ManiNetCluster provides a novel computational tool to uncover the genes linking various functions from different networks, providing new insight on how gene functions coordinate across different conditions. ManiNetCluster is publicly available as an R package at https://github.com/daifengwanglab/ManiNetCluster.
|
33
|
Seok HS. Performance comparison of dimensionality reduction methods on RNA-Seq data from the GTEx project. Genes Genomics 2019; 42:225-234. [PMID: 31833048 DOI: 10.1007/s13258-019-00896-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2019] [Accepted: 11/22/2019] [Indexed: 11/25/2022]
Abstract
BACKGROUND One of the apparent characteristics of bioinformatics data is the combination of a very large number of features and a relatively small number of samples. The vast number of features makes intuitive understanding of a target domain difficult. Dimensionality reduction or manifold learning has the potential to circumvent this obstacle, but restricted methods have been preferred. OBJECTIVE The objective of this study is to observe the characteristics of various dimensionality reduction methods, namely locally linear embedding (LLE), multi-dimensional scaling (MDS), principal component analysis (PCA), spectral embedding (SE), and t-distributed Stochastic Neighbor Embedding (t-SNE), on the RNA-Seq dataset from the genotype-tissue expression (GTEx) project. RESULTS The characteristics of the dimensionality reduction methods are observed on the nine groups of three different tissues in the reduced space with dimensionality of two, three, and four. The visualization results report that each dimensionality reduction method produces a very distinct reduced space. The quantitative results are obtained as the performance of k-means clustering. Clustering in the reduced space from non-linear methods such as LLE, t-SNE and SE achieved better results than in the reduced space produced by linear methods like PCA and MDS. CONCLUSIONS The experimental results recommend the application of both linear and non-linear dimensionality reduction methods on the target data for grasping the underlying characteristics of the datasets intuitively.
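The quantitative step described in this abstract, k-means clustering applied to points in a reduced space, can be sketched as follows. This is a minimal illustration using Lloyd's algorithm, not the study's code: the function name `kmeans`, the seeded random initialization, and the toy 2-D points are assumptions, and the abstract does not say which clustering score was reported.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (labels, centers). Initial centers are k distinct points
    drawn with a seeded generator (an assumption made for repeatability).
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical 2-D embedding with two well-separated groups.
embedded = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centers = kmeans(embedded, k=2)
```

In a comparison like the one described, the same routine would be run on the output of each reduction method and the resulting labels scored against the known tissue groups.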
|
34
|
A manifold learning regularization approach to enhance 3D CT image-based lung nodule classification. Int J Comput Assist Radiol Surg 2019; 15:287-295. [PMID: 31768885 DOI: 10.1007/s11548-019-02097-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 06/16/2019] [Accepted: 11/16/2019] [Indexed: 02/07/2023]
Abstract
PURPOSE Diagnosis of lung cancer requires radiologists to review every lung nodule in CT images. Such a process can be very time-consuming, and the accuracy is affected by many factors, such as experience of radiologists and available diagnosis time. To address this problem, we proposed to develop a deep learning-based system to automatically classify benign and malignant lung nodules. METHODS The proposed method automatically determines benignity or malignancy given the 3D CT image patch of a lung nodule to assist the diagnosis process. Motivated by the fact that real structure among data is often embedded on a low-dimensional manifold, we developed a novel manifold regularized classification deep neural network (MRC-DNN) to perform classification directly based on the manifold representation of lung nodule images. The concise manifold representation revealing important data structure is expected to benefit the classification, while the manifold regularization enforces strong, but natural constraints on network training, preventing over-fitting. RESULTS The proposed method achieves accurate manifold learning with reconstruction error of ~ 30 HU on real lung nodule CT image data. In addition, the classification accuracy on testing data is 0.90 with sensitivity of 0.81 and specificity of 0.95, which outperforms state-of-the-art deep learning methods. CONCLUSION The proposed MRC-DNN facilitates an accurate manifold learning approach for lung nodule classification based on 3D CT images. More importantly, MRC-DNN suggests a new and effective idea of enforcing regularization for network training, with potential impact on a broad range of applications.
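The three figures reported in RESULTS (accuracy, sensitivity, specificity) follow from the counts of a binary confusion matrix. The sketch below shows how they are conventionally computed; the function name `binary_metrics`, the coding of malignant as 1 and benign as 0, and the example labels are assumptions for illustration and are not taken from the paper.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Undefined ratios (no positives or no negatives) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # fraction of positives detected
        "specificity": ratio(tn, tn + fp),  # fraction of negatives cleared
    }


# Hypothetical labels: 3 malignant and 4 benign nodules.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1])
```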
|
35
|
Kinalis S, Nielsen FC, Winther O, Bagger FO. Deconvolution of autoencoders to learn biological regulatory modules from single cell mRNA sequencing data. BMC Bioinformatics 2019; 20:379. [PMID: 31286861 PMCID: PMC6615267 DOI: 10.1186/s12859-019-2952-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 02/04/2019] [Accepted: 06/13/2019] [Indexed: 01/20/2023]
Abstract
BACKGROUND Unsupervised machine learning methods (deep learning) have shown their usefulness with noisy single cell mRNA-sequencing data (scRNA-seq), where the models generalize well, despite the zero-inflation of the data. A class of neural networks, namely autoencoders, has been useful for denoising of single cell data, imputation of missing values and dimensionality reduction. RESULTS Here, we present a striking feature with the potential to greatly increase the usability of autoencoders: With specialized training, the autoencoder is not only able to generalize over the data, but also to tease apart biologically meaningful modules, which we found encoded in the representation layer of the network. Our model can, from scRNA-seq data, delineate biologically meaningful modules that govern a dataset, as well as give information as to which modules are active in each single cell. Importantly, most of these modules can be explained by known biological functions, as provided by the Hallmark gene sets. CONCLUSIONS We discover that tailored training of an autoencoder makes it possible to deconvolute biological modules inherent in the data, without any assumptions. By comparisons with gene signatures of canonical pathways we see that the modules are directly interpretable. The scope of this discovery has important implications, as it makes it possible to outline the drivers behind a given effect of a cell. In comparison with other dimensionality reduction methods, or supervised models for classification, our approach has the benefit of both handling well the zero-inflated nature of scRNA-seq, and validating that the model captures relevant information, by establishing a link between input and decoded data. In perspective, our model in combination with clustering methods is able to provide information about which subtype a given single cell belongs to, as well as which biological functions determine that membership.
|
36
|
Cui Z, Gao YL, Liu JX, Dai LY, Yuan SS. L 2,1-GRMF: an improved graph regularized matrix factorization method to predict drug-target interactions. BMC Bioinformatics 2019; 20:287. [PMID: 31182006 PMCID: PMC6557743 DOI: 10.1186/s12859-019-2768-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/23/2023]
Abstract
Background Predicting drug-target interactions is time-consuming and expensive. It is important to present the accuracy of the calculation method. There are many algorithms to predict global interactions, some of which use drug-target networks for prediction (i.e., a bipartite graph of bound drug pairs and targets known to interact). Although these algorithms can predict some drug-target interactions to some extent, there is little effect for some new drugs or targets that have no known interaction. Results Since the datasets are usually located at or near low-dimensional nonlinear manifolds, we propose an improved GRMF (graph regularized matrix factorization) method to learn these manifolds in combination with the previous matrix-decomposition method. In addition, we use one of the pre-processing steps previously proposed to improve the accuracy of the prediction. Conclusions Cross-validation is used to evaluate our method, and simulation experiments are used to predict new interactions. In most cases, our method is superior to other methods. Finally, some examples of new drugs and new targets are predicted by performing simulation experiments. The improved GRMF method can also better predict the remaining drug-target interactions.
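The abstract does not state the objective function. For orientation only, graph regularized matrix factorization is commonly written in the generic form below. All symbols are assumptions introduced here: $Y$ is the drug-target interaction matrix, $A$ and $B$ are low-rank latent factor matrices for drugs and targets, $L_d$ and $L_t$ are graph Laplacians built from drug and target similarity graphs, and $\lambda_l$, $\lambda_d$, $\lambda_t$ are regularization weights. The L2,1-norm variant named in the title would presumably replace one or more of the Frobenius-norm terms; the abstract does not say which.

```latex
\min_{A,\,B}\;
  \lVert Y - A B^{\top} \rVert_F^{2}
  + \lambda_l \left( \lVert A \rVert_F^{2} + \lVert B \rVert_F^{2} \right)
  + \lambda_d \operatorname{Tr}\!\left( A^{\top} L_d A \right)
  + \lambda_t \operatorname{Tr}\!\left( B^{\top} L_t B \right)
```

The two trace terms are what tie the factorization to the manifold assumption: they penalize latent vectors that differ strongly between drugs (or targets) that are close neighbors in the similarity graph.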
|
37
|
Gadd C, Xing W, Nezhad MM, Shah AA. A Surrogate Modelling Approach Based on Nonlinear Dimension Reduction for Uncertainty Quantification in Groundwater Flow Models. Transp Porous Media 2019; 126:39-77. [PMID: 30872876 PMCID: PMC6390720 DOI: 10.1007/s11242-018-1065-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/13/2017] [Accepted: 04/13/2018] [Indexed: 11/15/2022]
Abstract
In this paper, we develop a surrogate modelling approach for capturing the output field (e.g. the pressure head) from groundwater flow models involving a stochastic input field (e.g. the hydraulic conductivity). We use a Karhunen–Loève expansion for a log-normally distributed input field and apply manifold learning (local tangent space alignment) to perform Gaussian process Bayesian inference using Hamiltonian Monte Carlo in an abstract feature space, yielding outputs for arbitrary unseen inputs. We also develop a framework for forward uncertainty quantification in such problems, including analytical approximations of the mean of the marginalized distribution (with respect to the inputs). To sample from the distribution, we present a Monte Carlo approach. Two examples are presented to demonstrate the accuracy of our approach: a Darcy flow model with contaminant transport in 2-d and a Richards equation model in 3-d.
|
38
|
Li B, Fan ZT, Zhang XL, Huang DS. Robust dimensionality reduction via feature space to feature space distance metric learning. Neural Netw 2019; 112:1-14. [PMID: 30716617 DOI: 10.1016/j.neunet.2019.01.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 03/09/2018] [Revised: 12/26/2018] [Accepted: 01/07/2019] [Indexed: 11/29/2022]
Abstract
Images are often represented as vectors with high dimensions when involved in classification. As a result, dimensionality reduction methods have to be developed to avoid the curse of dimensionality. Among them, Laplacian eigenmaps (LE) have attracted widespread attention. In the original LE, a point to point (P2P) distance metric is often adopted for manifold learning. Unfortunately, it offers little robustness to noise. In this paper, a novel supervised dimensionality reduction method, named feature space to feature space distance metric learning (FSDML), is presented. For any point, it can construct a feature space spanned by its k intra-class nearest neighbors, which results in a local projection on its nearest feature space. Thus the feature space to feature space (S2S) distance metric is defined as the Euclidean distance between two corresponding projections. On one hand, the proposed S2S distance metric displays superior robustness owing to the local projection. On the other hand, the projection on the nearest feature space contributes to fully mining local geometry information hidden in the original data. Moreover, both class label similarity and dissimilarity are also measured, based on which an intra-class graph and an inter-class graph will be individually modeled. Finally, a subspace can be found for classification by maximizing S2S based manifold to manifold distance and preserving S2S based locality of manifolds, simultaneously. Compared to some state-of-the-art dimensionality reduction methods, experiments validate the proposed method's performance on both synthesized data sets and benchmark data sets.
|
39
|
Ouyang J, Liang Z, Chen C, Fu Z, Zhang Y, Liu H. Cryo-electron microscope image denoising based on the geodesic distance. BMC STRUCTURAL BIOLOGY 2018; 18:18. [PMID: 30554569 PMCID: PMC6296045 DOI: 10.1186/s12900-018-0094-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 02/22/2018] [Accepted: 10/30/2018] [Indexed: 11/18/2022]
Abstract
Background To perform a three-dimensional (3-D) reconstruction of electron cryomicroscopy (cryo-EM) images of viruses, it is necessary to determine the similarity of image blocks of the two-dimensional (2-D) projections of the virus. The projections containing high resolution information are typically very noisy. Instead of the traditional Euclidean metric, this paper proposes a new method, based on the geodesic metric, to measure the similarity of blocks. Results Our method is a 2-D image denoising approach. A data set of 2243 cytoplasmic polyhedrosis virus (CPV) capsid particle images in different orientations was used to test the proposed method. Relative to Block-matching and three-dimensional filtering (BM3D), Stein’s unbiased risk estimator (SURE), Bayes shrink and K-means singular value decomposition (K-SVD), the experimental results show that the proposed method can achieve a peak signal-to-noise ratio (PSNR) of 45.65. The method can remove the noise from the cryo-EM image and improve the accuracy of particle picking. Conclusions The main contribution of the proposed model is to apply the geodesic distance to measure the similarity of image blocks. We conclude that manifold learning methods can effectively eliminate the noise of the cryo-EM image and improve the accuracy of particle picking.
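The peak signal-to-noise ratio quoted in this abstract is a standard image-quality measure, usually defined as 10·log10(peak² / MSE) and expressed in decibels. The sketch below computes it for two small images stored as lists of rows. The function name `psnr` and the default peak value of 255 (8-bit intensities) are assumptions; the abstract gives neither the peak value nor the unit used for the reported figure.

```python
import math


def psnr(reference, denoised, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images.

    Images are lists of rows of numeric pixel values. `peak` is the maximum
    possible pixel value (255 assumes 8-bit data).
    """
    if len(reference) != len(denoised):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for row_ref, row_den in zip(reference, denoised):
        if len(row_ref) != len(row_den):
            raise ValueError("rows must have the same length")
        for r, d in zip(row_ref, row_den):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count  # mean squared error over all pixels
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical 2x2 images differing by one intensity level at every pixel.
value = psnr([[10, 20], [30, 40]], [[11, 21], [31, 41]])
```

A higher value means the denoised image is closer to the reference, which is why the measure is used to rank denoising methods against one another.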
|
40
|
Thought Chart: tracking the thought with manifold learning during emotion regulation. Brain Inform 2018; 5:7. [PMID: 30022317 PMCID: PMC6170936 DOI: 10.1186/s40708-018-0085-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/10/2017] [Accepted: 07/12/2018] [Indexed: 11/21/2022]
Abstract
The Nash embedding theorem demonstrates that any compact manifold can be isometrically embedded in a Euclidean space. Assuming the complex brain states form a high-dimensional manifold in a topological space, we propose a manifold learning framework, termed Thought Chart, to reconstruct and visualize the manifold in a low-dimensional space. Furthermore, it serves as a data-driven approach to discover the underlying dynamics when the brain is engaged in a series of emotion and cognitive regulation tasks. EEG-based temporal dynamic functional connectomes are created based on 20 psychiatrically healthy participants’ EEG recordings during resting state and an emotion regulation task. Graph dissimilarity space embedding was applied to all the dynamic EEG connectomes. In order to visualize the learned manifold in a lower dimensional space, local neighborhood information is reconstructed via k-nearest neighbor-based nonlinear dimensionality reduction (NDR) and epsilon distance-based NDR. We showed that two neighborhood constructing approaches of NDR embed the manifold in a two-dimensional space, which we named Thought Chart. In Thought Chart, different task conditions represent distinct trajectories. Properties such as the distribution or average length in the 2-D space may serve as useful parameters to explore the underlying cognitive load and emotion processing during the complex task. In sum, this framework is a novel data-driven approach to the learning and visualization of underlying neurophysiological dynamics of complex functional brain data.
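The two neighborhood constructions named in this abstract, k-nearest neighbor and epsilon distance, differ only in how edges of the neighborhood graph are chosen before the embedding step. The sketch below builds both graphs for a small set of points. It is an illustration under assumptions: the function names, the use of Euclidean distance and the toy coordinates are hypothetical, whereas the paper works with dissimilarities between EEG connectomes.

```python
import math


def pairwise_distances(points):
    """Full matrix of Euclidean distances between points (tuples of floats)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def knn_graph(points, k):
    """Undirected edges linking each point to its k nearest neighbors."""
    dist = pairwise_distances(points)
    edges = set()
    for i, row in enumerate(dist):
        others = [j for j in range(len(points)) if j != i]
        # Stable sort: ties are broken in favor of the lower index.
        for j in sorted(others, key=lambda j: row[j])[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def epsilon_graph(points, eps):
    """Undirected edges linking every pair of points at distance <= eps."""
    dist = pairwise_distances(points)
    n = len(points)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] <= eps}


# Hypothetical points: three close together and one outlier.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
knn_edges = knn_graph(pts, k=1)
eps_edges = epsilon_graph(pts, eps=1.5)
```

With these points the k-nearest neighbor graph still attaches the outlier to its closest point, while the epsilon graph leaves it isolated, which illustrates why the two constructions can produce different embeddings.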
|
41
|
Bermudez C, Plassard AJ, Davis TL, Newton AT, Resnick SM, Landman BA. Learning Implicit Brain MRI Manifolds with Deep Learning. PROCEEDINGS OF SPIE--THE INTERNATIONAL SOCIETY FOR OPTICAL ENGINEERING 2018; 10574:105741L. [PMID: 29887659 PMCID: PMC5990281 DOI: 10.1117/12.2293515] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 11/14/2022]
Abstract
An important task in image processing and neuroimaging is to extract quantitative information from the acquired images in order to make observations about the presence of disease or markers of development in populations. Having a low-dimensional manifold of an image allows for easier statistical comparisons between groups and the synthesis of group representatives. Previous studies have sought to identify the best mapping of brain MRI to a low-dimensional manifold, but have been limited by assumptions of explicit similarity measures. In this work, we use deep learning techniques to investigate implicit manifolds of normal brains and generate new, high-quality images. We explore implicit manifolds by addressing the problems of image synthesis and image denoising as important tools in manifold learning. First, we propose the unsupervised synthesis of T1-weighted brain MRI using a Generative Adversarial Network (GAN) by learning from 528 examples of 2D axial slices of brain MRI. Synthesized images were first shown to be unique by performing a cross-correlation with the training set. Real and synthesized images were then assessed in a blinded manner by two imaging experts providing an image quality score of 1-5. The quality score of the synthetic image showed substantial overlap with that of the real images. Moreover, we use an autoencoder with skip connections for image denoising, showing that the proposed method results in higher PSNR than FSL SUSAN after denoising. This work shows the power of artificial networks to synthesize realistic imaging data, which can be used to improve image processing techniques and provide a quantitative framework to structural changes in the brain.
|
42
|
Ding P, Luo J, Liang C, Xiao Q, Cao B. Human disease MiRNA inference by combining target information based on heterogeneous manifolds. J Biomed Inform 2018; 80:26-36. [PMID: 29481877 DOI: 10.1016/j.jbi.2018.02.013] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/25/2017] [Revised: 02/11/2018] [Accepted: 02/21/2018] [Indexed: 12/12/2022]
Abstract
The emergence of network medicine has provided great insight into the identification of disease-related molecules, which could help with the development of personalized medicine. However, the state-of-the-art methods could neither simultaneously consider target information and the known miRNA-disease associations nor effectively explore novel gene-disease associations as a by-product during the process of inferring disease-related miRNAs. Computational methods incorporating multiple sources of information offer more opportunities to infer disease-related molecules, including miRNAs and genes in heterogeneous networks at a system level. In this study, we developed a novel algorithm, named inference of Disease-related MiRNAs based on Heterogeneous Manifold (DMHM), to accurately and efficiently identify miRNA-disease associations by integrating multi-omics data. Graph-based regularization was utilized to obtain a smooth function on the data manifold, which constitutes the main principle of DMHM. The novelty of this framework lies in the relatedness between diseases and miRNAs, which are measured via heterogeneous manifolds on heterogeneous networks integrating target information. To demonstrate the effectiveness of DMHM, we conducted comprehensive experiments based on HMDD datasets and compared DMHM with six state-of-the-art methods. Experimental results indicated that DMHM significantly outperformed the other six methods under fivefold cross validation and de novo prediction tests. Case studies have further confirmed the practical usefulness of DMHM.
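The abstract names graph-based regularization as the main principle behind obtaining a smooth function on the data manifold. A common way to express such smoothness is the Laplacian quadratic form f^T L f, which equals the sum over edges of w_ij (f_i - f_j)^2 for a symmetric weight matrix. The sketch below illustrates only that generic penalty; it is not DMHM's actual objective, and the function names, the four-node chain graph, and the score vectors are all hypothetical.

```python
def graph_laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]

def smoothness_penalty(weights, scores):
    """Quadratic form f^T L f; small when connected nodes have similar scores."""
    lap = graph_laplacian(weights)
    n = len(scores)
    return sum(scores[i] * lap[i][j] * scores[j] for i in range(n) for j in range(n))

def pairwise_penalty(weights, scores):
    """Equivalent edge-wise form: sum over i < j of w_ij * (f_i - f_j)^2."""
    n = len(scores)
    return sum(
        weights[i][j] * (scores[i] - scores[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )

# Hypothetical chain graph 0 - 1 - 2 - 3 with unit edge weights.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
smooth = [1.0, 0.9, 0.8, 0.7]  # scores change gradually along the chain
rough = [1.0, 0.0, 1.0, 0.0]   # scores flip across every edge

print(smoothness_penalty(W, smooth), smoothness_penalty(W, rough))
```

Minimizing a loss that includes this penalty pushes strongly connected nodes toward similar scores, which is the usual meaning of a smooth function on a graph.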
|
43
|
Qian P, Xi C, Xu M, Jiang Y, Su KH, Wang S, Muzic RF. SSC-EKE: Semi-Supervised Classification with Extensive Knowledge Exploitation. Inf Sci (N Y) 2018; 422:51-76. [PMID: 29628529 PMCID: PMC5881956 DOI: 10.1016/j.ins.2017.08.093] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/30/2022]
Abstract
We introduce a new, semi-supervised classification method that extensively exploits knowledge. The method has three steps. First, the manifold regularization mechanism, adapted from the Laplacian support vector machine (LapSVM), is adopted to mine the manifold structure embedded in all training data, especially in numerous label-unknown data. Meanwhile, by converting the labels into pairwise constraints, the pairwise constraint regularization formula (PCRF) is designed to compensate for the few but valuable labelled data. Second, by further combining the PCRF with the manifold regularization, the precise manifold and pairwise constraint jointly regularized formula (MPCJRF) is achieved. Third, by incorporating the MPCJRF into the framework of the conventional SVM, our approach, referred to as semi-supervised classification with extensive knowledge exploitation (SSC-EKE), is developed. The significance of our research is fourfold: 1) The MPCJRF is an underlying adjustment, with respect to the pairwise constraints, to the graph Laplacian enlisted for approximating the potential data manifold. This type of adjustment plays the correction role, as an unbiased estimation of the data manifold is difficult to obtain, whereas the pairwise constraints, converted from the given labels, have an overall high confidence level. 2) By transforming the values of the two terms in the MPCJRF such that they have the same range, with a trade-off factor varying within the invariant interval [0, 1), the appropriate impact of the pairwise constraints to the graph Laplacian can be self-adaptively determined. 3) The implication regarding extensive knowledge exploitation is embodied in SSC-EKE. That is, the labelled examples are used not only to control the empirical risk but also to constitute the MPCJRF. Moreover, all data, both labelled and unlabelled, are recruited for the model smoothness and manifold regularization. 
4) The complete framework of SSC-EKE organically incorporates multiple theories, such as joint manifold and pairwise constraint-based regularization, smoothness in the reproducing kernel Hilbert space, empirical risk minimization, and spectral methods, which facilitates the preferable classification accuracy as well as the generalizability of SSC-EKE.
|
44
|
Zhang Z, Jia L, Zhang M, Li B, Zhang L, Li F. Discriminative clustering on manifold for adaptive transductive classification. Neural Netw 2017; 94:260-273. [PMID: 28822323 DOI: 10.1016/j.neunet.2017.07.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 12/23/2016] [Revised: 07/18/2017] [Accepted: 07/21/2017] [Indexed: 11/30/2022]
Abstract
In this paper, we mainly propose a novel adaptive transductive label propagation approach by joint discriminative clustering on manifolds for representing and classifying high-dimensional data. Our framework seamlessly combines the unsupervised manifold learning, discriminative clustering and adaptive classification into a unified model. Also, our method incorporates the adaptive graph weight construction with label propagation. Specifically, our method is capable of propagating label information using adaptive weights over low-dimensional manifold features, which is different from most existing studies that usually predict the labels and construct the weights in the original Euclidean space. For transductive classification by our formulation, we first perform the joint discriminative K-means clustering and manifold learning to capture the low-dimensional nonlinear manifolds. Then, we construct the adaptive weights over the learnt manifold features, where the adaptive weights are calculated through performing the joint minimization of the reconstruction errors over features and soft labels so that the graph weights can be joint-optimal for data representation and classification. Using the adaptive weights, we can easily estimate the unknown labels of samples. After that, our method returns the updated weights for further updating the manifold features. Extensive simulations on image classification and segmentation show that our proposed algorithm can deliver the state-of-the-art performance on several public datasets.
|
45
|
Zimmer VA, Glocker B, Hahner N, Eixarch E, Sanroma G, Gratacós E, Rueckert D, González Ballester MÁ, Piella G. Learning and combining image neighborhoods using random forests for neonatal brain disease classification. Med Image Anal 2017; 42:189-199. [PMID: 28818743 DOI: 10.1016/j.media.2017.08.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/24/2016] [Revised: 08/01/2017] [Accepted: 08/08/2017] [Indexed: 12/25/2022]
Abstract
It is challenging to characterize and classify normal and abnormal brain development during early childhood. To reduce the complexity of heterogeneous data population, manifold learning techniques are increasingly applied, which find a low-dimensional representation of the data, while preserving all relevant information. The neighborhood definition used for constructing manifold representations of the population is crucial for preserving the similarity structure and it is highly application dependent. The recently proposed neighborhood approximation forests learn a neighborhood structure in a dataset based on a user-defined distance. We propose a framework to learn multiple pairwise distances in a population of brain images and to combine them in an unsupervised manner optimally in a manifold learning step. Unlike other methods that only use a univariate distance measure, our method allows for a natural combination of multiple distances from heterogeneous sources. As a result, it yields a representation of the population that preserves the multiple distances. Furthermore, our method also selects the most predictive features associated with the distances. We evaluate our method in neonatal magnetic resonance images of three groups (term controls, patients affected by intrauterine growth restriction and mild isolated ventriculomegaly). We show that combining multiple distances related to the condition improves the overall characterization and classification of the three clinical groups compared to the use of single distances and classical unsupervised manifold learning.
|
46
|
Welch JD, Hartemink AJ, Prins JF. MATCHER: manifold alignment reveals correspondence between single cell transcriptome and epigenome dynamics. Genome Biol 2017; 18:138. [PMID: 28738873 PMCID: PMC5525279 DOI: 10.1186/s13059-017-1269-0] [Citation(s) in RCA: 93] [Impact Index Per Article: 13.3] [Received: 03/17/2017] [Accepted: 07/05/2017] [Indexed: 12/30/2022] Open
Abstract
Single cell experimental techniques reveal transcriptomic and epigenetic heterogeneity among cells, but how these are related is unclear. We present MATCHER, an approach for integrating multiple types of single cell measurements. MATCHER uses manifold alignment to infer single cell multi-omic profiles from transcriptomic and epigenetic measurements performed on different cells of the same type. Using scM&T-seq and sc-GEM data, we confirm that MATCHER accurately predicts true single cell correlations between DNA methylation and gene expression without using known cell correspondences. MATCHER also reveals new insights into the dynamic interplay between the transcriptome and epigenome in single embryonic stem cells and induced pluripotent stem cells.
|
47
|
Alanis-Lobato G, Mier P, Andrade-Navarro MA. Manifold learning and maximum likelihood estimation for hyperbolic network embedding. APPLIED NETWORK SCIENCE 2016; 1:10. [PMID: 30533502 PMCID: PMC6245200 DOI: 10.1007/s41109-016-0013-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 09/01/2016] [Accepted: 10/25/2016] [Indexed: 05/23/2023]
Abstract
The Popularity-Similarity (PS) model holds that clustering and hierarchy, properties common to most networks representing complex systems, are the result of an optimisation process in which nodes seek to form ties not only with the most connected (popular) system components, but also with those that are similar to them. This model has a geometric interpretation in hyperbolic space, where distances between nodes abstract popularity-similarity trade-offs and the formation of scale-free and strongly clustered networks can be accurately described. Current methods for mapping networks to hyperbolic space are based on maximum likelihood estimation or manifold learning. The former approach is very accurate but slow; the latter improves efficiency at the cost of accuracy. Here, we analyse the strengths and limitations of both strategies and assess the advantages of combining them to efficiently embed big networks, allowing for their examination from a geometric perspective. Our evaluations in artificial and real networks support the idea that hyperbolic distance constraints play a significant role in the formation of edges between nodes. This means that challenging problems in network science, like link prediction or community detection, could be more easily addressed under this geometric framework.
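To make the geometric interpretation concrete, the sketch below computes the distance between two nodes given in polar coordinates on the hyperbolic plane, using the hyperbolic law of cosines for curvature -1. In the usual reading of the PS model the radial coordinate stands for popularity and the angular coordinate for similarity; the function name and the sample coordinates are assumptions for illustration, and this is not the embedding procedure evaluated in the paper.

```python
import math

def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between polar points (r, theta) on the hyperbolic plane (curvature -1).

    Uses cosh(d) = cosh(r1) cosh(r2) - sinh(r1) sinh(r2) cos(dtheta),
    where dtheta is the angular separation folded into [0, pi].
    """
    raw = abs(theta1 - theta2) % (2.0 * math.pi)
    dtheta = math.pi - abs(math.pi - raw)
    cosh_d = (math.cosh(r1) * math.cosh(r2)
              - math.sinh(r1) * math.sinh(r2) * math.cos(dtheta))
    # Guard against rounding slightly below 1, where acosh is undefined.
    return math.acosh(max(cosh_d, 1.0))

# Hypothetical node coordinates.
same_angle = hyperbolic_distance(2.0, 0.3, 5.0, 0.3)     # differs only in radius
opposite = hyperbolic_distance(1.0, 0.0, 2.0, math.pi)   # path through the origin
print(same_angle, opposite)
```

Two nodes at the same angle are separated by the difference of their radii, while nodes at opposite angles are separated by the sum, so both a small radius (high popularity) and a small angular gap (high similarity) shorten the distance.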
|
48
|
Tong C, Shi X, Lan T. Statistical process monitoring based on orthogonal multi-manifold projections and a novel variable contribution analysis. ISA TRANSACTIONS 2016; 65:407-417. [PMID: 27435000 DOI: 10.1016/j.isatra.2016.06.017] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/27/2015] [Revised: 06/02/2016] [Accepted: 06/30/2016] [Indexed: 06/06/2023]
Abstract
Multivariate statistical methods have been widely applied to develop data-based process monitoring models. Recently, a multi-manifold projections (MMP) algorithm was proposed for modeling and monitoring chemical industrial processes. MMP is an effective tool for preserving the global and local geometric structure of the original data space in the reduced feature subspace, but it does not provide orthogonal basis functions for data reconstruction. In recognition of this issue, an improved version of the MMP algorithm, named orthogonal MMP (OMMP), is formulated. Based on the OMMP model, a further processing step and a different monitoring index are proposed to model and monitor the variation in the residual subspace. Additionally, a novel variable contribution analysis is presented for fault diagnosis by integrating the nearest in-control neighbor calculation and reconstruction-based contribution analysis. The validity and superiority of the proposed fault detection and diagnosis strategy are then validated through case studies on the Tennessee Eastman benchmark process.
|
49
|
Xie L, Pluta JB, Das SR, Wisse LEM, Wang H, Mancuso L, Kliot D, Avants BB, Ding SL, Manjón JV, Wolk DA, Yushkevich PA. Multi-template analysis of human perirhinal cortex in brain MRI: Explicitly accounting for anatomical variability. Neuroimage 2016; 144:183-202. [PMID: 27702610 DOI: 10.1016/j.neuroimage.2016.09.070] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 03/20/2016] [Revised: 09/28/2016] [Accepted: 09/30/2016] [Indexed: 01/05/2023] Open
Abstract
RATIONALE: The human perirhinal cortex (PRC) plays critical roles in episodic and semantic memory and visual perception. The PRC consists of Brodmann areas 35 and 36 (BA35, BA36). In Alzheimer's disease (AD), BA35 is the first cortical site affected by neurofibrillary tangle pathology, which is closely linked to neural injury in AD. Large anatomical variability, manifested in the form of different cortical folding and branching patterns, makes it difficult to segment the PRC in MRI scans. Pathology studies have found that in ~97% of specimens, the PRC falls into one of three discrete anatomical variants. However, current methods for PRC segmentation and morphometry in MRI are based on single-template approaches, which may not be able to accurately model these discrete variants. METHODS: A multi-template analysis pipeline that explicitly accounts for anatomical variability is used to automatically label the PRC and measure its thickness in T2-weighted MRI scans. The pipeline uses multi-atlas segmentation to automatically label medial temporal lobe cortices including entorhinal cortex, PRC and the parahippocampal cortex. Pairwise registration between label maps and clustering based on residual dissimilarity after registration are used to construct separate templates for the anatomical variants of the PRC. An optimal path of deformations linking these templates is used to establish correspondences between all the subjects. Experimental evaluation focuses on the ability of single-template and multi-template analyses to detect differences in the thickness of medial temporal lobe cortices between patients with amnestic mild cognitive impairment (aMCI, n=41) and age-matched controls (n=44). RESULTS: The proposed technique is able to generate templates that recover the three dominant discrete variants of PRC and establish more meaningful correspondences between subjects than a single-template approach. The largest reduction in thickness associated with aMCI, in absolute terms, was found in left BA35 using both regional and summary thickness measures. Further, statistical maps of regional thickness difference between aMCI and controls revealed different patterns for the three anatomical variants.
|
50
|
Baumgartner CF, Kolbitsch C, McClelland JR, Rueckert D, King AP. Autoadaptive motion modelling for MR-based respiratory motion estimation. Med Image Anal 2016; 35:83-100. [PMID: 27343436 DOI: 10.1016/j.media.2016.06.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 06/01/2015] [Revised: 04/22/2016] [Accepted: 06/07/2016] [Indexed: 10/21/2022]
Abstract
Respiratory motion poses significant challenges in image-guided interventions. In emerging treatments such as MR-guided HIFU or MR-guided radiotherapy, it may cause significant misalignments between interventional road maps obtained pre-procedure and the anatomy during the treatment, and may affect intra-procedural imaging such as MR-thermometry. Patient specific respiratory motion models provide a solution to this problem. They establish a correspondence between the patient motion and simpler surrogate data which can be acquired easily during the treatment. Patient motion can then be estimated during the treatment by acquiring only the simpler surrogate data. In the majority of classical motion modelling approaches once the correspondence between the surrogate data and the patient motion is established it cannot be changed unless the model is recalibrated. However, breathing patterns are known to significantly change in the time frame of MR-guided interventions. Thus, the classical motion modelling approach may yield inaccurate motion estimations when the relation between the motion and the surrogate data changes over the duration of the treatment and frequent recalibration may not be feasible. We propose a novel methodology for motion modelling which has the ability to automatically adapt to new breathing patterns. This is achieved by choosing the surrogate data in such a way that it can be used to estimate the current motion in 3D as well as to update the motion model. In particular, in this work, we use 2D MR slices from different slice positions to build as well as to apply the motion model. We implemented such an autoadaptive motion model by extending our previous work on manifold alignment. We demonstrate a proof-of-principle of the proposed technique on cardiac gated data of the thorax and evaluate its adaptive behaviour on realistic synthetic data containing two breathing types generated from 6 volunteers, and real data from 4 volunteers. 
On synthetic data the autoadaptive motion model yielded 21.45% more accurate motion estimations compared to a non-adaptive motion model 10 min after a change in breathing pattern. On real data we demonstrated the method's ability to maintain motion estimation accuracy despite a drift in the respiratory baseline. Due to the cardiac gating of the imaging data, the method is currently limited to one update per heart beat and the calibration requires approximately 12 min of scanning. Furthermore, the method has a prediction latency of 800 ms. These limitations may be overcome in future work by altering the acquisition protocol.
|