1
|
Zhu X, Li J, Li HD, Xie M, Wang J. Sc-GPE: A Graph Partitioning-Based Cluster Ensemble Method for Single-Cell. Front Genet 2020; 11:604790. [PMID: 33384718 PMCID: PMC7770236 DOI: 10.3389/fgene.2020.604790] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2020] [Accepted: 11/23/2020] [Indexed: 01/23/2023] Open
Abstract
Clustering is an efficient way to analyze single-cell RNA sequencing data. It is commonly used to identify cell types, which can help in understanding cell differentiation processes. However, different clustering results can be obtained from different single-cell clustering methods, sometimes including conflicting conclusions, and biologists will often fail to get the right clustering results and interpret the biological significance. The cluster ensemble strategy can be an effective solution for the problem. As the graph partitioning-based clustering methods are good at clustering single-cell, we developed Sc-GPE, a novel cluster ensemble method combining five single-cell graph partitioning-based clustering methods. The five methods are SNN-cliq, PhenoGraph, SC3, SSNN-Louvain, and MPGS-Louvain. In Sc-GPE, a consensus matrix is constructed based on the five clustering solutions by calculating the probability that the cell pairs are divided into the same cluster. It solved the problem in the hypergraph-based ensemble approach, including the different cluster labels that were assigned in the individual clustering method, and it was difficult to find the corresponding cluster labels across all methods. Then, to distinguish the different importance of each method in a clustering ensemble, a weighted consensus matrix was constructed by designing an importance score strategy. Finally, hierarchical clustering was performed on the weighted consensus matrix to cluster cells. To evaluate the performance, we compared Sc-GPE with the individual clustering methods and the state-of-the-art SAME-clustering on 12 single-cell RNA-seq datasets. The results show that Sc-GPE obtained the best average performance, and achieved the highest NMI and ARI value in five datasets.
Collapse
Affiliation(s)
- Xiaoshu Zhu
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China.,Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Jian Li
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
| | - Hong-Dong Li
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| | - Miao Xie
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
| | - Jianxin Wang
- Hunan Provincial Key Laboratory on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
| |
Collapse
|
2
|
Lipowski A, Ferreira AL, Lipowska D. Cluster Structure of Optimal Solutions in Bipartitioning of Small Worlds. Entropy (Basel) 2020; 22:E1319. [PMID: 33287084 DOI: 10.3390/e22111319] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/07/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 12/02/2022]
Abstract
Using simulated annealing, we examine a bipartitioning of small worlds obtained by adding a fraction of randomly chosen links to a one-dimensional chain or a square lattice. Models defined on small worlds typically exhibit a mean-field behavior, regardless of the underlying lattice. Our work demonstrates that the bipartitioning of small worlds does depend on the underlying lattice. Simulations show that for one-dimensional small worlds, optimal partitions are finite size clusters for any fraction of additional links. In the two-dimensional case, we observe two regimes: when the fraction of additional links is sufficiently small, the optimal partitions have a stripe-like shape, which is lost for a larger number of additional links as optimal partitions become disordered. Some arguments, which interpret additional links as thermal excitations and refer to the thermodynamics of Ising models, suggest a qualitative explanation of such a behavior. The histogram of overlaps suggests that a replica symmetry is broken in a one-dimensional small world. In the two-dimensional case, the replica symmetry seems to hold, but with some additional degeneracy of stripe-like partitions.
Collapse
|
3
|
Riehl JR, Zimmerman MI, Singh MF, Bowman GR, Ching S. Computing and optimizing over all fixed-points of discrete systems on large networks. J R Soc Interface 2020; 17:20200126. [PMID: 32900299 DOI: 10.1098/rsif.2020.0126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Equilibria, or fixed points, play an important role in dynamical systems across various domains, yet finding them can be computationally challenging. Here, we show how to efficiently compute all equilibrium points of discrete-valued, discrete-time systems on sparse networks. Using graph partitioning, we recursively decompose the original problem into a set of smaller, simpler problems that are easy to compute, and whose solutions combine to yield the full equilibrium set. This makes it possible to find the fixed points of systems on arbitrarily large networks meeting certain criteria. This approach can also be used without computing the full equilibrium set, which may grow very large in some cases. For example, one can use this method to check the existence and total number of equilibria, or to find equilibria that are optimal with respect to a given cost function. We demonstrate the potential capabilities of this approach with examples in two scientific domains: computing the number of fixed points in brain networks and finding the minimal energy conformations of lattice-based protein folding models.
Collapse
Affiliation(s)
- James R Riehl
- Department of Electrical and Systems Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, MO 63130, USA
| | - Maxwell I Zimmerman
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 S Euclid Avenue, St Louis, MO 63110, USA
| | - Matthew F Singh
- Department of Electrical and Systems Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, MO 63130, USA
| | - Gregory R Bowman
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 S Euclid Avenue, St Louis, MO 63110, USA
| | - ShiNung Ching
- Department of Electrical and Systems Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, MO 63130, USA.,Department of Biomedical Engineering, Washington University in St Louis, 1 Brookings Drive, St Louis, MO 63130, USA.,Division of Biology and Biomedical Sciences, Washington University School of Medicine, 660 S Euclid Avenue, St Louis, MO 63110, USA
| |
Collapse
|
4
|
Kumar D, Gupta J, Raha S. Partitioning a reaction-diffusion ecological network for dynamic stability. Proc Math Phys Eng Sci 2019; 475:20180524. [PMID: 31007543 DOI: 10.1098/rspa.2018.0524] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2018] [Accepted: 02/14/2019] [Indexed: 11/12/2022] Open
Abstract
The loss of dispersal connections between habitat patches may destabilize populations in a patched ecological network. This work studies the stability of populations when one or more communication links is removed. An example is finding the alignment of a highway through a patched forest containing a network of metapopulations in the patches. This problem is modelled as that of finding a stable cut of the graph induced by the metapopulations network, where nodes represent the habitat patches and the weighted edges model the dispersal between habitat patches. A reaction-diffusion system on the graph models the dynamics of the predator-prey system over the patched ecological network. The graph Laplacian's Fiedler value, which indicates the well-connectedness of the graph, is shown to affect the stability of the metapopulations. We show that, when the Fiedler value is sufficiently large, the removal of edges without destabilizing the dynamics of the network is possible. We give an exhaustive graph partitioning procedure, which is suitable for smaller networks and uses the criterion for both the local and global stability of populations in partitioned networks. A heuristic graph bisection algorithm that preserves the preassigned lower bound for the Fiedler value is proposed for larger networks and is illustrated with examples.
Collapse
Affiliation(s)
- Dinesh Kumar
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
| | - Jatin Gupta
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
| | - Soumyendu Raha
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore 560012, India
| |
Collapse
|
5
|
Jain S, Bayrak CS, Petingi L, Schlick T. Dual Graph Partitioning Highlights a Small Group of Pseudoknot-Containing RNA Submotifs. Genes (Basel) 2018; 9:E371. [PMID: 30044451 PMCID: PMC6115904 DOI: 10.3390/genes9080371] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2018] [Revised: 06/26/2018] [Accepted: 06/26/2018] [Indexed: 12/31/2022] Open
Abstract
RNA molecules are composed of modular architectural units that define their unique structural and functional properties. Characterization of these building blocks can help interpret RNA structure/function relationships. We present an RNA secondary structure motif and submotif library using dual graph representation and partitioning. Dual graphs represent RNA helices as vertices and loops as edges. Unlike tree graphs, dual graphs can represent RNA pseudoknots (intertwined base pairs). For a representative set of RNA structures, we construct dual graphs from their secondary structures, and apply our partitioning algorithm to identify non-separable subgraphs (or blocks) without breaking pseudoknots. We report 56 subgraph blocks up to nine vertices; among them, 22 are frequently occurring, 15 of which contain pseudoknots. We then catalog atomic fragments corresponding to the subgraph blocks to define a library of building blocks that can be used for RNA design, which we call RAG-3Dual, as we have done for tree graphs. As an application, we analyze the distribution of these subgraph blocks within ribosomal RNAs of various prokaryotic and eukaryotic species to identify common subgraphs and possible ancestry relationships. Other applications of dual graph partitioning and motif library can be envisioned for RNA structure analysis and design.
Collapse
Affiliation(s)
- Swati Jain
- Department of Chemistry, New York University, New York, NY 10003, USA.
| | - Cigdem S Bayrak
- Department of Chemistry, New York University, New York, NY 10003, USA.
| | - Louis Petingi
- Computer Science Department, College of Staten Island, City University of New York, Staten Island, New York, NY 10314, USA.
| | - Tamar Schlick
- Department of Chemistry, New York University, New York, NY 10003, USA.
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA.
- NYU-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 3663, China.
| |
Collapse
|
6
|
Huang D, Xu C, Zhao D, Song W, He Q. A Multi-Objective Partition Method for Marine Sensor Networks Based on Degree of Event Correlation. Sensors (Basel) 2017; 17:E2168. [PMID: 28934126 DOI: 10.3390/s17102168] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/12/2017] [Revised: 08/25/2017] [Accepted: 09/15/2017] [Indexed: 11/17/2022]
Abstract
Existing marine sensor networks acquire data from sea areas that are geographically divided, and store the data independently in their affiliated sea area data centers. In the case of marine events across multiple sea areas, the current network structure needs to retrieve data from multiple data centers, and thus severely affects real-time decision making. In this study, in order to provide a fast data retrieval service for a marine sensor network, we use all the marine sensors as the vertices, establish the edge based on marine events, and abstract the marine sensor network as a graph. Then, we construct a multi-objective balanced partition method to partition the abstract graph into multiple regions and store them in the cloud computing platform. This method effectively increases the correlation of the sensors and decreases the retrieval cost. On this basis, an incremental optimization strategy is designed to dynamically optimize existing partitions when new sensors are added into the network. Experimental results show that the proposed method can achieve the optimal layout for distributed storage in the process of disaster data retrieval in the China Sea area, and effectively optimize the result of partitions when new buoys are deployed, which eventually will provide efficient data access service for marine events.
Collapse
|
7
|
Nguyen L, Tosun AB, Fine JL, Lee AV, Taylor DL, Chennubhotla SC. Spatial Statistics for Segmenting Histological Structures in H&E Stained Tissue Images. IEEE Trans Med Imaging 2017; 36:1522-1532. [PMID: 28328502 PMCID: PMC5498226 DOI: 10.1109/tmi.2017.2681519] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Segmenting a broad class of histological structures in transmitted light and/or fluorescence-based images is a prerequisite for determining the pathological basis of cancer, elucidating spatial interactions between histological structures in tumor microenvironments (e.g., tumor infiltrating lymphocytes), facilitating precision medicine studies with deep molecular profiling, and providing an exploratory tool for pathologists. This paper focuses on segmenting histological structures in hematoxylin- and eosin-stained images of breast tissues, e.g., invasive carcinoma, carcinoma in situ, atypical and normal ducts, adipose tissue, and lymphocytes. We propose two graph-theoretic segmentation methods based on local spatial color and nuclei neighborhood statistics. For benchmarking, we curated a data set of 232 high-power field breast tissue images together with expertly annotated ground truth. To accurately model the preference for histological structures (ducts, vessels, tumor nets, adipose, etc.) over the remaining connective tissue and non-tissue areas in ground truth annotations, we propose a new region-based score for evaluating segmentation algorithms. We demonstrate the improvement of our proposed methods over the state-of-the-art algorithms in both region- and boundary-based performance measures.
Collapse
|
8
|
Abstract
One of the primary challenges in single particle reconstruction with cryo-electron microscopy is to find a three-dimensional model of a molecule using its noisy two-dimensional projection-images. As the imaging orientations of the projection-images are unknown, we suggest a common-lines-based method to simultaneously estimate the imaging orientations of all images that is independent of the distribution of the orientations. Since the relative orientation of each pair of images may only be estimated up to a two-way handedness ambiguity, we suggest an efficient procedure to consistently assign the same handedness to all relative orientations. This is achieved by casting the handedness assignment problem as a graph-partitioning problem. Once a consistent handedness of all relative orientations is determined, the orientations corresponding to all projection-images are determined simultaneously, thus rendering the method robust to noise. Our proposed method has also the advantage of allowing one to incorporate confidence information regarding the trustworthiness of each relative orientation in a natural manner. We demonstrate the efficacy of our approach using simulated clean and noisy data.
Collapse
Affiliation(s)
- Gabi Pragier
- Department of Applied Mathematics, School of Mathematical Sciences, Tel-Aviv University, Tel Aviv 6997801, Israel
| | - Ido Greenberg
- Department of Applied Mathematics, School of Mathematical Sciences, Tel-Aviv University, Tel Aviv 6997801, Israel
| | - Xiuyuan Cheng
- Applied Mathematics Program, Yale University, New Haven, CT 06520 USA
| | - Yoel Shkolnisky
- Department of Applied Mathematics, School of Mathematical Sciences, Tel-Aviv University, Tel Aviv 6997801, Israel
| |
Collapse
|