451
Šubelj L, van Eck NJ, Waltman L. Clustering Scientific Publications Based on Citation Relations: A Systematic Comparison of Different Methods. PLoS One 2016; 11:e0154404. [PMID: 27124610 PMCID: PMC4849655 DOI: 10.1371/journal.pone.0154404] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 12/30/2015] [Accepted: 04/13/2016] [Indexed: 11/19/2022]
Abstract
Clustering methods are applied regularly in the bibliometric literature to identify research areas or scientific fields. These methods are, for instance, used to group publications into clusters based on their relations in a citation network. In the network science literature, many clustering methods, often referred to as graph partitioning or community detection techniques, have been developed. Focusing on the problem of clustering the publications in a citation network, we present a systematic comparison of the performance of a large number of these clustering methods. Using a number of different citation networks, some of them relatively small and others very large, we extensively study the statistical properties of the results provided by different methods. In addition, we carry out an expert-based assessment of the results produced by different methods. The expert-based assessment focuses on publications in the field of scientometrics. Our findings seem to indicate that there is a trade-off between different properties that may be considered desirable for a good clustering of publications. Overall, map equation methods appear to perform best in our analysis, suggesting that these methods deserve more attention from the bibliometric community.
Affiliation(s)
- Lovro Šubelj
  - University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
- Nees Jan van Eck
  - Leiden University, Centre for Science and Technology Studies, Leiden, Netherlands
- Ludo Waltman
  - Leiden University, Centre for Science and Technology Studies, Leiden, Netherlands

452
Abstract
Detecting communities or clusters in real-world networked systems is of considerable interest in various fields such as sociology, biology, physics, engineering science, and interdisciplinary subjects, and significant effort has been devoted to it in recent years. Many existing algorithms are designed only to identify the composition of communities, not their structures. We believe, however, that the local structures of communities can also shed important light on their detection. In this work, we develop a simple yet effective approach that simultaneously uncovers communities and their centers. The idea is based on the premise that a community can generally be viewed as a high-density node surrounded by neighbors with lower densities, and that community centers reside far apart from each other. We propose a so-called “community centrality” to quantify the likelihood of a node being a community center in such a landscape, and then propagate the likelihoods of multiple significant centers throughout the network via a diffusion process. Our approach is an efficient linear algorithm and has demonstrated superior performance on a wide spectrum of synthetic and real-world networks, especially those with sparse connections among the community centers.
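The abstract does not define the density or distance terms, so the following Python sketch fills them in with assumptions in the style of density-peaks clustering: density is taken as node degree, the distance term is the hop distance to the nearest node of higher density, and the hypothetical community centrality is their product. It also swaps the paper's diffusion step for a plain multi-source breadth-first search from the top-scoring centers. All function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def community_centrality(adj):
    """Hypothetical score: degree (density) times hop distance to the nearest denser node."""
    # Break degree ties by node id so that "denser than" is a strict total order.
    key = {u: (len(adj[u]), u) for u in adj}
    gamma = {}
    for u in adj:
        dist = bfs_distances(adj, u)
        higher = [d for v, d in dist.items() if key[v] > key[u]]
        # The densest node in its component gets the largest distance it can reach.
        delta = min(higher) if higher else max(dist.values())
        gamma[u] = key[u][0] * delta
    return gamma


def assign_to_centers(adj, n_centers):
    """Pick the top-scoring centers, then label every node by its nearest center (BFS)."""
    gamma = community_centrality(adj)
    centers = sorted(adj, key=lambda u: (-gamma[u], u))[:n_centers]
    label = {c: c for c in centers}
    queue = deque(centers)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in label:
                label[v] = label[u]
                queue.append(v)
    return centers, label


# Two star-shaped groups (hubs 0 and 4) joined through the leaves 3 and 5.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 5],
       4: [5, 6, 7], 5: [4, 3], 6: [4], 7: [4]}
centers, label = assign_to_centers(adj, n_centers=2)
print(centers, label)
```

In this toy graph the two hubs score highest because they are both dense and far from any denser node, so they become the centers and each star's leaves take their hub's label.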

453
Greenbaum G, Templeton AR, Bar-David S. Inference and Analysis of Population Structure Using Genetic Data and Network Theory. Genetics 2016; 202:1299-312. [PMID: 26888080 PMCID: PMC4905528 DOI: 10.1534/genetics.115.182626] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 09/07/2015] [Accepted: 02/03/2016] [Indexed: 11/18/2022]
Abstract
Clustering individuals into subpopulations based on genetic data has become commonplace in many genetic studies. Inference about population structure is most often done by applying model-based approaches, aided by visualization using distance-based approaches such as multidimensional scaling. While existing distance-based approaches suffer from a lack of statistical rigor, model-based approaches entail assumptions of prior conditions, such as that the subpopulations are at Hardy-Weinberg equilibrium. Here we present a distance-based approach for inference about population structure using genetic data, by defining population structure in the terminology and methods of network theory. A network is constructed from a pairwise genetic-similarity matrix of all sampled individuals. The community partition, a partition of a network into dense subgraphs, is equated with population structure, a partition of the population into genetically related groups. Community-detection algorithms are used to partition the network into communities, interpreted as a partition of the population into subpopulations. The statistical significance of the structure can be estimated by using permutation tests to evaluate the significance of the partition's modularity, a network theory measure indicating the quality of community partitions. To further characterize population structure, a new measure of the strength of association (SA) of an individual to its assigned community is presented. The strength of association distribution (SAD) of the communities is analyzed to provide additional population structure characteristics, such as the relative amount of gene flow experienced by the different subpopulations and the identification of hybrid individuals. Human genetic data and simulations are used to demonstrate the applicability of the analyses.
The approach presented here provides a novel, computationally efficient, model-free method for inference about population structure that does not entail assumptions of prior conditions. The method is implemented in the software NetStruct (available at https://giligreenbaum.wordpress.com/software/).
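A minimal Python sketch of the significance idea, assuming a symmetric weighted similarity matrix with a zero diagonal. It computes modularity, $Q = \frac{1}{2m}\sum_{ij}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, and compares it against partitions obtained by shuffling community labels while keeping community sizes fixed. That null model is an assumption chosen for brevity; NetStruct's own permutation scheme may differ, and the function names are hypothetical.

```python
import random


def modularity(weights, labels):
    """Modularity of `labels` on a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    strength = [sum(row) for row in weights]  # weighted degree k_i
    two_m = sum(strength)  # 2m: every edge weight is counted twice
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m


def modularity_permutation_test(weights, labels, n_perm=999, seed=0):
    """One-sided p-value: how often do shuffled labels reach the observed modularity?"""
    rng = random.Random(seed)
    q_obs = modularity(weights, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # same community sizes, random membership
        if modularity(weights, shuffled) >= q_obs:
            hits += 1
    return q_obs, (hits + 1) / (n_perm + 1)


# Two triangles joined by a single edge (2-3), split into their natural groups.
w = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
q, p = modularity_permutation_test(w, [0, 0, 0, 1, 1, 1])
print(q, p)
```

For this toy graph Q = 5/14. Only 2 of the 20 distinct equal-size relabelings reproduce the split, so p lands near 0.1: with six individuals, no permutation test can be very decisive.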
Affiliation(s)
- Gili Greenbaum
  - Department of Solar Energy and Environmental Physics, Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, 84990 Midreshet Ben-Gurion, Israel
  - Mitrani Department of Desert Ecology, Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, 84990 Midreshet Ben-Gurion, Israel
- Alan R Templeton
  - Department of Biology, Washington University, St. Louis, Missouri 63130
  - Department of Evolutionary and Environmental Ecology, University of Haifa, 31905 Haifa, Israel
- Shirli Bar-David
  - Mitrani Department of Desert Ecology, Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, 84990 Midreshet Ben-Gurion, Israel

454
Hu Y, Yang B. Characterizing the structure of large real networks to improve community detection. Neural Comput Appl 2016. [DOI: 10.1007/s00521-016-2264-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/22/2022]

455
Tripathi S, Moutari S, Dehmer M, Emmert-Streib F. Comparison of module detection algorithms in protein networks and investigation of the biological meaning of predicted modules. BMC Bioinformatics 2016; 17:129. [PMID: 26987731 PMCID: PMC4797184 DOI: 10.1186/s12859-016-0979-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 08/18/2015] [Accepted: 03/06/2016] [Indexed: 01/22/2023]
Abstract
Background: It is generally acknowledged that a functional understanding of a biological system can only be obtained by understanding the collective of molecular interactions in the form of biological networks. Protein networks are one network type of special importance, because proteins form the functional base units of every biological cell. At the mesoscopic level of protein networks, modules are of significant importance, because these building blocks may be the next elementary functional level above individual proteins, allowing insight into fundamental organizational principles of biological cells.
Results: In this paper, we provide a comparative analysis of five popular and four novel module detection algorithms. We study these module prediction methods for simulated benchmark networks as well as for 10 biological protein interaction networks (PINs). A particular focus of our analysis is placed on the biological meaning of the predicted modules, using the Gene Ontology (GO) database as the gold standard for the definition of biological processes. Furthermore, we investigate the robustness of the results by perturbing the PINs, thereby simulating our incomplete knowledge of protein networks.
Conclusions: Overall, our study reveals a large heterogeneity among the different module prediction algorithms when one zooms in to the level of biological processes in the form of GO terms, and all methods are severely affected by a slight perturbation of the networks. However, we also find pathways that are enriched in multiple modules, which could provide important information about the hierarchical organization of the system.
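The abstract uses GO terms as the gold standard but does not say which enrichment statistic is applied. A common choice, assumed in this Python sketch, is a one-sided hypergeometric test: given $N$ proteins of which $K$ carry a GO term, how likely is a module of $n$ proteins to contain at least $k$ annotated ones by chance, $P(X \ge k) = \sum_{i=k}^{\min(n,K)} \binom{K}{i}\binom{N-K}{n-i} / \binom{N}{n}$? The function name and arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_annotated, module_size, n_hits):
    """P(X >= n_hits) for X ~ Hypergeometric(N=n_total, K=n_annotated, n=module_size)."""
    if not 0 <= n_annotated <= n_total or not 0 <= module_size <= n_total:
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(n_annotated, module_size)
    # Exact integer arithmetic for the tail, one division at the end.
    tail = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, module_size - i)
        for i in range(max(n_hits, 0), upper + 1)
    )
    return tail / comb(n_total, module_size)


# A module of 5 proteins, all carrying a term shared by 5 of 10 proteins.
print(enrichment_p_value(10, 5, 5, 5))
```

In practice, p-values computed over many modules and GO terms would also need a multiple-testing correction.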
Affiliation(s)
- Shailesh Tripathi
  - Predictive Medicine and Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
- Salissou Moutari
  - Centre for Statistical Science and Operational Research, School of Mathematics and Physics, Queen's University Belfast, Belfast, UK
- Matthias Dehmer
  - Institute for Theoretical Informatics, Mathematics and Operations Research, Department of Computer Science, Universität der Bundeswehr München, Munich, Germany
- Frank Emmert-Streib
  - Predictive Medicine and Analytics Lab, Department of Signal Processing, Tampere University of Technology, Tampere, Finland
  - Institute of Biosciences and Medical Technology, Tampere, Finland

456
Sun H, Liu J, Huang J, Wang G, Jia X, Song Q. LinkLPA: A Link-Based Label Propagation Algorithm for Overlapping Community Detection in Networks. Comput Intell 2016. [DOI: 10.1111/coin.12087] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 10/22/2022]
Affiliation(s)
- Heli Sun
  - Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Jiao Liu
  - Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Guangtao Wang
  - Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Xiaolin Jia
  - Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Qinbao Song
  - Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, China

457
Nearest Neighbor Search in the Metric Space of a Complex Network for Community Detection. Information 2016. [DOI: 10.3390/info7010017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/16/2022]

458
Discovering communities in complex networks by edge label propagation. Sci Rep 2016; 6:22470. [PMID: 26926830 PMCID: PMC4772381 DOI: 10.1038/srep22470] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 07/08/2015] [Accepted: 02/16/2016] [Indexed: 12/31/2022]
Abstract
The discovery of the community structure of real-world networks is still an open problem. Many methods have been proposed to shed light on this problem, and most of these have focused on discovering node communities. However, link communities are also a powerful framework for discovering overlapping communities. Here we present a novel edge label propagation algorithm (ELPA), which combines the natural advantage of link communities with the efficiency of the label propagation algorithm (LPA). ELPA can discover both link communities and node communities. We evaluated ELPA on both synthetic and real-world networks and compared it with five state-of-the-art methods. The results demonstrate that ELPA performs competitively with other algorithms.
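A simplified Python sketch of edge label propagation, not the authors' ELPA: every edge starts with its own label, two edges are neighbors when they share an endpoint, and each edge repeatedly adopts the most frequent label among its neighbors, keeping its current label when that label is already one of the most frequent. A node's communities are the labels of its incident edges, which is what makes the node-level result overlapping. The tie-breaking, the stopping rule and the function name are assumptions.

```python
import random
from collections import Counter, defaultdict


def edge_label_propagation(edges, max_sweeps=100, seed=0):
    """Simplified edge label propagation; returns edge labels and node memberships."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    incident = defaultdict(list)
    for e in edges:
        for node in e:
            incident[node].append(e)
    # Two edges are neighbours when they share an endpoint (the line graph).
    neighbours = {e: [f for node in e for f in incident[node] if f != e] for e in edges}
    label = {e: i for i, e in enumerate(edges)}  # every edge starts in its own community
    order = list(edges)
    for _ in range(max_sweeps):
        rng.shuffle(order)
        changed = False
        for e in order:
            if not neighbours[e]:
                continue
            counts = Counter(label[f] for f in neighbours[e])
            best = max(counts.values())
            candidates = sorted(lab for lab, c in counts.items() if c == best)
            if label[e] in candidates:
                continue  # keep the current label when it is already a most frequent one
            label[e] = rng.choice(candidates)
            changed = True
        if not changed:
            break
    # A node belongs to every community that one of its edges belongs to (overlap).
    membership = defaultdict(set)
    for e, lab in label.items():
        for node in e:
            membership[node].add(lab)
    return label, dict(membership)


label, membership = edge_label_propagation([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])
print(membership)
```

On two disjoint triangles each triangle settles on one edge label, so every node gets exactly one community; a node whose incident edges end up with different labels would be reported in several communities.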

459
Zhou B. Applying the Clique Percolation Method to analyzing cross-market branch banking network structure: the case of Illinois. Social Network Analysis and Mining 2016. [DOI: 10.1007/s13278-016-0318-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/22/2022]

460
Villandre L, Stephens DA, Labbe A, Günthard HF, Kouyos R, Stadler T. Assessment of Overlap of Phylogenetic Transmission Clusters and Communities in Simple Sexual Contact Networks: Applications to HIV-1. PLoS One 2016; 11:e0148459. [PMID: 26863322 PMCID: PMC4749335 DOI: 10.1371/journal.pone.0148459] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 11/10/2015] [Accepted: 01/18/2016] [Indexed: 02/06/2023]
Abstract
Background: Transmission patterns of sexually transmitted infections (STIs) could relate to the structure of the underlying sexual contact network, whose features are therefore of interest to clinicians. Conventionally, we represent sexual contacts in a population with a graph, which can reveal the existence of communities. Phylogenetic methods help infer the history of an epidemic and, incidentally, may help detect communities. In particular, phylogenetic analyses of HIV-1 epidemics among men who have sex with men (MSM) have revealed the existence of large transmission clusters, possibly resulting from within-community transmissions. Past studies have explored the association between contact networks and phylogenies, including transmission clusters, producing conflicting conclusions about whether network features significantly affect observed transmission history. As far as we know, however, none of them thoroughly investigated the role of communities, defined with respect to the network graph, in the observation of clusters.
Methods: The present study investigates, through simulations, community detection from phylogenies. We simulate a large number of epidemics over both unweighted and weighted, undirected random interconnected-islands networks, with islands corresponding to communities. We use weighting to modulate the distance between islands. We translate each epidemic into a phylogeny, which lets us partition our samples of infected subjects into transmission clusters, based on several common definitions from the literature. We measure the similarity between subjects’ island membership indices and transmission cluster membership indices with the adjusted Rand index.
Results and Conclusion: Analyses reveal a modest mean correspondence between communities in graphs and phylogenetic transmission clusters. We conclude that common methods often have limited success in detecting contact network communities from phylogenies.
The rarely fulfilled requirement that network communities correspond to clades in the phylogeny is their main drawback. Understanding the link between transmission clusters and communities in sexual contact networks could help inform policymaking to curb HIV incidence among MSM.
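The adjusted Rand index used in the Methods compares two partitions of the same subjects and corrects the pair-counting Rand index for chance agreement, $\mathrm{ARI} = \frac{\mathrm{Index} - \mathrm{Expected}}{\mathrm{Max} - \mathrm{Expected}}$, with all three terms computed from pair counts of the contingency table. A minimal Python sketch; the function name is hypothetical, and returning 1.0 in the degenerate case where the maximum and expected values coincide is a convention assumed here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n_pairs = comb(len(labels_a), 2)
    if n_pairs == 0:
        return 1.0  # fewer than two items: nothing to compare
    # Pairs placed together in both labelings, in A, and in B.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = in_a * in_b / n_pairs
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything together
    return (both - expected) / (maximum - expected)


islands = [0, 0, 1, 1]  # community (island) of each subject
clusters = [0, 0, 1, 2]  # transmission cluster of each subject
print(adjusted_rand_index(islands, clusters))
```

Here the clusters split one island in two, giving an ARI of 4/7, well below the value of 1 for a perfect match; the index is invariant to how the labels themselves are named.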
Affiliation(s)
- Luc Villandre
  - Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Québec, Canada
- David A. Stephens
  - Department of Mathematics and Statistics, McGill University, Montréal, Québec, Canada
- Aurelie Labbe
  - Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montréal, Québec, Canada
  - Department of Psychiatry, Douglas Mental Health University Institute, Montréal, Québec, Canada
- Huldrych F. Günthard
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Kanton Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Roger Kouyos
  - Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Kanton Zurich, Switzerland
  - Institute of Medical Virology, University of Zurich, Zurich, Switzerland
- Tanja Stadler
  - Department of Biosystems Science and Engineering, ETH Zürich, Basel, Basel-Landschaft, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland

461
Atzmueller M, Doerfel S, Mitzlaff F. Description-oriented community detection using exhaustive subgroup discovery. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.05.008] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 10/23/2022]

462
Wang M, Zuo W, Wang Y. An improved density peaks-based clustering method for social circle discovery in social networks. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.11.091] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Indexed: 11/28/2022]

463
Danila B. Comprehensive spectral approach for community structure analysis on complex networks. Phys Rev E 2016; 93:022301. [PMID: 26986346 DOI: 10.1103/physreve.93.022301] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2015] [Indexed: 06/05/2023]
Abstract
A simple but efficient spectral approach for analyzing the community structure of complex networks is introduced. It works the same way for all types of networks, by spectrally splitting the adjacency matrix into a "unipartite" and a "multipartite" component. These two matrices reveal the structure of the network from different perspectives and can be analyzed at different levels of detail. Their entries, or the entries of their lower-rank approximations, provide measures of the affinity or antagonism between the nodes that highlight the communities and the "gateway" links that connect them together. An algorithm is then proposed to achieve the automatic assignment of the nodes to communities based on the information provided by either matrix. This algorithm naturally generates overlapping communities but can also be tuned to eliminate the overlaps.
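One natural reading of the “unipartite” and “multipartite” split, assumed here rather than taken from the paper, is to group the eigenpairs of the symmetric adjacency matrix by the sign of the eigenvalue. The positive-eigenvalue part tends to give positive entries between nodes in the same densely linked group (affinity), the remaining part captures bipartite-like, between-group structure (antagonism), and the two parts sum back to the adjacency matrix. A lower-rank approximation would keep only the largest-magnitude eigenpairs of each part. This NumPy sketch relies on `numpy.linalg.eigh`; the function name is hypothetical.

```python
import numpy as np


def spectral_split(adjacency):
    """Split a symmetric matrix A into positive- and non-positive-eigenvalue parts."""
    a = np.asarray(adjacency, dtype=float)
    if not np.allclose(a, a.T):
        raise ValueError("adjacency matrix must be symmetric")
    eigvals, eigvecs = np.linalg.eigh(a)  # real eigenpairs of a symmetric matrix
    pos = eigvals > 0
    # V diag(lambda) V^T restricted to each group of eigenpairs.
    unipartite = (eigvecs[:, pos] * eigvals[pos]) @ eigvecs[:, pos].T
    multipartite = (eigvecs[:, ~pos] * eigvals[~pos]) @ eigvecs[:, ~pos].T
    return unipartite, multipartite


# A single edge: eigenvalues +1 and -1.
uni, multi = spectral_split([[0, 1], [1, 0]])
print(np.round(uni, 3))
print(np.round(multi, 3))
```

For a single edge, the unipartite part has all entries equal to 0.5, while the multipartite part is negative on the diagonal and positive off it, reflecting the fact that a lone edge is perfectly bipartite.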
Affiliation(s)
- Bogdan Danila
  - Science Department, BMCC, The City University of New York, 199 Chambers St, New York, New York 10007-1047, USA

465
Peng L, Carvalho L. Bayesian degree-corrected stochastic blockmodels for community detection. Electron J Stat 2016. [DOI: 10.1214/16-ejs1163] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]

468
Yoshida T, Yamada Y. A Community Structure-Based Approach for Network Immunization. Comput Intell 2015. [DOI: 10.1111/coin.12082] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Affiliation(s)
- Tetsuya Yoshida
  - Graduate School of Humanities and Sciences, Nara Women's University, Japan
- Yuu Yamada
  - Graduate School of Information Science and Technology, Hokkaido University, Japan

469
Žalik KR. Maximal Neighbor Similarity Reveals Real Communities in Networks. Sci Rep 2015; 5:18374. [PMID: 26680448 PMCID: PMC4683394 DOI: 10.1038/srep18374] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 03/26/2015] [Accepted: 11/16/2015] [Indexed: 11/16/2022]
Abstract
An important problem in the analysis of network data is the detection of groups of densely interconnected nodes, also called modules or communities. Community structure reveals the functions and organization of networks. Currently used algorithms for community detection in large-scale real-world networks are computationally expensive, require a priori information such as the number or sizes of communities, or are not able to give the same resulting partition in multiple runs. In this paper we investigate a simple and fast algorithm that uses the network structure alone and requires neither optimization of a predefined objective function nor information about the number of communities. We propose a bottom-up community detection algorithm that finds real communities by starting from communities consisting of adjacent pairs of nodes and their maximally similar neighbors. We show that the overall advantages of the proposed algorithm compared with other community detection algorithms are its simple nature, low computational cost and very high accuracy in detecting communities of different sizes, also in networks with a blurred modularity structure consisting of poorly separated communities. All communities identified by the proposed method for the Facebook network and the E. coli transcriptional regulatory network have strong structural and functional coherence.
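A much-simplified Python sketch of the bottom-up idea, assuming Jaccard similarity of closed neighborhoods as the similarity measure (the abstract does not name one): every node is joined to its most similar neighbor, and communities are the connected groups formed by these links. Ties go to the smallest node id. The paper's actual algorithm refines communities further; the function names here are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def max_similarity_communities(adj):
    """Link every node to its most similar neighbour; return the resulting groups."""
    closed = {u: set(adj[u]) | {u} for u in adj}  # closed neighbourhoods N[u]
    parent = {u: u for u in adj}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u in adj:
        if not adj[u]:
            continue  # isolated nodes stay on their own
        # max() keeps the first maximum, so ties go to the smallest neighbour id.
        best = max(sorted(adj[u]), key=lambda v: jaccard(closed[u], closed[v]))
        parent[find(u)] = find(best)
    groups = {}
    for u in adj:
        groups.setdefault(find(u), []).append(u)
    return sorted(sorted(g) for g in groups.values())


# Two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(max_similarity_communities(adj))
```

The bridge endpoints 2 and 3 share only themselves, so their similarity (1/3) is lower than to their own triangle partners (3/4), and the two triangles come out as separate communities.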
Affiliation(s)
- Krista Rizman Žalik
  - University of Maribor, Faculty of Electrical Engineering and Computer Science, Slovenia

470
Cantini L, Medico E, Fortunato S, Caselle M. Detection of gene communities in multi-networks reveals cancer drivers. Sci Rep 2015; 5:17386. [PMID: 26639632 PMCID: PMC4671005 DOI: 10.1038/srep17386] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 09/08/2015] [Accepted: 10/29/2015] [Indexed: 12/25/2022]
Abstract
We propose a new multi-network-based strategy to integrate different layers of genomic information and use them in a coordinated way to identify driver cancer genes. The multi-networks that we consider combine transcription factor co-targeting, microRNA co-targeting, protein-protein interaction and gene co-expression networks. The rationale behind this choice is that gene co-expression and protein-protein interactions require a tight coregulation of the partners, and that such fine-tuned regulation can be obtained only by combining both the transcriptional and post-transcriptional layers of regulation. To extract the relevant biological information from the multi-network, we studied its partition into communities. To this end we applied a consensus clustering algorithm based on state-of-the-art community detection methods. Although our procedure is in principle valid for any pathology, in this work we concentrate on gastric, lung, pancreatic and colorectal cancer, and from the enrichment analysis of the multi-network communities we identified a set of candidate driver cancer genes. Some of them were already known oncogenes, while a few are new. The combination of the different layers of information allowed us to extract from the multi-network indications of the regulatory pattern and functional role of both the already known and the new candidate driver genes.
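The consensus step can be illustrated with a co-assignment matrix: run one or more community detection methods several times, record for every pair of genes the fraction of runs that put them together, and keep pairs above a threshold. Full consensus-clustering schemes typically re-run detection on that matrix until it stabilizes; this Python sketch, an assumption for illustration, simply takes connected components of the thresholded matrix. The default threshold of 0.5 and the function name are hypothetical.

```python
from itertools import combinations


def consensus_groups(partitions, threshold=0.5):
    """Group items that are co-assigned in at least `threshold` of the partitions."""
    n = len(partitions[0])
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for i, j in combinations(range(n), 2):
        together = sum(p[i] == p[j] for p in partitions) / len(partitions)
        if together >= threshold:
            parent[find(i)] = find(j)  # keep the pair: merge their groups
    groups = {}
    for u in range(n):
        groups.setdefault(find(u), []).append(u)
    return sorted(groups.values())


# Three runs of some community detection method on four genes.
runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
print(consensus_groups(runs))
```

Gene 1 joins gene 0 in two of the three runs and genes 2 and 3 in only one, so at the 0.5 threshold the consensus keeps the split {0, 1} and {2, 3}; lowering the threshold to 0.3 would merge everything.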
Affiliation(s)
- Laura Cantini
  - Università di Torino, Department of Oncology, Candiolo, Italy
  - Politecnico di Torino, Department of Control and Computer Engineering, Torino, Italy
  - Istituto Nazionale Biostrutture e Biosistemi - Consorzio Interuniversitario, Viale delle Medaglie d’Oro, 305 - 00136 Roma, Italy
- Enzo Medico
  - Università di Torino, Department of Oncology, Candiolo, Italy
  - Candiolo Cancer Institute, FPO IRCCS, Candiolo, Italy
- Santo Fortunato
  - Department of Computer Science, Aalto University School of Science, Aalto, Finland
- Michele Caselle
  - Università di Torino, Department of Physics and INFN, Torino, Italy

471
Chen Z, Xie Z, Zhang Q. Community detection based on local topological information and its application in power grid. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.04.093] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 10/23/2022]

472
Wang W, Street WN. Modeling influence diffusion to uncover influence centrality and community structure in social networks. Social Network Analysis and Mining 2015. [DOI: 10.1007/s13278-015-0254-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]

474
Li HJ. The comparison of significance of fuzzy community partition across optimization methods. Journal of Intelligent & Fuzzy Systems 2015. [DOI: 10.3233/ifs-151974] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]

475
Gaiteri C, Chen M, Szymanski B, Kuzmin K, Xie J, Lee C, Blanche T, Chaibub Neto E, Huang SC, Grabowski T, Madhyastha T, Komashko V. Identifying robust communities and multi-community nodes by combining top-down and bottom-up approaches to clustering. Sci Rep 2015; 5:16361. [PMID: 26549511 PMCID: PMC4637843 DOI: 10.1038/srep16361] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 03/17/2015] [Accepted: 10/02/2015] [Indexed: 11/29/2022]
Abstract
Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect non-overlapping communities. These detected communities may also be unstable and difficult to replicate, because traditional methods are sensitive to noise and parameter settings. These aspects of traditional clustering methods limit our ability to detect biological communities, and therefore our ability to understand biological functions. To address these limitations and detect robust overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections, as well as global information about the network structure. This method can quantify the stability of each community, automatically identify the number of communities, and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including: gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.
Affiliation(s)
- Chris Gaiteri
  - Rush University Medical Center, Alzheimer's Disease Center, Chicago, IL
  - Allen Institute for Brain Science, Modeling, Analysis and Theory Group, Seattle, WA
- Mingming Chen
  - Rensselaer Polytechnic Institute, Department of Computer Science, Troy, NY
- Boleslaw Szymanski
  - Rensselaer Polytechnic Institute, Department of Computer Science, Troy, NY
  - Społeczna Akademia Nauk, Łódź, Poland
- Konstantin Kuzmin
  - Rensselaer Polytechnic Institute, Department of Computer Science, Troy, NY
- Jierui Xie
  - Rensselaer Polytechnic Institute, Department of Computer Science, Troy, NY
  - Samsung Research America, San Jose, CA
- Changkyu Lee
  - Allen Institute for Brain Science, Modeling, Analysis and Theory Group, Seattle, WA
- Timothy Blanche
  - Allen Institute for Brain Science, Modeling, Analysis and Theory Group, Seattle, WA
- Su-Chun Huang
  - University of Washington, Department of Neurology, Seattle, WA
- Thomas Grabowski
  - University of Washington, Department of Neurology, Seattle, WA
  - University of Washington, Department of Radiology, Seattle, WA
- Tara Madhyastha
  - University of Washington, Department of Radiology, Seattle, WA

477
Mangin B, Sandron F, Henry K, Devaux B, Willems G, Devaux P, Goudemand E. Breeding patterns and cultivated beets origins by genetic diversity and linkage disequilibrium analyses. Theor Appl Genet 2015; 128:2255-2271. [PMID: 26239407 DOI: 10.1007/s00122-015-2582-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/19/2014] [Accepted: 07/10/2015] [Indexed: 06/04/2023]
Abstract
Genetic diversity in the worldwide population of beets is strongly affected by the domestication history, and the comparison of linkage disequilibrium in worldwide and elite populations highlights strong selection pressure. Genetic relationships and linkage disequilibrium (LD) were evaluated in a set of 2035 worldwide beet accessions and in another of 1338 elite sugar beet lines, using 320 and 769 single nucleotide polymorphisms, respectively. The structures of the populations were analyzed using four different approaches. Within the worldwide population, three of the methods gave a very coherent picture of the population structure. Fodder beet and sugar beet accessions were grouped together, separated from garden beets and sea beets, reflecting well the origins of beet domestication. The structure of the elite panel, however, was less stable between clustering methods, probably because of the high level of genetic mixing in breeding programs. For the linkage disequilibrium analysis, the usual measure ($r^2$) was used and compared with others that correct for population structure and relatedness ($r^2_S$, $r^2_V$, $r^2_{VS}$). The LD as measured by $r^2$ persisted beyond 10 cM within the elite panel and fell below 0.1 after less than 2 cM in the worldwide population, for almost all chromosomes. With correction for relatedness, LD decreased below 0.1 within 1 cM for almost all chromosomes in both populations, except for chromosomes 3 and 9 within the elite panel. In these regions, the larger extent of LD could be explained by strong selection pressure.
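For unphased genotype data, the usual measure $r^2$ is commonly estimated as the squared Pearson correlation between allele dosages (0, 1 or 2 copies) at two SNPs; that estimator is assumed in this Python sketch. The corrected versions $r^2_S$, $r^2_V$ and $r^2_{VS}$ additionally correct for population structure and relatedness, as the abstract notes, and are not shown. The function name is hypothetical.

```python
def ld_r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between allele dosages (0/1/2) at two SNPs."""
    n = len(dosages_a)
    if n != len(dosages_b) or n < 2:
        raise ValueError("need two equal-length genotype vectors with at least 2 samples")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(dosages_a, dosages_b))
    var_a = sum((x - mean_a) ** 2 for x in dosages_a)
    var_b = sum((y - mean_b) ** 2 for y in dosages_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # monomorphic SNP in this sample: r^2 undefined
    return cov * cov / (var_a * var_b)


print(ld_r_squared([0, 1, 2, 0, 1, 2], [2, 1, 0, 2, 1, 0]))  # perfectly linked
print(ld_r_squared([0, 0, 2, 2], [0, 2, 0, 2]))  # uncorrelated
```

Squaring the correlation makes $r^2$ insensitive to which allele is counted, which is why the two perfectly anti-correlated SNPs above still give the maximum value of 1.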
Collapse
Affiliation(s)
- Brigitte Mangin
- INRA, Laboratoire des Interactions Plantes-Microorganismes (LIPM), UMR441, 31326, Castanet-Tolosan, France
- CNRS, Laboratoire des Interactions Plantes-Microorganismes (LIPM), UMR2594, 31326, Castanet-Tolosan, France
- INRA, Mathématique et Informatique Appliquées de Toulouse (MIAT), UR875, 31326, Castanet-Tolosan, France
| | - Florian Sandron
- INRA, Mathématique et Informatique Appliquées de Toulouse (MIAT), UR875, 31326, Castanet-Tolosan, France
| | - Karine Henry
- S.A.S. Florimond-Desprez Veuve and Fils, BP41, 59242, Cappelle-en-Pévèle, France
| | - Brigitte Devaux
- S.A.S. Florimond-Desprez Veuve and Fils, BP41, 59242, Cappelle-en-Pévèle, France
| | - Glenda Willems
- SESVanderHave, Industriepark Soldatenplein Zone 2/Nr 15, 3300, Tienen, Belgium
| | - Pierre Devaux
- S.A.S. Florimond-Desprez Veuve and Fils, BP41, 59242, Cappelle-en-Pévèle, France
| | - Ellen Goudemand
- S.A.S. Florimond-Desprez Veuve and Fils, BP41, 59242, Cappelle-en-Pévèle, France.
478
Liu K, Huang J, Sun H, Wan M, Qi Y, Li H. Label propagation based evolutionary clustering for detecting overlapping and non-overlapping communities in dynamic networks. Knowl Based Syst 2015. [DOI: 10.1016/j.knosys.2015.08.015] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 10/23/2022]
479
Traag VA. Faster unfolding of communities: speeding up the Louvain algorithm. Phys Rev E Stat Nonlin Soft Matter Phys 2015; 92:032801. [PMID: 26465522 DOI: 10.1103/physreve.92.032801] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 03/04/2015] [Indexed: 06/05/2023]
Abstract
Many complex networks exhibit a modular structure of densely connected groups of nodes. Usually, such a modular structure is uncovered by the optimization of some quality function. Although flawed, modularity remains one of the most popular quality functions. The Louvain algorithm was originally developed for optimizing modularity, but has been applied to a variety of methods. As such, speeding up the Louvain algorithm enables the analysis of larger graphs in a shorter time for various methods. We here suggest moving nodes to a random neighbor community, instead of the best neighbor community. Although incredibly simple, this reduces the theoretical runtime complexity from $O(m)$ to $O(n \log \langle k \rangle)$ in networks with a clear community structure. In benchmark networks, it speeds up the algorithm roughly 2-3 times, while in some real networks it even reaches 10 times faster runtimes. This improvement is due to two factors: (1) a random neighbor is likely to be in a "good" community and (2) random neighbors are likely to be hubs, helping the convergence. Finally, the performance gain only slightly diminishes the quality, especially for modularity, thus providing a good quality-performance ratio. However, these gains are less pronounced, or even disappear, for some other measures such as significance or surprise.
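The change described in this abstract affects only the local-moving phase of Louvain. The following is a minimal Python sketch, not Traag's implementation. It assumes an unweighted, undirected graph stored as a dict of neighbour sets, uses the standard modularity gain $\Delta Q \propto k_{i,C} - k_i \Sigma_{\mathrm{tot},C}/(2m)$ (with $k_{i,C}$ the edges from node $i$ into community $C$ and $\Sigma_{\mathrm{tot},C}$ the total degree of $C$), and accepts a move only when it strictly beats leaving node $i$ in place. The aggregation phase, the node queue and the stopping rule of the paper are omitted; `local_moving` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def modularity(adj, comm):
    """Newman-Girvan modularity of an unweighted, undirected graph.

    adj:  dict node -> set of neighbours (symmetric, no self-loops)
    comm: dict node -> community id
    """
    m2 = sum(len(nbrs) for nbrs in adj.values())  # equals 2m
    # Each intra-community edge is counted from both endpoints (2 * L_in).
    internal2 = sum(1 for v in adj for u in adj[v] if comm[u] == comm[v])
    degree_sum = defaultdict(int)
    for v in adj:
        degree_sum[comm[v]] += len(adj[v])
    return internal2 / m2 - sum((d / m2) ** 2 for d in degree_sum.values())


def local_moving(adj, seed=0, random_neighbor=True, max_sweeps=100):
    """One Louvain-style local-moving phase, starting from singletons.

    random_neighbor=True evaluates only the community of one random
    neighbour; False scans every neighbouring community (classic Louvain).
    """
    rng = random.Random(seed)
    comm = {v: v for v in adj}
    deg = {v: len(adj[v]) for v in adj}
    m2 = sum(deg.values())
    tot = dict(deg)  # total degree per community (ids start as node ids)
    for _ in range(max_sweeps):
        moved = False
        order = list(adj)
        rng.shuffle(order)
        for v in order:
            if not adj[v]:
                continue  # isolated node: no neighbouring community
            old = comm[v]
            links = defaultdict(int)  # edges from v into each community
            for u in adj[v]:
                links[comm[u]] += 1
            tot[old] -= deg[v]  # take v out before evaluating moves

            def gain(c):
                # Proportional to the modularity gain of inserting v into c.
                return links.get(c, 0) - deg[v] * tot[c] / m2

            if random_neighbor:
                candidates = [comm[rng.choice(sorted(adj[v]))]]
            else:
                candidates = list(links)
            best = old
            for c in candidates:
                if gain(c) > gain(best):
                    best = c
            tot[best] += deg[v]
            if best != old:
                comm[v] = best
                moved = True
        if not moved:
            break  # simplification: stop after a sweep with no accepted move
    return comm
```

This sketch still tallies links to every neighbouring community, so it illustrates the decision rule rather than the speed-up; an efficient version would count only the edges into the sampled community. In random-neighbour mode, a sweep without moves also does not guarantee a local optimum.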
Affiliation(s)
- V A Traag
- Royal Netherlands Institute of Southeast Asian and Caribbean Studies, Reuvensplaats 2, 2311 BE Leiden, the Netherlands and e-Humanities group, Royal Netherlands Academy of Arts and Sciences, Joan Muyskenweg 25, 1096 CJ Amsterdam, the Netherlands
480
Lecca P, Re A. Detecting modules in biological networks by edge weight clustering and entropy significance. Front Genet 2015; 6:265. [PMID: 26379697 PMCID: PMC4551098 DOI: 10.3389/fgene.2015.00265] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/25/2015] [Accepted: 07/30/2015] [Indexed: 12/04/2022]
Abstract
Detection of the modular structure of biological networks is of interest to researchers adopting a systems perspective for the analysis of omics data. Computational systems biology has provided a rich array of methods for network clustering. To date, the majority of approaches address this task through a network node classification based on topological or external quantifiable properties of network nodes. Conversely, numerical properties of network edges are underused, even though the information content that can be associated with network edges has grown with steady advances in molecular biology technology over the last decade. Properly accounting for network edges in the development of clustering approaches can become crucial to improve quantitative interpretation of omics data, finally resulting in more biologically plausible models. In this study, we present a novel technique for network module detection, named WG-Cluster (Weighted Graph CLUSTERing). WG-Cluster's notable features, compared to current approaches, lie in: (1) the simultaneous exploitation of network node and edge weights to improve the biological interpretability of the connected components detected, (2) the assessment of their statistical significance, and (3) the identification of emerging topological properties in the detected connected components. WG-Cluster utilizes three major steps: (i) an unsupervised version of k-means edge-based algorithm detects sub-graphs with similar edge weights, (ii) a fast-greedy algorithm detects connected components which are then scored and selected according to the statistical significance of their scores, and (iii) an analysis of the convolution between sub-graph mean edge weight and connected component score provides a summarizing view of the connected components. WG-Cluster can be applied to directed and undirected networks of different types of interacting entities and scales up to large omics data sets.
Here, we show that WG-Cluster can be successfully used in the differential analysis of physical protein–protein interaction (PPI) networks. Specifically, applying WG-Cluster to a PPI network weighted by measurements of differential gene expression makes it possible to explore the changes in network topology under two distinct (normal vs. tumor) conditions. WG-Cluster code is available at https://sites.google.com/site/paolaleccapersonalpage/.
Affiliation(s)
- Paola Lecca
- Centre for Integrative Biology, University of Trento, Italy
- Angela Re
- Laboratory of Translational Genomics, Centre for Integrative Biology, University of Trento, Trento, Italy
481
482
Rajtmajer SM, Roy A, Albert R, Molenaar PCM, Hillary FG. A voxelwise approach to determine consensus regions-of-interest for the study of brain network plasticity. Front Neuroanat 2015; 9:97. [PMID: 26283928 PMCID: PMC4517380 DOI: 10.3389/fnana.2015.00097] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 04/25/2015] [Accepted: 07/07/2015] [Indexed: 11/13/2022]
Abstract
Despite exciting advances in the functional imaging of the brain, it remains a challenge to define regions of interest (ROIs) that do not require investigator supervision and permit examination of change in networks over time (or plasticity). Plasticity is most readily examined by keeping ROIs constant via seed-based and anatomical-atlas-based techniques, but these approaches are not data-driven, requiring definition based on prior experience (e.g., choice of seed region, anatomical landmarks). These approaches are especially limiting when functional connectivity may evolve over time in areas that are finer than known anatomical landmarks or in areas outside predetermined seeded regions. An ideal method would permit investigators to study network plasticity due to learning, maturation effects, or clinical recovery via multiple-time-point data that can be compared to one another in the same ROI while also preserving the voxel-level data in those ROIs at each time point. Data-driven approaches (e.g., whole-brain voxelwise approaches) ameliorate concerns regarding investigator bias, but the fundamental problem of comparing the results between distinct data sets remains. In this paper, we propose an approach, aggregate-initialized label propagation (AILP), which allows data from separate time points to be compared for examining developmental processes resulting in network change (plasticity). To do so, we use a whole-brain modularity approach to parcellate the brain into anatomically constrained functional modules at separate time points and then apply the AILP algorithm to form a consensus set of ROIs for examining change over time. To demonstrate its utility, we make use of a known dataset of individuals with traumatic brain injury sampled at two time points during the first year of recovery and show how the AILP procedure can be applied to select regions of interest to be used in a graph theoretical analysis of plasticity.
Affiliation(s)
- Sarah M Rajtmajer
- Department of Mathematics, The Pennsylvania State University, University Park, PA, USA
- Arnab Roy
- Department of Psychology, The Pennsylvania State University, University Park, PA, USA
- Reka Albert
- Department of Physics, The Pennsylvania State University, University Park, PA, USA
- Peter C M Molenaar
- Department of Human Development and Family Studies, The Pennsylvania State University, University Park, PA, USA
- Frank G Hillary
- Department of Psychology, The Pennsylvania State University, University Park, PA, USA; Department of Neurology, Penn State Milton S. Hershey Medical Center, Hershey, PA, USA
483
Zignani M, Quadri C, Gaito S, Rossi GP. Calling, texting, and moving: multidimensional interactions of mobile phone users. Comput Soc Netw 2015. [DOI: 10.1186/s40649-015-0020-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
484
Kianian S, Khayyambashi MR, Movahhedinia N. Semantic community detection using label propagation algorithm. J Inf Sci 2015. [DOI: 10.1177/0165551515592599] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/15/2022]
Abstract
The issue of detecting large communities in online social networks is the subject of a wide range of studies aimed at exploring network substructure. Most existing studies are concerned with network topology, with no emphasis on the active communities within large online social networks and social portals, such as forums, that are not based on network topology. Here, a new semantic community detection approach is proposed that focuses on user attributes instead of network topology. In the proposed approach, a network of user activities is established and weighted through semantic data. Furthermore, a consistent extended label propagation algorithm is presented. In doing so, semantic representations of active communities are refined and labelled with user-generated tags that are available on Web 2.0. The results show that the proposed semantic algorithm is able to significantly improve modularity compared with three previously proposed algorithms.
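The label propagation rule that such extensions build on can be sketched as follows. This is the generic asynchronous, weighted variant (each node repeatedly adopts the label with the largest total edge weight among its neighbours), not the authors' semantic extension; the edge weights merely stand in for the semantic weighting, and the function name and graph format (a dict of `{neighbour: weight}` dicts) are assumptions.

```python
import random
from collections import defaultdict


def label_propagation(adj, seed=0, max_iter=100):
    """Asynchronous label propagation on a weighted, undirected graph.

    adj: dict node -> dict neighbour -> weight (symmetric). Node ids must be
    mutually comparable (e.g. all ints) because tied labels are sorted.
    Returns dict node -> label; nodes sharing a label form a community.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)  # random update order each sweep
        changed = False
        for v in nodes:
            if not adj[v]:
                continue
            score = defaultdict(float)  # total edge weight per neighbouring label
            for u, w in adj[v].items():
                score[labels[u]] += w
            best = max(score.values())
            top = sorted(lab for lab, s in score.items() if s == best)
            if labels[v] in top:
                continue  # keep the current label on ties, which aids convergence
            labels[v] = rng.choice(top)
            changed = True
        if not changed:
            break  # every node already carries a maximal label
    return labels
```

The stopping rule used here (a full sweep without changes) means every node holds a label that has maximal weight among its neighbours.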
Affiliation(s)
- Sahar Kianian
- Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran
485
Aires T, Moalic Y, Serrao EA, Arnaud-Haond S. Hologenome theory supported by cooccurrence networks of species-specific bacterial communities in siphonous algae (Caulerpa). FEMS Microbiol Ecol 2015; 91:fiv067. [PMID: 26099965 DOI: 10.1093/femsec/fiv067] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Accepted: 06/15/2015] [Indexed: 11/14/2022]
Abstract
The siphonous algae of the Caulerpa genus harbor internal microbial communities hypothesized to play important roles in development, defense and metabolic activities of the host. Here, we characterize the endophytic bacterial community of four Caulerpa taxa in the Mediterranean Sea, through 16S rRNA amplicon sequencing. Results reveal a striking alpha diversity of the bacterial communities, similar to levels found in sponges and coral holobionts. These comprise (1) a very small core community shared across all hosts (< 1% of the total community), (2) a variable portion (ca. 25%) shared by some Caulerpa taxa but not by all, which might represent environmentally acquired bacteria and (3) a large (>70%) species-specific fraction of the community, forming very specific clusters revealed by modularity in networks of cooccurrence, even in areas where distinct Caulerpa taxa occurred in sympatry. Indirect inferences based on sequence homology suggest that these communities may play an important role in the metabolism of their host, in particular in their ability to grow on anoxic sediment. These findings support the hologenome theory and the need for a holistic framework in ecological and evolutionary studies of these holobionts that frequently become invasive.
Affiliation(s)
- Tania Aires
- CCMAR, Centre of Marine Sciences, University of Algarve, Gambelas, 8005-139 Faro, Portugal
- Yann Moalic
- IFREMER, Technopole de Brest-Iroise, BP 70, 29280 Plouzané, France; UMR 6197-Laboratoire de Microbiologie des Environnements Extrêmes, Université de Bretagne Occidentale (UBO), Institut Universitaire Européen de la Mer (IUEM), CNRS, Plouzané, France
- Ester A Serrao
- CCMAR, Centre of Marine Sciences, University of Algarve, Gambelas, 8005-139 Faro, Portugal
- Sophie Arnaud-Haond
- CCMAR, Centre of Marine Sciences, University of Algarve, Gambelas, 8005-139 Faro, Portugal; UMR MARBEC (Marine Biodiversity, Exploitation and Conservation), Bd Jean Monnet, BP 171, 34203 Sète Cedex, France
486
Brutz M, Meyer FG. A flexible multiscale approach to overlapping community detection. Soc Netw Anal Min 2015. [DOI: 10.1007/s13278-015-0259-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
487
Zhou Y, Wang J, Luo N, Zhang Z. Multiobjective local search for community detection in networks. Soft Comput 2015. [DOI: 10.1007/s00500-015-1706-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022]
488
Wu P, Pan L. Multi-objective community detection based on memetic algorithm. PLoS One 2015; 10:e0126845. [PMID: 25932646 PMCID: PMC4416909 DOI: 10.1371/journal.pone.0126845] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 11/30/2014] [Accepted: 04/08/2015] [Indexed: 11/19/2022]
Abstract
Community detection has drawn a lot of attention as it can provide invaluable help in understanding the function and visualizing the structure of networks. Since single-objective optimization methods have intrinsic drawbacks in identifying multiple significant community structures, some methods formulate community detection as a multi-objective problem and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability, but have difficulty in locating local optima efficiently. In this study, in order to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed by combining a multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. Firstly, nondominated solutions generated by evolutionary operations and solutions in the dominant population are set as initial individuals for the local search procedure. Then, a new direction vector, named the pseudonormal vector, is proposed to integrate the two objective functions into a single fitness function. Finally, a network-specific local search strategy based on the label propagation rule is expanded to search for locally optimal solutions efficiently. Extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. Firstly, experiments on the influence of the local search procedure demonstrate that it can speed up convergence to better partitions and make the algorithm more stable. Secondly, comparisons with a set of classic community detection methods illustrate that the proposed method can find single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks, which are beneficial for analyzing networks at multiple resolution levels.
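The nondominated solutions mentioned in this abstract are the Pareto-optimal members of a population. A minimal Python sketch of that filter follows, assuming every objective is to be maximised (negate any objective that is minimised) and each solution is represented by a tuple of its objective values; it does not reproduce the pseudonormal-vector fitness or the label-propagation local search, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximised):
    a is no worse in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def nondominated(points):
    """Points not dominated by any other point, kept in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

This brute-force check is quadratic in the population size, which is adequate for small populations; sorting-based methods scale better.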
Affiliation(s)
- Peng Wu
- School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
- National Engineering Laboratory for Information Content Analysis Technology, Shanghai Jiao Tong University, Shanghai, China
| | - Li Pan
- School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China
- National Engineering Laboratory for Information Content Analysis Technology, Shanghai Jiao Tong University, Shanghai, China
489
490
Jarukasemratana S, Murata T. Edge Weight Method for Community Detection on Mixed Scale-Free Networks. Int J Artif Intell Tools 2015. [DOI: 10.1142/s0218213015400072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
In this paper, we propose an edge weight method for performing community detection on mixed scale-free networks. We use the phrase “mixed scale-free networks” for networks where some communities have a node degree distribution that follows a power law, as in scale-free networks, while others have a node degree distribution that follows a normal distribution. In this type of network, community detection algorithms that are designed for scale-free networks will have reduced accuracy because some communities do not have scale-free properties. On the other hand, algorithms that are not designed for scale-free networks will also have reduced accuracy because some communities have scale-free properties. To solve this problem, our algorithm consists of two community detection steps: one is aimed at extracting communities whose node degree follows a power-law distribution (scale-free), while the other is aimed at extracting communities whose node degree follows a normal distribution (non-scale-free). To evaluate our method, we use NMI (Normalized Mutual Information) to measure our results on both synthetic and real-world datasets, comparing with both scale-free and non-scale-free community detection methods. The results show that our method outperforms all other baseline methods on mixed scale-free networks.
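NMI compares a detected partition with a reference partition. One form widely used in community-detection benchmarks is $\mathrm{NMI}(A,B) = 2I(A;B)/(H(A)+H(B))$. The abstract does not state which normalisation the authors used, so the following plain-Python sketch (partitions given as equal-length label lists; the function name `nmi` is hypothetical) is illustrative only.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalised mutual information 2*I(A;B) / (H(A) + H(B)).

    a, b: community labels of the same nodes, in the same order.
    Returns 1.0 for identical partitions (up to relabelling) and 0.0
    for statistically independent ones.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(nij / n * math.log(n * nij / (ca[i] * cb[j]))
             for (i, j), nij in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha + hb == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mi / (ha + hb)
```

Other normalisations (by the maximum or the geometric mean of the entropies) give different values on the same partitions, so NMI scores are comparable only when the same variant is used.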
Affiliation(s)
- Tsuyoshi Murata
- Tokyo Institute of Technology, W8-59 2-12-1 Ookayama, Meguro-ku, Tokyo, Japan
491
A cellular learning automata based algorithm for detecting community structure in complex networks. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.04.087] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 11/20/2022]
492
Li S, Lou H, Jiang W, Tang J. Detecting community structure via synchronous label propagation. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2014.04.084] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/24/2022]
493
Sahadevan S, Tholen E, Große-Brinkhaus C, Schellander K, Tesfaye D, Hofmann-Apitius M, Cinar MU, Gunawan A, Hölker M, Neuhoff C. Identification of gene co-expression clusters in liver tissues from multiple porcine populations with high and low backfat androstenone phenotype. BMC Genet 2015; 16:21. [PMID: 25884519 PMCID: PMC4365963 DOI: 10.1186/s12863-014-0158-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/18/2014] [Accepted: 12/18/2014] [Indexed: 11/26/2022]
Abstract
Background: Boar taint is principally caused by accumulation of androstenone and skatole in adipose tissues. Studies have shown high heritability estimates for androstenone, whereas skatole production is mainly dependent on nutritional factors. Androstenone is a lipophilic steroid mainly metabolized in the liver. The majority of studies on hepatic androstenone metabolism focus only on a single breed, and very few studies account for population similarities/differences in gene expression patterns. In this work, we concentrated on population similarities in gene expression to identify the common genes involved in hepatic androstenone metabolism of multiple pig populations. Based on androstenone measurements, publicly available gene expression datasets from three porcine populations were compiled into either a low or a high androstenone dataset. Gene expression correlation coefficients from these datasets were converted to rank ratios, and joint probabilities of these rank ratios were used to generate dataset-specific co-expression clusters. Finally, these networks were clustered using a graph clustering technique.
Results: Cluster analysis identified a number of statistically significant co-expression clusters in the dataset. Further enrichment analysis of these clusters showed that one of the clusters from the low androstenone dataset was highly enriched for xenobiotic, drug, cholesterol and lipid metabolism and cytochrome P450 associated metabolism of drugs and xenobiotics. Literature references revealed that a number of genes in this cluster were involved in phase I and phase II metabolism. Physical and functional similarity assessment showed that the members of this cluster were dispersed across multiple clusters in the high androstenone dataset, possibly indicating a weak co-expression of these genes in the high androstenone dataset.
Conclusions: Based on these results, we hypothesize that the majority of the genes in this cluster form a signature co-expression cluster in the low androstenone dataset in our experiment and that the majority of the members of this cluster might be responsible for hepatic androstenone metabolism across all three populations used in our study. We propose these results as a background work towards understanding breed similarities in hepatic androstenone metabolism. Additional large-scale experiments using data from multiple porcine breeds are necessary to validate these findings.
Electronic supplementary material: The online version of this article (doi:10.1186/s12863-014-0158-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sudeep Sahadevan
- Institute of Animal Science, University of Bonn, Endenicher Alle, Bonn, 53115, Germany; Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53754, Germany.
- Ernst Tholen
- Institute of Animal Science, University of Bonn, Endenicher Alle, Bonn, 53115, Germany.
- Karl Schellander
- Institute of Animal Science, University of Bonn, Endenicher Alle, Bonn, 53115, Germany.
- Dawit Tesfaye
- Institute of Animal Science, University of Bonn, Endenicher Alle, Bonn, 53115, Germany.
- Martin Hofmann-Apitius
- Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Schloss Birlinghoven, Sankt Augustin, 53754, Germany.
- Mehmet Ulas Cinar
- Department of Animal Science, Faculty of Agriculture, Erciyes University, Kayseri, Turkey.
- Asep Gunawan
- Department of Animal Production and Technology, Bogor Agricultural University, Bogor, Indonesia.
- Michael Hölker
- Institute of Animal Science, University of Bonn, Endenicher Alle, Bonn, 53115, Germany.
- Christiane Neuhoff
- Institute of Animal Science, University of Bonn, Endenicher Alle, Bonn, 53115, Germany.
494
Jin D, Gabrys B, Dang J. Combined node and link partitions method for finding overlapping communities in complex networks. Sci Rep 2015; 5:8600. [PMID: 25715829 PMCID: PMC4341207 DOI: 10.1038/srep08600] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 09/19/2014] [Accepted: 01/28/2015] [Indexed: 11/09/2022]
Abstract
Community detection in complex networks is a fundamental data analysis task in various domains, and how to effectively find overlapping communities in real applications is still a challenge. In this work, we propose a new unified model and method for finding the best overlapping communities on the basis of the associated node and link partitions derived from the same framework. Specifically, we first describe a unified model that accommodates node and link communities (partitions) together, and then present a nonnegative matrix factorization method to learn the parameters of the model. Thereafter, we infer the overlapping communities based on the derived node and link communities, i.e., we determine each overlapping community between the corresponding node and link community through greedy optimization of conductance, a local community quality function. Finally, we introduce a model selection method based on consensus clustering to determine the number of communities. We have evaluated our method on both synthetic and real-world networks with ground truths, and compared it with seven state-of-the-art methods. The experimental results demonstrate the superior performance of our method over the competing ones in detecting overlapping communities for all analysed data sets. Improved performance is particularly pronounced in cases of more complicated networked community structures.
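Conductance, the local quality function named in this abstract, is the fraction of a node set's edge volume that leaves the set: $\phi(S) = \mathrm{cut}(S, V \setminus S) / \min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))$, where lower is better. The Python sketch below computes it for an unweighted, undirected graph and adds a hypothetical greedy step (repeatedly add the candidate node that lowers $\phi$ most, stop when none does). It illustrates the idea only; it is not the authors' procedure for reconciling node and link communities, and all names are assumptions.

```python
import math


def conductance(adj, S):
    """phi(S) = cut(S, V - S) / min(vol(S), vol(V - S)).

    adj: dict node -> set of neighbours (undirected, unweighted).
    Returns math.inf when either side has zero volume (phi undefined).
    """
    S = set(S)
    cut = sum(1 for v in S for u in adj[v] if u not in S)
    vol_s = sum(len(adj[v]) for v in S)
    vol_rest = sum(len(adj[v]) for v in adj if v not in S)
    denom = min(vol_s, vol_rest)
    return math.inf if denom == 0 else cut / denom


def greedy_conductance(adj, seed_set, candidates):
    """Grow seed_set with candidate nodes while conductance strictly decreases."""
    S = set(seed_set)
    best = conductance(adj, S)
    pool = set(candidates) - S
    while pool:
        # Evaluate every remaining candidate; ties are broken by node id.
        phi, c = min((conductance(adj, S | {c}), c) for c in sorted(pool))
        if phi >= best:
            break  # no candidate improves conductance
        S.add(c)
        pool.discard(c)
        best = phi
    return S, best
```

Greedy growth of this kind finds a local, not global, minimum of $\phi$, and the result depends on the seed set and the candidate pool.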
Affiliation(s)
- Di Jin
- School of Computer Science and Technology, Tianjin University, Tianjin 300073, P. R. China
- Bogdan Gabrys
- Data Science Institute, Faculty of Science and Technology, Bournemouth University, Poole, Dorset BH12 5BB, UK
- Jianwu Dang
- School of Computer Science and Technology, Tianjin University, Tianjin 300073, P. R. China; School of Information Science, Japan Advanced Institute of Science and Technology, Japan
495
Characterization of protein complexes and subcomplexes in protein-protein interaction databases. Biochem Res Int 2015; 2015:245075. [PMID: 25722891 PMCID: PMC4334629 DOI: 10.1155/2015/245075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/30/2014] [Revised: 01/05/2015] [Accepted: 01/06/2015] [Indexed: 12/24/2022]
Abstract
The identification and characterization of protein complexes implicated in protein-protein interaction data are crucial to the understanding of the molecular events under normal and abnormal physiological conditions. This paper provides a novel characterization of subcomplexes in protein interaction databases, stressing definition and representation issues, quantification, biological validation, network metrics, motifs, modularity, and gene ontology (GO) terms. The paper introduces the concept of "nested group" as a way to represent subcomplexes and estimates that around 15% of the nested groups with higher Jaccard indices may be the result of data artifacts in protein interaction databases, while a number of them can be found in biologically important modular structures or dynamic structures. We also found that network centralities, enrichment in essential proteins, GO terms related to regulation, imperfect 5-clique motifs, and higher GO homogeneity can be used to identify proteins in nested complexes.
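A nested group pairs a complex with a larger complex that contains it, and the Jaccard index $|A \cap B| / |A \cup B|$ measures how close the two are (values near 1 mean the "subcomplex" is almost identical to its parent). The Python sketch below assumes a nested pair simply means a proper subset relation between two complexes given as sets of protein identifiers; the paper's exact definition may differ, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| of two non-empty sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def nested_pairs(complexes):
    """List (sub_index, super_index, jaccard) for every proper-subset pair.

    complexes: list of iterables of protein ids. Results are sorted by
    decreasing Jaccard index, so near-duplicate pairs come first.
    """
    sets = [frozenset(c) for c in complexes]
    out = []
    for i, j in combinations(range(len(sets)), 2):
        a, b = sets[i], sets[j]
        if a < b:
            out.append((i, j, jaccard(a, b)))
        elif b < a:
            out.append((j, i, jaccard(a, b)))
    return sorted(out, key=lambda t: t[2], reverse=True)
```

Sorting by Jaccard index puts the pairs most likely to be redundant database entries at the top, which is where an artifact check would start.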
496
497
Mora A, Sicari R, Cortigiani L, Carpeggiani C, Picano E, Capobianco E. Prognostic models in coronary artery disease: Cox and network approaches. R Soc Open Sci 2015; 2:140270. [PMID: 26064595 PMCID: PMC4448804 DOI: 10.1098/rsos.140270] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/27/2014] [Accepted: 01/13/2015] [Indexed: 06/04/2023]
Abstract
Predictive assessment of the risk of developing cardiovascular diseases is usually provided by computational approaches centred on Cox models. The complex interdependence structure underlying clinical data patterns can limit the performance of Cox analysis and complicate the interpretation of results, thus calling for complementary and integrative methods. Prognostic models are proposed for studying the risk associated with patients with known or suspected coronary artery disease (CAD) undergoing vasodilator stress echocardiography, an established technique for CAD detection and prognostication. In order to complement standard Cox models, network inference is considered a possible solution to quantify the complex relationships between heterogeneous data categories. In particular, a mutual information network is designed to explore the paths linking patient-associated variables to endpoint events, to reveal prognostic factors and to identify the best possible predictors of death. Data from a prospective, multicentre, observational study are available from a previous study, based on 4313 patients (2532 men; 64±11 years) with known (n=1547) or suspected (n=2766) CAD, who underwent high-dose dipyridamole (0.84 mg/kg over 6 min) stress echocardiography with coronary flow reserve (CFR) evaluation of the left anterior descending (LAD) artery by Doppler. The overall mortality was the only endpoint analysed by Cox models. The estimated connectivity between clinical variables assigns a complementary value to the proposed network approach in relation to the established Cox model, for instance by revealing connectivity paths. By using multiple metrics, the constraints of regression analysis in measuring the association strength among clinical variables can be relaxed, and identification of communities and prognostic paths can be provided.
On the basis of evidence from various model comparisons, we show in this CAD study that there may be characteristic factors involved in prognostic stratification whose complexity suggests an exploration beyond the analysis provided by the still fundamental Cox approach.
Affiliation(s)
- Antonio Mora
- Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Laboratory of Integrative Systems Medicine (LISM), Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Rosa Sicari
- Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Clara Carpeggiani
- Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Eugenio Picano
- Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Enrico Capobianco
- Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Laboratory of Integrative Systems Medicine (LISM), Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Center for Computational Science, University of Miami, Coral Gables, FL 33146, USA
498
Zeng Y, Liu J. Community Detection from Signed Social Networks Using a Multi-objective Evolutionary Algorithm. Proceedings in Adaptation, Learning and Optimization 2015. [DOI: 10.1007/978-3-319-13359-1_21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/13/2023]
499
Hassan EA, Hafez AI, Hassanien AE, Fahmy AA. A Discrete Bat Algorithm for the Community Detection Problem. Lect Notes Comput Sci 2015. [DOI: 10.1007/978-3-319-19644-2_16] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]
500
Li HJ, Daniels JJ. Social significance of community structure: statistical view. Phys Rev E Stat Nonlin Soft Matter Phys 2015; 91:012801. [PMID: 25679651 DOI: 10.1103/physreve.91.012801] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/16/2014] [Indexed: 05/06/2023]
Abstract
Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain erroneous edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and then we obtain a significance measure of statistical form . Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc.
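The node-to-leader similarity described in this abstract (the number of shared neighbours) is simple to compute. The following Python sketch assumes the leaders are already known and, as a hypothetical simplification of the paper's algorithm, assigns every other node to the leader with which it shares the most neighbours, breaking ties in favour of the leader listed first. The log-likelihood tightness score and the p-value step are not reproduced, and the function names are assumptions.

```python
def common_neighbors(adj, u, v):
    """Number of neighbours shared by u and v (adj: dict node -> set)."""
    return len(adj[u] & adj[v])


def assign_to_leaders(adj, leaders):
    """Map every node to a leader by common-neighbour similarity.

    leaders: ordered list of leader nodes; each leader maps to itself.
    max() returns the first maximal element, so ties favour earlier leaders.
    """
    membership = {}
    for v in adj:
        if v in leaders:
            membership[v] = v
        else:
            membership[v] = max(leaders, key=lambda lead: common_neighbors(adj, v, lead))
    return membership
```

Nodes that share no neighbour with any leader still get assigned (to the first leader) under this rule, so a practical version would need a fallback such as leaving them unassigned.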
Affiliation(s)
- Hui-Jia Li
- School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100080, China and Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Jasmine J Daniels
- Department of Applied Physics, Stanford University, Stanford, California 94305, USA