1
Ciba M, Petzold M, Alves CL, Rodrigues FA, Jimbo Y, Thielemann C. Machine learning and complex network analysis of drug effects on neuronal microelectrode biosensor data. Sci Rep 2025; 15:15128. [PMID: 40301534 PMCID: PMC12041479 DOI: 10.1038/s41598-025-99479-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2025] [Accepted: 04/21/2025] [Indexed: 05/01/2025]
Abstract
Biosensors, such as microelectrode arrays that record in vitro neuronal activity, provide powerful platforms for studying neuroactive substances. This study presents a machine learning workflow to analyze drug-induced changes in neuronal biosensor data using complex network measures from graph theory. Microelectrode array recordings of neuronal networks exposed to bicuculline, a GABA(A) receptor antagonist known to induce hypersynchrony, demonstrated the workflow's ability to detect and characterize pharmacological effects. The workflow integrates network-based features with synchrony, optimizing preprocessing parameters, including spike train bin sizes, segmentation window sizes, and correlation methods. It achieved high classification accuracy (AUC up to 90%) and used Shapley Additive Explanations to interpret feature importance rankings. Significant reductions in network complexity and segregation, hallmarks of epileptiform activity induced by bicuculline, were revealed. While bicuculline's effects are well established, this framework is designed to be broadly applicable for detecting both strong and subtle network alterations induced by neuroactive compounds. The results demonstrate the potential of this methodology for advancing biosensor applications in neuropharmacology and drug discovery.
Affiliation(s)
- Manuel Ciba
  - BioMEMS Lab, Aschaffenburg University of Applied Sciences, Aschaffenburg, Germany
- Marc Petzold
  - BioMEMS Lab, Aschaffenburg University of Applied Sciences, Aschaffenburg, Germany
- Caroline L Alves
  - BioMEMS Lab, Aschaffenburg University of Applied Sciences, Aschaffenburg, Germany
- Francisco A Rodrigues
  - Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo (USP), São Paulo, Brazil
- Yasuhiko Jimbo
  - Department of Human and Engineered Environmental Studies, The University of Tokyo, Tokyo, Japan
2
Lakshman AH, Wright ES. EvoWeaver: large-scale prediction of gene functional associations from coevolutionary signals. Nat Commun 2025; 16:3878. [PMID: 40274827 PMCID: PMC12022180 DOI: 10.1038/s41467-025-59175-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2025] [Accepted: 04/09/2025] [Indexed: 04/26/2025]
Abstract
The known universe of uncharacterized proteins is expanding far faster than our ability to annotate their functions through laboratory study. Computational annotation approaches rely on similarity to previously studied proteins, thereby ignoring unstudied proteins. Coevolutionary approaches hold promise for injecting new information into our knowledge of the protein universe by linking proteins through 'guilt-by-association'. However, existing coevolutionary algorithms have insufficient accuracy and scalability to connect the entire universe of proteins. We present EvoWeaver, a method that weaves together 12 signals of coevolution to quantify the degree of shared evolution between genes. EvoWeaver accurately identifies proteins involved in protein complexes or separate steps of a biochemical pathway. We show the merits of EvoWeaver by partly reconstructing known biochemical pathways without any prior knowledge other than that available from genomic sequences. Applying EvoWeaver to 1545 gene groups from 8564 genomes reveals missing connections in popular databases and potentially undiscovered links between proteins.
Affiliation(s)
- Aidan H Lakshman
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Erik S Wright
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Center for Evolutionary Biology and Medicine, Pittsburgh, PA, USA
3
Xie L, Cao B, Wen X, Zheng Y, Wang B, Zhou S, Zheng P. ReLume: Enhancing DNA storage data reconstruction with flow network and graph partitioning. Methods 2025; 240:101-112. [PMID: 40268154 DOI: 10.1016/j.ymeth.2025.03.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2025] [Revised: 03/06/2025] [Accepted: 03/31/2025] [Indexed: 04/25/2025]
Abstract
DNA storage is an ideal alternative to silicon-based storage, but focusing on data writing alone cannot address the inevitable errors and durability issues. Therefore, we propose ReLume, a DNA storage data reconstruction method based on flow networks and graph partitioning technology, which can accomplish the data reconstruction task of millions of reads on a laptop with 24 GB RAM. The results show that ReLume copes well with many types of errors, more than doubles sequence recovery rates, and reduces memory usage by about 60 %. ReLume is 10 times more durable than other representative methods, meaning that data can be read without loss after 100 years. Results from the wet lab DNA storage dataset show that ReLume's sequence recovery rates of 73 % and 93.2 %, respectively, significantly outperform existing methods. In summary, ReLume effectively overcomes the accuracy and hardware limitations and provides a feasible idea for the portability of DNA storage.
Affiliation(s)
- Lei Xie
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Ben Cao
  - School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, PR China
- Xiaoru Wen
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Yanfen Zheng
  - School of Computer Science and Technology, Dalian University of Technology, 116024 Dalian, PR China
- Bin Wang
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Shihua Zhou
  - Key Laboratory of Advanced Design and Intelligent Computing, Ministry of Education, School of Software Engineering, Dalian University, Dalian 116622, PR China
- Pan Zheng
  - Department of Accounting and Information Systems, University of Canterbury, 8140 Christchurch, New Zealand
4
Taiti C, Vivaldo G, Mancuso S, Comparini D, Pandolfi C. Volatile organic compounds (VOCs) fingerprinting combined with complex network analysis as a forecasting tool for tracing the origin and genetic lineage of Arabica specialty coffees. Sci Rep 2025; 15:13709. [PMID: 40258936 PMCID: PMC12012085 DOI: 10.1038/s41598-025-97162-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2024] [Accepted: 04/02/2025] [Indexed: 04/23/2025]
Abstract
Due to the globalization of coffee trade, ensuring the safety and traceability of coffee has become a critical challenge, prompting global authorities to implement new traceability systems to enhance quality identification and protect consumers from fraud. Aroma is a crucial parameter in the evaluation and differentiation of coffees, influenced by factors such as genetics, origin, post-harvest processing, roast level, and brewing method. This paper provides, for the first time, a comprehensive overview of the volatile fingerprints of specialty coffees, categorized by their respective quality levels. In particular, this study aimed to evaluate the potential of volatile compounds monitored through Proton Transfer Reaction-Time of Flight-Mass Spectrometry (PTR-ToF-MS) as an objective, fast, reliable and repeatable tool for tracking the quality and genetic lineage of Arabica specialty coffees. The spectra of volatile organic compounds (VOCs) were acquired from 1132 coffee samples (both specialty and non-specialty) from various varieties, origins, and processing methods. Results clearly indicate that the volatile composition of specialty coffee is predominantly influenced by its genetic lineage. Arabica coffee species belonging to Bourbon, Typica, and Ethiopian landraces showed higher total VOC emission, while varieties related to Robusta, which are related to the Canephora species, emit less. Finally, by employing a complex network analysis approach based on headspace VOC analysis, it was possible to accurately distinguish between different categories of specialty Arabica coffee. Notably, our analysis shows that the quality of specialty coffee is not linked to the number of VOCs emitted, but rather to the emission level of some pleasant aroma compounds. These findings open new perspectives for the development of aroma profiling techniques and demonstrate the unique aroma release characteristics of specialty coffees.
Affiliation(s)
- Cosimo Taiti
  - Department of Agriculture, Food, Environment and Forestry (DAGRI), University of Florence, Viale Delle Idee 30, 50019, Sesto Fiorentino, Firenze, Italy
- Gianna Vivaldo
  - Institute of Geosciences and Earth Resources (IGG), National Research Council of Italy (CNR), Via Moruzzi 1, 56124, Pisa, Italy
  - National Biodiversity Future Center, Piazza Marina, 61, 90133, Palermo, Italy
- Stefano Mancuso
  - Department of Agriculture, Food, Environment and Forestry (DAGRI), University of Florence, Viale Delle Idee 30, 50019, Sesto Fiorentino, Firenze, Italy
  - Fondazione per il Futuro delle Città, Via Boccaccio 50, 50133, Firenze, Italy
- Diego Comparini
  - Department of Agriculture, Food, Environment and Forestry (DAGRI), University of Florence, Viale Delle Idee 30, 50019, Sesto Fiorentino, Firenze, Italy
- Camilla Pandolfi
  - Department of Agriculture, Food, Environment and Forestry (DAGRI), University of Florence, Viale Delle Idee 30, 50019, Sesto Fiorentino, Firenze, Italy
5
Castanho I, Yeganeh PN, Boix CA, Morgan SL, Mathys H, Prokopenko D, White B, Soto LM, Pegoraro G, Shah S, Ploumakis A, Kalavros N, Bennett DA, Lange C, Kim DY, Bertram L, Tsai LH, Kellis M, Tanzi RE, Hide W. Molecular hallmarks of excitatory and inhibitory neuronal resilience and resistance to Alzheimer's disease. bioRxiv 2025:2025.01.13.632801. [PMID: 39868232 PMCID: PMC11761133 DOI: 10.1101/2025.01.13.632801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Background: A significant proportion of individuals maintain healthy cognitive function despite having extensive Alzheimer's disease (AD) pathology, known as cognitive resilience. Understanding the molecular mechanisms that protect these individuals can identify therapeutic targets for AD dementia. This study aims to define molecular and cellular signatures of cognitive resilience, protection and resistance, by integrating genetics, bulk RNA, and single-nucleus RNA sequencing data across multiple brain regions from AD, resilient, and control individuals. Methods: We analyzed data from the Religious Order Study and the Rush Memory and Aging Project (ROSMAP), including bulk (n=631) and multi-regional single nucleus (n=48) RNA sequencing. Subjects were categorized into AD, resilient, and control based on β-amyloid and tau pathology, and cognitive status. We identified and prioritized protected cell populations using whole genome sequencing-derived genetic variants, transcriptomic profiling, and cellular composition distribution. Results: Transcriptomic results, supported by GWAS-derived polygenic risk scores, place cognitive resilience as an intermediate state in the AD continuum. Tissue-level analysis revealed 43 genes enriched in nucleic acid metabolism and signaling that were differentially expressed between AD and resilience. Only GFAP (upregulated) and KLF4 (downregulated) showed differential expression in resilience compared to controls. Cellular resilience involved reorganization of protein folding and degradation pathways, with downregulation of Hsp90 and selective upregulation of Hsp40, Hsp70, and Hsp110 families in excitatory neurons. Excitatory neuronal subpopulations in the entorhinal cortex (ATP8B1+ and MEF2C-high) exhibited unique resilience signaling through neurotrophin (modulated by LINGO1) and angiopoietin (ANGPT2/TEK) pathways. We identified MEF2C, ATP8B1, and RELN as key markers of resilient excitatory neuronal populations, characterized by selective vulnerability in AD. Protective rare variant enrichment highlighted vulnerable populations, including somatostatin (SST) inhibitory interneurons, validated through immunofluorescence showing co-expression of rare variant associated RBFOX1 and KIF26B in SST+ neurons in the dorsolateral prefrontal cortex. The maintenance of excitatory-inhibitory balance emerges as a key characteristic of resilience. Conclusions: We identified molecular and cellular hallmarks of cognitive resilience, an intermediate state in the AD continuum. Resilience mechanisms include preservation of neuronal function, maintenance of excitatory/inhibitory balance, and activation of protective signaling pathways. Specific excitatory neuronal populations appear to play a central role in mediating cognitive resilience, while a subset of vulnerable SST interneurons likely provide compensation against AD-associated dysregulation. This study offers a framework to leverage natural protective mechanisms to mitigate neurodegeneration and preserve cognition in AD.
Affiliation(s)
- Isabel Castanho
  - Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Pourya Naderi Yeganeh
  - Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Carles A. Boix
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Sarah L. Morgan
  - Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
  - Centre for Neuroscience, Surgery and Trauma, Blizard Institute, Queen Mary University of London, London E1 2AT, UK
- Hansruedi Mathys
  - University of Pittsburgh Brain Institute, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
  - Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA
- Dmitry Prokopenko
  - Harvard Medical School, Boston, MA, USA
  - Genetics and Aging Research Unit, The Henry and Allison McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Bartholomew White
  - Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Larisa M. Soto
  - Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Giulia Pegoraro
  - Harvard Medical School, Boston, MA, USA
  - Medical School, University of Exeter, Exeter EX2 5DW, UK
- Athanasios Ploumakis
  - Harvard Medical School, Boston, MA, USA
  - Spatial Technologies Unit, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Nikolas Kalavros
  - Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- David A. Bennett
  - Rush Alzheimer’s Disease Center, Rush University Medical Center, 1750 W Harrison Street, Suite 1000, Chicago, IL, 60612, USA
- Christoph Lange
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, 02115, Boston, MA, USA
- Doo Yeon Kim
  - Harvard Medical School, Boston, MA, USA
  - Genetics and Aging Research Unit, The Henry and Allison McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Lars Bertram
  - Lübeck Interdisciplinary Platform for Genome Analytics, Institutes of Neurogenetics and Cardiogenetics, University of Lübeck, Lübeck, Germany
  - Department of Psychology, University of Oslo, Oslo, Norway
- Li-Huei Tsai
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Picower Institute for Learning and Memory, MIT, Cambridge, MA 02139, USA
  - Department of Brain and Cognitive Sciences, MIT, Cambridge, MA 02139, USA
- Manolis Kellis
  - Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Rudolph E. Tanzi
  - Harvard Medical School, Boston, MA, USA
  - Genetics and Aging Research Unit, The Henry and Allison McCance Center for Brain Health, Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
- Winston Hide
  - Harvard Medical School, Boston, MA, USA
  - Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
6
Passarelli-Araujo H, Venancio TM, Hanage WP. Relating ecological diversity to genetic discontinuity across bacterial species. Genome Biol 2025; 26:8. [PMID: 39794865 PMCID: PMC11720962 DOI: 10.1186/s13059-024-03443-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 11/21/2024] [Indexed: 01/13/2025]
Abstract
BACKGROUND: Genetic discontinuity represents abrupt breaks in genomic identity among species. Advances in genome sequencing have enhanced our ability to track and characterize genetic discontinuity in bacterial populations. However, exploring the degree to which bacterial diversity exists as a continuum or sorted into discrete and readily defined species remains a challenge in microbial ecology. Here, we aim to quantify the genetic discontinuity (δ) and investigate how this metric is related to ecology. RESULTS: We harness a dataset comprising 210,129 genomes to systematically explore genetic discontinuity patterns across several distantly related species, finding clear breakpoints which vary depending on the taxa in question. By delving into pangenome characteristics, we uncover a significant association between pangenome saturation and genetic discontinuity. Closed pangenomes are associated with more pronounced breaks, exemplified by Mycobacterium tuberculosis. Additionally, through a machine learning approach, we detect key features such as gene conservation patterns and functional annotations that significantly impact genetic discontinuity prediction. CONCLUSIONS: Our study clarifies bacterial genetic patterns and their ecological impacts, enhancing the delineation of species boundaries and deepening our understanding of microbial diversity.
Affiliation(s)
- Hemanoel Passarelli-Araujo
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
  - Departamento de Bioquímica E Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil
- Thiago M Venancio
  - Laboratório de Química e Função de Proteínas e Peptídeos, Centro de Biociências e Biotecnologia, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos Dos Goytacazes, RJ, Brazil
- William P Hanage
  - Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA
7
Ruan Z, Lin F, Zhang Z, Cao J, Xiang W, Wei X, Liu J. Pairpot: a database with real-time lasso-based analysis tailored for paired single-cell and spatial transcriptomics. Nucleic Acids Res 2025; 53:D1087-D1098. [PMID: 39494542 PMCID: PMC11701735 DOI: 10.1093/nar/gkae986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Revised: 10/10/2024] [Accepted: 10/15/2024] [Indexed: 11/05/2024]
Abstract
Paired single-cell and spatially resolved transcriptomics (SRT) data supplement each other, providing in-depth insights into biological processes and disease mechanisms. Previous SRT databases have limitations in curating sufficient single-cell and SRT pairs (SC-SP pairs) and providing real-time heuristic analysis, which hinder the effort to uncover potential biological insights. Here, we developed Pairpot (http://pairpot.bioxai.cn), a database tailored for paired single-cell and SRT data with real-time heuristic analysis. Pairpot curates 99 high-quality pairs including 1,425,656 spots from 299 datasets, and creates the association networks. It constructs the curated pairs by integrating multiple slices and establishing potential associations between single-cell and SRT data. On this basis, Pairpot adopts semi-supervised learning that enables real-time heuristic analysis for SC-SP pairs where Lasso-View refines the user-selected SRT domains within milliseconds, Pair-View infers cell proportions of spots based on user-selected cell types in real-time and Layer-View displays SRT slices using a 3D hierarchical layout. Experiments demonstrated Pairpot's efficiency in identifying heterogeneous domains and cell proportions.
Affiliation(s)
- Zhihan Ruan
  - Centre for Bioinformatics and Intelligent Medicine, College of Computer Science, Nankai University, No.38 Tongyan Road, 300350 Tianjin, China
- Fan Lin
  - Centre for Bioinformatics and Intelligent Medicine, College of Computer Science, Nankai University, No.38 Tongyan Road, 300350 Tianjin, China
- Zhenjie Zhang
  - Centre for Bioinformatics and Intelligent Medicine, College of Computer Science, Nankai University, No.38 Tongyan Road, 300350 Tianjin, China
- Jiayue Cao
  - Centre for Bioinformatics and Intelligent Medicine, College of Computer Science, Nankai University, No.38 Tongyan Road, 300350 Tianjin, China
- Wenting Xiang
  - Centre for Bioinformatics and Intelligent Medicine, College of Computer Science, Nankai University, No.38 Tongyan Road, 300350 Tianjin, China
- Xiaoyi Wei
  - Fifth Affiliated Hospital of Sun Yat-sen University, No.52 East Meihua Road, 519000 Zhuhai, China
- Jian Liu
  - State Key Laboratory of Medical Chemical Biology, College of Computer Science, Nankai University, No.38 Tongyan Road, 300350 Tianjin, China
8
Mascia E, Nale V, Ferrè L, Sorosina M, Clarelli F, Chiodi A, Santoro S, Giordano A, Misra K, Cannizzaro M, Moiola L, Martinelli V, Milanesi L, Filippi M, Mosca E, Esposito F. Genetic Contribution to Medium-Term Disease Activity in Multiple Sclerosis. Mol Neurobiol 2025; 62:322-334. [PMID: 38850349 DOI: 10.1007/s12035-024-04264-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 05/25/2024] [Indexed: 06/10/2024]
Abstract
Multiple sclerosis (MS) is a complex disorder characterized by high heterogeneity in terms of phenotypic expression, prognosis and treatment response. In the present study, we aimed to explore the genetic contribution to MS disease activity at different levels: genes, pathways and tissue-specific networks. Two cohorts of relapsing-remitting MS patients who started a first-line treatment (n = 1294) were enrolled to evaluate the genetic association with disease activity after 4 years of follow-up. The analyses were performed at whole-genome SNP and gene level, followed by the construction of gene-gene interaction networks specific for brain and lymphocytes. The resulting gene modules were evaluated to highlight key players from a topological and functional perspective. We identified 23 variants and 223 genes with suggestive association to 4-year disease activity, highlighting genes like PON2, involved in oxidative stress and in mitochondrial function, and other genes, like ILRUN, involved in the modulation of the immune system. Network analyses led to the identification of a brain module composed of 228 genes and a lymphocyte module composed of 287 genes. The network analysis allowed us to prioritize genes relevant for their topological properties; among them are MPHOSPH9 (connector hub in both the brain and lymphocyte modules) and OPA1 (in the brain module), two genes already implicated in MS. Modules showed the enrichment of both shared and tissue-specific pathways, mainly implicated in inflammation. In conclusion, our results suggest that the processes underlying disease activity act on shared mechanisms across brain and lymphocyte tissues.
Affiliation(s)
- Elisabetta Mascia
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Valentina Nale
  - Institute of Biomedical Technologies, National Research Council, Segrate, Italy
- Laura Ferrè
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Melissa Sorosina
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Ferdinando Clarelli
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Alice Chiodi
  - Institute of Biomedical Technologies, National Research Council, Segrate, Italy
- Silvia Santoro
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Antonino Giordano
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Kaalindi Misra
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Miryam Cannizzaro
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Lucia Moiola
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Vittorio Martinelli
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Luciano Milanesi
  - Institute of Biomedical Technologies, National Research Council, Segrate, Italy
- Massimo Filippi
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
  - Vita-Salute San Raffaele University, Milan, Italy
  - Neurophysiology Service, IRCCS San Raffaele Scientific Institute, Milan, Italy
  - Neuroimaging Research Unit, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Ettore Mosca
  - Institute of Biomedical Technologies, National Research Council, Segrate, Italy
- Federica Esposito
  - Laboratory of Human Genetics of Neurological Disorders, Institute of Experimental Neurology, Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Scientific Institute, Milan, Italy
9
He JK, Wallis FPS, Gvirtz A, Rathje S. Artificial intelligence chatbots mimic human collective behaviour. Br J Psychol 2024. [PMID: 39739553 DOI: 10.1111/bjop.12764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/14/2024] [Accepted: 12/16/2024] [Indexed: 01/02/2025]
Abstract
Artificial Intelligence (AI) chatbots, such as ChatGPT, have been shown to mimic individual human behaviour in a wide range of psychological and economic tasks. Do groups of AI chatbots also mimic collective behaviour? If so, artificial societies of AI chatbots may aid social scientific research by simulating human collectives. To investigate this theoretical possibility, we focus on whether AI chatbots natively mimic one commonly observed collective behaviour: homophily, people's tendency to form communities with similar others. In a large simulated online society of AI chatbots powered by large language models (N = 33,299), we find that communities form over time around bots using a common language. In addition, among chatbots that predominantly use English (N = 17,746), communities emerge around bots that post similar content. These initial empirical findings suggest that AI chatbots mimic homophily, a key aspect of human collective behaviour. Thus, in addition to simulating individual human behaviour, AI-powered artificial societies may advance social science research by allowing researchers to simulate nuanced aspects of collective behaviour.
Affiliation(s)
- Andrés Gvirtz
  - Artificial Societies Ltd., London, UK
  - King's College London, London, UK
10
Taguchi YH, Turki T. Novel artificial intelligence-based identification of drug-gene-disease interaction using protein-protein interaction. BMC Bioinformatics 2024; 25:377. [PMID: 39696005 PMCID: PMC11653834 DOI: 10.1186/s12859-024-06009-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Accepted: 12/05/2024] [Indexed: 12/20/2024]
Abstract
The evaluation of drug-gene-disease interactions is key for the identification of drugs effective against disease. However, at present, drugs that are effective against genes that are critical for disease are difficult to identify. Following a disease-centric approach, there is a need to identify genes critical to disease function and find drugs that are effective against them. By contrast, following a drug-centric approach comprises identifying the genes targeted by drugs, and then the diseases in which the identified genes are critical. Both of these processes are complex. Using a gene-centric approach, whereby we identify genes that are effective against the disease and can be targeted by drugs, is much easier. However, how such sets of genes can be identified without specifying either the target diseases or drugs is not known. In this study, a novel artificial intelligence-based approach that employs unsupervised methods and identifies genes without specifying neither diseases nor drugs is presented. To evaluate its feasibility, we applied tensor decomposition (TD)-based unsupervised feature extraction (FE) to perform drug repositioning from protein-protein interactions (PPI) without any other information. Proteins selected by TD-based unsupervised FE include many genes related to cancers, as well as drugs that target the selected proteins. Thus, we were able to identify cancer drugs using only PPI. Because the selected proteins had more interactions, we replaced the selected proteins with hub proteins and found that hub proteins themselves could be used for drug repositioning. In contrast to hub proteins, which can only identify cancer drugs, TD-based unsupervised FE enables the identification of drugs for other diseases. In addition, TD-based unsupervised FE can be used to identify drugs that are effective in in vivo experiments, which is difficult when hub proteins are used. 
In conclusion, TD-based unsupervised FE is a useful tool for drug repositioning using only PPI without other information.
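To illustrate the hub-protein baseline mentioned in this abstract, here is a minimal sketch that ranks proteins in an undirected PPI edge list by degree. It is not the authors' TD-based unsupervised FE method. The function name `hub_nodes`, the cutoff `top_k`, and the protein labels are hypothetical.

```python
from collections import Counter


def hub_nodes(edges, top_k):
    """Return the top_k highest-degree nodes of an undirected edge list."""
    # Deduplicate undirected interactions and drop self-interactions.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for pair in unique:
        for protein in pair:
            degree[protein] += 1
    # Sort by descending degree, then by name so ties are deterministic.
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [protein for protein, _ in ranked[:top_k]]


# Hypothetical toy network; the labels are placeholders, not real proteins.
toy_edges = [
    ("P1", "P2"), ("P2", "P1"),  # the same interaction listed twice
    ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"),
]
print(hub_nodes(toy_edges, top_k=2))
```

In practice the cutoff would presumably be chosen from the degree distribution rather than fixed in advance; the fixed `top_k` here is only for brevity.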
Affiliation(s)
- Y-H Taguchi
- Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
- Turki Turki
- Department of Computer Science, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
11
Li D, Mei Q, Li G. scQA: A dual-perspective cell type identification model for single cell transcriptome data. Comput Struct Biotechnol J 2024; 23:520-536. [PMID: 38235363 PMCID: PMC10791572 DOI: 10.1016/j.csbj.2023.12.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 01/19/2024] Open
Abstract
Single-cell RNA sequencing technologies have been pivotal in advancing the development of algorithms for clustering heterogeneous cell populations. Existing methods for utilizing scRNA-seq data to identify cell types tend to neglect the beneficial impact of dropout events and perform clustering focusing solely on a quantitative perspective. Here, we introduce a novel method named scQA, notable for its ability to concurrently identify cell types and cell type-specific key genes from both qualitative and quantitative perspectives. In contrast to other methods, scQA not only identifies cell types but also extracts key genes associated with these cell types, enabling bidirectional clustering for scRNA-seq data. Through an iterative process, our approach aims to minimize the number of landmarks to approximately a dozen while maximizing the inclusion of quasi-trend-preserved genes with dropouts both qualitatively and quantitatively. It then clusters cells by employing an ingenious label propagation strategy, obviating the requirement for a predetermined number of cell types. Validated on 20 publicly available scRNA-seq datasets, scQA consistently outperforms other salient tools. Furthermore, we confirm the effectiveness and potential biological significance of the identified key genes through both external and internal validation. In conclusion, scQA emerges as a valuable tool for investigating cell heterogeneity due to its distinctive fusion of qualitative and quantitative facets, along with bidirectional clustering capabilities. Furthermore, it can be seamlessly integrated into broader scRNA-seq analyses. The source codes are publicly available at https://github.com/LD-Lyndee/scQA.
Affiliation(s)
- Di Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
- Qinglin Mei
- MOE Key Laboratory of Bioinformatics, BNRIST Bioinformatics Division, Department of Automation, Tsinghua University, Beijing 100084, China
- Guojun Li
- Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
12
Daou L, Hanna EM. Predicting protein complexes in protein interaction networks using Mapper and graph convolution networks. Comput Struct Biotechnol J 2024; 23:3595-3609. [PMID: 39493503 PMCID: PMC11530816 DOI: 10.1016/j.csbj.2024.10.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 10/04/2024] [Accepted: 10/04/2024] [Indexed: 11/05/2024] Open
Abstract
Protein complexes are groups of interacting proteins that are central to multiple biological processes. Studying protein complexes can enhance our understanding of cellular functions and malfunctions and thus support the development of effective disease treatments. High-throughput experimental techniques allow the generation of large-scale protein-protein interaction datasets. Accordingly, various computational approaches to predict protein complexes from protein-protein interactions were presented in the literature. They are typically based on networks in which nodes and edges represent proteins and their interactions, respectively. State-of-the-art approaches mainly rely on clustering static networks to identify complexes. However, since protein interactions are highly dynamic in nature, recent approaches seek to model such dynamics by typically integrating gene expression data and identifying protein complexes accordingly. We propose MComplex, a method that utilizes time-series gene expression with interaction data to generate a temporal network which is passed to a generative adversarial network whose generator is a graph convolutional network. This creates embeddings which are then analyzed using a modified graph-based version of the Mapper algorithm to predict corresponding protein complexes. We test our approach on multiple benchmark datasets and compare identified complexes against gold-standard protein complex datasets. Our results show that MComplex outperforms existing methods in several evaluation aspects, namely recall and maximum matching ratio as well as a composite score covering aggregated evaluation measures. The code and data are available for free download from https://github.com/LeonardoDaou/MComplex.
Affiliation(s)
- Leonardo Daou
- Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
- Eileen Marie Hanna
- Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
13
Almquist ZW, Nguyen TD, Sorensen M, Fu X, Sidiropoulos ND. Uncovering migration systems through spatio-temporal tensor co-clustering. Sci Rep 2024; 14:26861. [PMID: 39501001 PMCID: PMC11538304 DOI: 10.1038/s41598-024-78112-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2024] [Accepted: 10/28/2024] [Indexed: 11/08/2024] Open
Abstract
A central problem in the study of human mobility is that of migration systems. Typically, migration systems are defined as a set of relatively stable movements of people between two or more locations over time. While these emergent systems are expected to vary over time, they ideally contain a stable underlying structure that could be discovered empirically. There have been some notable attempts to formally or informally define migration systems. However, they have been limited by being hard to operationalize and defining migration systems in ways that ignore origin/destination aspects and fail to account for migration dynamics over time. In this work, we propose to employ spatio-temporal tensor co-clustering, which stems from signal processing and machine learning theory, as a novel migration system analysis tool. Tensor co-clustering is designed to cluster entities exhibiting similar patterns across multiple modalities and thus suits our purpose of analyzing spatial migration activities across time. To demonstrate its effectiveness in describing stable migration systems, we first focus on domestic migration between counties in the US from 1990 to 2018. We conduct three case studies on domestic migration, namely, (i) US Metropolitan Areas, (ii) the state of California, and (iii) Louisiana, in which the last focuses on detecting exogenous events such as Hurricane Katrina in 2005. In addition, we also examine a case study at a larger scale, using worldwide international migration data from 200 countries between 1990 and 2015. Finally, we conclude with a discussion of this approach and its limitations.
Affiliation(s)
- Zack W Almquist
- Departments of Sociology and Statistics, University of Washington, Seattle, WA, 98195, USA
- Tri Duc Nguyen
- Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA
- Mikael Sorensen
- Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904, USA
- Xiao Fu
- Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, 97331, USA
- Nicholas D Sidiropoulos
- Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, 22904, USA
14
Kazantseva E, Donmez A, Frolova M, Pop M, Kolmogorov M. Strainy: phasing and assembly of strain haplotypes from long-read metagenome sequencing. Nat Methods 2024; 21:2034-2043. [PMID: 39327484 DOI: 10.1038/s41592-024-02424-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 08/22/2024] [Indexed: 09/28/2024]
Abstract
Bacterial species in microbial communities are often represented by mixtures of strains, distinguished by small variations in their genomes. Short-read approaches can be used to detect small-scale variation between strains but fail to phase these variants into contiguous haplotypes. Long-read metagenome assemblers can generate contiguous bacterial chromosomes but often suppress strain-level variation in favor of species-level consensus. Here we present Strainy, an algorithm for strain-level metagenome assembly and phasing from Nanopore and PacBio reads. Strainy takes a de novo metagenomic assembly as input and identifies strain variants, which are then phased and assembled into contiguous haplotypes. Using simulated and mock Nanopore and PacBio metagenome data, we show that Strainy assembles accurate and complete strain haplotypes, outperforming current Nanopore-based methods and comparable with PacBio-based algorithms in completeness and accuracy. We then use Strainy to assemble strain haplotypes of a complex environmental metagenome, revealing distinct strain distribution and mutational patterns in bacterial species.
Affiliation(s)
- Ekaterina Kazantseva
- Bioinformatics and Systems Biology Program, ITMO University, St. Petersburg, Russia
- Ataberk Donmez
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Maria Frolova
- Functional Genomics of Prokaryotes Laboratory, Institute of Cell Biophysics, RAS, Pushchino, Russia
- Mihai Pop
- Department of Computer Science, University of Maryland, College Park, MD, USA
- Mikhail Kolmogorov
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
15
Alves CL, Martinelli T, Sallum LF, Rodrigues FA, Toutain TGLDO, Porto JAM, Thielemann C, Aguiar PMDC, Moeckel M. Multiclass classification of Autism Spectrum Disorder, attention deficit hyperactivity disorder, and typically developed individuals using fMRI functional connectivity analysis. PLoS One 2024; 19:e0305630. [PMID: 39418298 PMCID: PMC11486369 DOI: 10.1371/journal.pone.0305630] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 06/03/2024] [Indexed: 10/19/2024] Open
Abstract
Neurodevelopmental conditions, such as Autism Spectrum Disorder (ASD) and Attention Deficit Hyperactivity Disorder (ADHD), present unique challenges due to overlapping symptoms, making an accurate diagnosis and targeted intervention difficult. Our study employs advanced machine learning techniques to analyze functional magnetic resonance imaging (fMRI) data from individuals with ASD, ADHD, and typically developed (TD) controls, totaling 120 subjects in the study. Leveraging multiclass classification (ML) algorithms, we achieve superior accuracy in distinguishing between ASD, ADHD, and TD groups, surpassing existing benchmarks with an area under the ROC curve near 98%. Our analysis reveals distinct neural signatures associated with ASD and ADHD: individuals with ADHD exhibit altered connectivity patterns of regions involved in attention and impulse control, whereas those with ASD show disruptions in brain regions critical for social and cognitive functions. The observed connectivity patterns, on which the ML classification rests, agree with established diagnostic approaches based on clinical symptoms. Furthermore, complex network analyses highlight differences in brain network integration and segregation among the three groups. Our findings pave the way for refined, ML-enhanced diagnostics in accordance with established practices, offering a promising avenue for developing trustworthy clinical decision-support systems.
Affiliation(s)
- Caroline L. Alves
- Laboratory for Hybrid Modeling, Aschaffenburg University of Applied Sciences, Aschaffenburg, Bayern, Germany
- Tiago Martinelli
- Institute of Mathematical and Computer Sciences, University of São Paulo, São Paulo, São Paulo, Brazil
- Loriz Francisco Sallum
- Institute of Mathematical and Computer Sciences, University of São Paulo, São Paulo, São Paulo, Brazil
- Joel Augusto Moura Porto
- Institute of Physics of São Carlos (IFSC), University of São Paulo (USP), São Carlos, São Paulo, Brazil
- Institute of Biological Information Processing, Heinrich Heine University Düsseldorf, Düsseldorf, North Rhine–Westphalia Land, Germany
- Christiane Thielemann
- BioMEMS Lab, Aschaffenburg University of Applied Sciences, Aschaffenburg, Bayern, Germany
- Patrícia Maria de Carvalho Aguiar
- Hospital Israelita Albert Einstein, São Paulo, São Paulo, Brazil
- Department of Neurology and Neurosurgery, Federal University of São Paulo, São Paulo, São Paulo, Brazil
- Michael Moeckel
- Laboratory for Hybrid Modeling, Aschaffenburg University of Applied Sciences, Aschaffenburg, Bayern, Germany
16
Aref S, Mostajabdaveh M, Chheda H. Bayan algorithm: Detecting communities in networks through exact and approximate optimization of modularity. Phys Rev E 2024; 110:044315. [PMID: 39562863 DOI: 10.1103/physreve.110.044315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2024] [Accepted: 09/24/2024] [Indexed: 11/21/2024]
Abstract
Community detection is a classic network problem with extensive applications in various fields. Its most common method is using modularity maximization heuristics which rarely return an optimal partition or anything similar. Partitions with globally optimal modularity are difficult to compute, and therefore have been underexplored. Using structurally diverse networks, we compare 30 community detection methods including our proposed algorithm that offers optimality and approximation guarantees: the Bayan algorithm. Unlike existing methods, Bayan globally maximizes modularity or approximates it within a factor. Our results show the distinctive accuracy and stability of maximum-modularity partitions in retrieving planted partitions at rates higher than most alternatives for a wide range of parameter settings in two standard benchmarks. Compared to the partitions from 29 other algorithms, maximum-modularity partitions have the best medians for description length, coverage, performance, average conductance, and well clusteredness. These advantages come at the cost of additional computations which Bayan makes possible for small networks (networks that have up to 3000 edges in their largest connected component). Bayan is several times faster than using open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Our results point to a few well-performing algorithms, among which Bayan stands out as the most reliable method for small networks. A Python implementation of the Bayan algorithm (bayanpy) is publicly available through the package installer for Python.
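The modularity objective referred to in this abstract can be made concrete with a small sketch. The code below computes the standard Newman modularity, Q = sum over communities c of (L_c/m - (d_c/2m)^2), for a simple undirected graph, and finds a globally optimal partition of a six-node toy graph by exhaustive enumeration. This is not the Bayan algorithm and does not use the bayanpy API. The function names and the toy graph are hypothetical, and enumeration is only feasible for a handful of nodes.

```python
from itertools import product


def modularity(edges, community):
    """Newman modularity of a partition of a simple undirected graph.

    edges: list of (u, v) pairs with no self-loops or duplicates.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    internal = {}    # L_c: number of edges inside community c
    degree_sum = {}  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def best_partition(nodes, edges):
    """Exhaustively search all labelings (tiny graphs only)."""
    best_q, best_labels = float("-inf"), None
    for labels in product(range(len(nodes)), repeat=len(nodes)):
        community = dict(zip(nodes, labels))
        q = modularity(edges, community)
        if q > best_q:
            best_q, best_labels = q, community
    return best_q, best_labels


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_nodes = [0, 1, 2, 3, 4, 5]
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
q, partition = best_partition(toy_nodes, toy_edges)
print(q, partition)
```

The enumeration visits n^n labelings, which is why exact modularity maximization on anything but very small graphs needs the kind of optimization machinery the abstract describes.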
17
Frolova D, Lima L, Roberts LW, Bohnenkämper L, Wittler R, Stoye J, Iqbal Z. Applying rearrangement distances to enable plasmid epidemiology with pling. Microb Genom 2024; 10:001300. [PMID: 39401066 PMCID: PMC11472880 DOI: 10.1099/mgen.0.001300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 09/05/2024] [Indexed: 10/15/2024] Open
Abstract
Plasmids are a key vector of antibiotic resistance, but the current bioinformatics toolkit is not well suited to tracking them. The rapid structural changes seen in plasmid genomes present considerable challenges to evolutionary and epidemiological analysis. Typical approaches are either low resolution (replicon typing) or use shared k-mer content to define a genetic distance. However, this distance can both overestimate plasmid relatedness by ignoring rearrangements, and underestimate by over-penalizing gene gain/loss. Therefore a model is needed which captures the key components of how plasmid genomes evolve structurally - through gene/block gain or loss, and rearrangement. A secondary requirement is to prevent promiscuous transposable elements (TEs) leading to over-clustering of unrelated plasmids. We choose the 'Double Cut and Join Indel' (DCJ-Indel) model, in which plasmids are studied at a coarse level, as a sequence of signed integers (representing genes or aligned blocks), and the distance between two plasmids is the minimum number of rearrangement events or indels needed to transform one into the other. We show how this gives much more meaningful distances between plasmids. We introduce a software workflow pling (https://github.com/iqbal-lab-org/pling), which uses the DCJ-Indel model, to calculate distances between plasmids and then cluster them. In our approach, we combine containment distances and DCJ-Indel distances to build a TE-aware plasmid network. We demonstrate superior performance and interpretability to other plasmid clustering tools on the 'Russian Doll' dataset and a hospital transmission dataset.
Affiliation(s)
- Daria Frolova
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Leandro Lima
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Leah Wendy Roberts
- Centre for Immunology and Infection Control, Queensland University of Technology, Brisbane, Queensland, Australia
- Leonard Bohnenkämper
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Graduate School "Digital Infrastructure for the Life Sciences" (DILS), Bielefeld University, Bielefeld, Germany
- Roland Wittler
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Jens Stoye
- Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Zamin Iqbal
- European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Milner Centre for Evolution, University of Bath, Bath, UK
18
Yu R, Liu Y, Yang R, Wu Y. VQGNet: An Unsupervised Defect Detection Approach for Complex Textured Steel Surfaces. Sensors (Basel, Switzerland) 2024; 24:6252. [PMID: 39409292 PMCID: PMC11478711 DOI: 10.3390/s24196252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 08/20/2024] [Accepted: 08/23/2024] [Indexed: 10/20/2024]
Abstract
Defect detection on steel surfaces with complex textures is a critical and challenging task in the industry. The limited number of defect samples and the complexity of the annotation process pose significant challenges. Moreover, performing defect segmentation based on accurate identification further increases the task's difficulty. To address this issue, we propose VQGNet, an unsupervised algorithm that can precisely recognize and segment defects simultaneously. A feature fusion method based on aggregated attention and a classification-aided module is proposed to segment defects by integrating different features in the original images and the anomaly maps, which direct the attention to the anomalous information instead of the irregular complex texture. The anomaly maps are generated more confidently using strategies for multi-scale feature fusion and neighbor feature aggregation. Moreover, an anomaly generation method suitable for grayscale images is introduced to facilitate the model's learning on the anomalous samples. The refined anomaly maps and fused features are both input into the classification-aided module for the final classification and segmentation. VQGNet achieves state-of-the-art (SOTA) performance on the industrial steel dataset, with an I-AUROC of 99.6%, I-F1 of 98.8%, P-AUROC of 97.0%, and P-F1 of 80.3%. Additionally, ViT-Query demonstrates robust generalization capabilities in generating anomaly maps based on the Kolektor Surface-Defect Dataset 2.
Affiliation(s)
- Yingna Wu
- Center for Adaptive System Engineering, ShanghaiTech University, Shanghai 201210, China
19
Hu X, Wang Z, Zhao J, Wang R, Lei H, Liu W, Long B. Location method for emergency rescue node on expressways based on spatio-temporal characteristics of vehicle operation. Sci Rep 2024; 14:19435. [PMID: 39169122 PMCID: PMC11339429 DOI: 10.1038/s41598-024-70532-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Accepted: 08/19/2024] [Indexed: 08/23/2024] Open
Abstract
Expressway networks are continuously developing and emergency rescue demand is increasing proportionately. The location of expressway emergency rescue nodes needs refinement to meet changing requirements. In this study, the expressway system was modeled as a network. The differences in the origin-destination (OD) distribution matrices for working days and major holidays were used as the bases for determining the need for temporary emergency rescue nodes. Overlapping and non-overlapping community detection algorithms were used to extract the distribution characteristics of OD during both day categories. These distributions were used to determine permanent and temporary emergency rescue sites. In this study, we considered the differences in traffic volume, distance, and impact of four vehicle types on traffic accidents to select the location of emergency rescue nodes and allocate emergency resources. An emergency rescue node selection model for an expressway network was established based on spatio-temporal characteristics. In a regional example, the model determined that 22 permanent and 25 temporary emergency rescue nodes were appropriate. Compared with the P-center location model, the average rescue time for traffic accidents during working days and major holidays was reduced by approximately 27.08% and 6.70%, respectively. The coefficient of variation of emergency rescue time was reduced by approximately 28.22% and 21.41%, respectively. The results indicated that the model satisfied the expressway emergency rescue demand requirements and improved the rationality of the rescue center node layout.
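The two summary statistics reported in this abstract, the percentage reduction in mean rescue time and the coefficient of variation (standard deviation divided by mean), can be sketched as follows. The rescue times below are invented placeholders, not data from the study. The use of the population standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed estimator)."""
    return pstdev(values) / mean(values)


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Hypothetical rescue times in minutes under a baseline and a candidate layout.
baseline_times = [18, 25, 31, 40, 22]
candidate_times = [16, 20, 24, 27, 19]

time_drop = percent_reduction(mean(baseline_times), mean(candidate_times))
cv_drop = percent_reduction(coefficient_of_variation(baseline_times),
                            coefficient_of_variation(candidate_times))
print(f"mean rescue time reduced by {time_drop:.2f}%")
print(f"coefficient of variation reduced by {cv_drop:.2f}%")
```

A lower coefficient of variation means rescue times are more uniform across accidents relative to their mean, which is a separate property from the mean itself being lower.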
Affiliation(s)
- Xinghua Hu
- School of Traffic & Transportation, Chongqing Jiaotong University, Chongqing, 400074, People's Republic of China
- Chongqing Key Laboratory of Intelligent Integrated and Multidimensional Transportation System, Chongqing Jiaotong University, Chongqing, 400074, People's Republic of China
- Zhouzuo Wang
- School of Traffic & Transportation, Chongqing Jiaotong University, Chongqing, 400074, People's Republic of China
- Jiahao Zhao
- School of Traffic & Transportation, Chongqing Jiaotong University, Chongqing, 400074, People's Republic of China
- Chongqing Key Laboratory of Intelligent Integrated and Multidimensional Transportation System, Chongqing Jiaotong University, Chongqing, 400074, People's Republic of China
- Ran Wang
- Chongqing YouLiang Science & Technology Co., Ltd., Chongqing, 401336, People's Republic of China
- Hao Lei
- School of Traffic & Transportation, Chongqing Jiaotong University, Chongqing, 400074, People's Republic of China
- Wei Liu
- School of Traffic & Transportation, Chongqing Jiaotong University, Chongqing, 400074, People's Republic of China
- Bing Long
- Chongqing Transport Planning Institute, Chongqing, 400020, People's Republic of China
20
Qian J, Yang B, Wang S, Yuan S, Zhu W, Zhou Z, Zhang Y, Hu G. Drug Repurposing for COVID-19 by Constructing a Comorbidity Network with Central Nervous System Disorders. Int J Mol Sci 2024; 25:8917. [PMID: 39201608 PMCID: PMC11354300 DOI: 10.3390/ijms25168917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2024] [Revised: 08/06/2024] [Accepted: 08/14/2024] [Indexed: 09/02/2024] Open
Abstract
In the post-COVID-19 era, treatment options for potential SARS-CoV-2 outbreaks remain limited. An increased incidence of central nervous system (CNS) disorders has been observed in long-term COVID-19 patients. Understanding the shared molecular mechanisms between these conditions may provide new insights for developing effective therapies. This study developed an integrative drug-repurposing framework for COVID-19, leveraging comorbidity data with CNS disorders, network-based modular analysis, and dynamic perturbation analysis to identify potential drug targets and candidates against SARS-CoV-2. We constructed a comorbidity network based on the literature and data collection, including COVID-19-related proteins and genes associated with Alzheimer's disease, Parkinson's disease, multiple sclerosis, and autism spectrum disorder. Functional module detection and annotation identified a module primarily involved in protein synthesis as a key target module, utilizing connectivity map drug perturbation data. Through the construction of a weighted drug-target network and dynamic network-based drug-repurposing analysis, ubiquitin-carboxy-terminal hydrolase L1 emerged as a potential drug target. Molecular dynamics simulations suggested pregnenolone and BRD-K87426499 as two drug candidates for COVID-19. This study introduces a dynamic-perturbation-network-based drug-repurposing approach to identify COVID-19 drug targets and candidates by incorporating the comorbidity conditions of CNS disorders.
Affiliation(s)
- Jing Qian
- MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-Infective Medicine, Department of Bioinformatics, Center for Systems Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215213, China
- Bin Yang
- MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-Infective Medicine, Department of Bioinformatics, Center for Systems Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215213, China
- Shuo Wang
- MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-Infective Medicine, Department of Bioinformatics, Center for Systems Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215213, China
- Su Yuan
- MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-Infective Medicine, Department of Bioinformatics, Center for Systems Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215213, China
- Wenjing Zhu
- MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-Infective Medicine, Department of Bioinformatics, Center for Systems Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215213, China
- Ziyun Zhou
- MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-Infective Medicine, Department of Bioinformatics, Center for Systems Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215213, China
- Yujuan Zhang
- Experimental Center of Suzhou Medical College of Soochow University, Suzhou 215123, China
- Guang Hu
- MOE Key Laboratory of Geriatric Diseases and Immunology, Suzhou Key Laboratory of Pathogen Bioscience and Anti-Infective Medicine, Department of Bioinformatics, Center for Systems Biology, School of Life Sciences, Suzhou Medical College of Soochow University, Suzhou 215213, China
- Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Soochow University, Suzhou 215123, China
- Key Laboratory of Alkene-Carbon Fibres-Based Technology & Application for Detection of Major Infectious Diseases, Soochow University, Suzhou 215123, China
- Jiangsu Key Laboratory of Infection and Immunity, Soochow University, Suzhou 215123, China
21
Kovács B, Kojaku S, Palla G, Fortunato S. Iterative embedding and reweighting of complex networks reveals community structure. Sci Rep 2024; 14:17184. [PMID: 39060433 PMCID: PMC11282304 DOI: 10.1038/s41598-024-68152-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 07/19/2024] [Indexed: 07/28/2024] Open
Abstract
Graph embeddings learn the structure of networks and represent it in low-dimensional vector spaces. Community structure is one of the features that are recognized and reproduced by embeddings. We show that an iterative procedure, in which a graph is repeatedly embedded and its links are reweighted based on the geometric proximity between the nodes, reinforces intra-community links and weakens inter-community links, making the clusters of the initial network more visible and more easily detectable. The geometric separation between the communities can become so strong that even a very simple parsing of the links may recover the communities as isolated components with surprisingly high precision. Furthermore, when used as a pre-processing step, our embedding and reweighting procedure can improve the performance of traditional community detection algorithms.
Collapse
Affiliation(s)
- Bianka Kovács
- Department of Biological Physics, Eötvös Loránd University, Budapest, Pázmány P. stny. 1/A, 1117, Hungary
| | - Sadamori Kojaku
- Luddy School of Informatics, Computing, and Engineering, Indiana University, 1015 East 11th Street, Bloomington, IN, 47408, USA
- Department of Systems Science and Industrial Engineering, SUNY Binghamton, P.O. Box 6000, Binghamton, NY, 13902, USA
| | - Gergely Palla
- Department of Biological Physics, Eötvös Loránd University, Budapest, Pázmány P. stny. 1/A, 1117, Hungary.
- Health Services Management Training Centre, Semmelweis University, Budapest, Kútvölgyi út 2., 1125, Hungary.
| | - Santo Fortunato
- Luddy School of Informatics, Computing, and Engineering, Indiana University, 1015 East 11th Street, Bloomington, IN, 47408, USA
|
22
|
Cai L, Zhou J, Wang D. Two-stage multi-objective evolutionary algorithm for overlapping community discovery. PeerJ Comput Sci 2024; 10:e2185. [PMID: 39145204 PMCID: PMC11323150 DOI: 10.7717/peerj-cs.2185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 06/18/2024] [Indexed: 08/16/2024]
Abstract
As one of the essential topological structures in complex networks, community structure has significant theoretical and application value and has attracted the attention of researchers in many fields. In a social network, individuals may belong to different communities simultaneously, such as a workgroup and a hobby group. Therefore, overlapping community discovery can help us understand and model the network structure of these multiple relationships more accurately. This article proposes a two-stage multi-objective evolutionary algorithm for the overlapping community discovery problem. First, using an initialization method that determines the central nodes based on node degree, combined with a cross-mutation evolution strategy on the genome matrix, the first stage, non-overlapping community division, is completed within a decomposition-based multi-objective optimization framework. Then, based on the result set of the first stage, appropriate nodes are selected from each individual's communities as the central nodes of the initial population in the second stage, and the fuzzy threshold is optimized through a fuzzy clustering method based on evolutionary computation and a feedback model to find reasonable overlapping nodes. Finally, tests are conducted on synthetic and real datasets. The statistical results demonstrate that, compared with other representative algorithms, this algorithm achieves the best results on the test instances.
Affiliation(s)
- Lei Cai
- Key Laboratory of Complex Systems and Intelligent Optimization of Guizhou Province, School of Computer and Information, Qiannan Normal University for Nationalities, Duyun, Guizhou, China
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang, China
- Jincheng Zhou
- Key Laboratory of Complex Systems and Intelligent Optimization of Guizhou Province, School of Computer and Information, Qiannan Normal University for Nationalities, Duyun, Guizhou, China
- Dan Wang
- School of Mathematics and Statistics, Qiannan Normal University for Nationalities, Duyun, Guizhou, China
|
23
|
Lin Z, Lin X, Yang X. An Automated Analysis Framework for Epidemiological Survey on COVID-19. IEEE J Biomed Health Inform 2024; 28:3186-3199. [PMID: 38412074 DOI: 10.1109/jbhi.2024.3370253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/29/2024]
Abstract
For a long time, the prevention and control of COVID-19 has received significant attention. A crucial aspect of controlling the disease's spread is the epidemiological survey of patients and the subsequent analysis of epidemiological survey reports (case reports). However, current mainstream analysis approaches are all manual, which is time-consuming and manpower-intensive. This paper designs an automated visual epidemiological survey analysis (AVESA) framework for the epidemiological survey on COVID-19. AVESA uses a deep neural network to extract information from case reports and automatically constructs an epidemiological knowledge graph based on a predefined pattern. Moreover, a multi-dimensional knowledge reasoning model is developed for conducting knowledge reasoning over the complete COVID-19 epidemiological knowledge graph. In the entity extraction sub-task and multi-task extraction sub-task, AVESA achieved F1 scores of 85.12% and 92.29%, respectively, on the constructed dataset, significantly outperforming the standalone information extraction models. In full-graph computing, all three experiments align closely with manual analysis standards. In the risk analysis experiment, the weighted PageRank algorithm showed an average improvement of 11.21% in Top_Recall_n% over the standard PageRank algorithm. In the community detection experiment, the weighted Louvain algorithm showed a community difference rate of only 4.34% compared to manual analysis.
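The risk-analysis step in this abstract relies on a weighted PageRank. The abstract does not say how AVESA derives or applies its edge weights, so the following is only a minimal sketch of a generic edge-weighted PageRank; the function name `weighted_pagerank`, the toy `contacts` graph, and the damping factor of 0.85 are illustrative assumptions, not details from the paper.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph with positive edge weights.

    edges: dict mapping (source, target) -> weight (hypothetical input format).
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)

    # Total outgoing weight per node, used to normalise each node's votes.
    out_weight = {node: 0.0 for node in nodes}
    for (src, _), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each node passes its rank on in proportion to the edge weights.
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Toy contact graph (assumed data): the a -> c edge is five times heavier than a -> b.
contacts = {("a", "b"): 1.0, ("a", "c"): 5.0, ("b", "c"): 1.0, ("c", "a"): 1.0}
scores = weighted_pagerank(contacts)
print(sorted(scores, key=scores.get, reverse=True))
```

With all weights set to 1.0 the same function reduces to the standard, unweighted PageRank, which is the baseline the abstract compares against.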
|
24
|
Pratelli M, Saracco F, Petrocchi M. Entropy-based detection of Twitter echo chambers. PNAS Nexus 2024; 3:pgae177. [PMID: 38737768 PMCID: PMC11086943 DOI: 10.1093/pnasnexus/pgae177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 04/15/2024] [Indexed: 05/14/2024]
Abstract
Echo chambers, i.e. clusters of users exposed to news and opinions in line with their previous beliefs, were observed in many online debates on social platforms. We propose a completely unbiased entropy-based method for detecting echo chambers. The method is completely agnostic to the nature of the data. In the Italian Twitter debate about the Covid-19 vaccination, we find a limited presence of users in echo chambers (about 0.35% of all users). Nevertheless, their impact on the formation of a common discourse is strong, as users in echo chambers are responsible for nearly a third of the retweets in the original dataset. Moreover, in the case study observed, echo chambers appear to be a receptacle for disinformative content.
Affiliation(s)
- Manuel Pratelli
- IMT School For Advanced Studies Lucca, Piazza San Francesco 19, Lucca 55100, Italy
- Istituto di Informatica e Telematica, CNR, via G. Moruzzi 1, Pisa 56124, Italy
- Fabio Saracco
- “Enrico Fermi” Research Center, Via Panisperna 89A, Rome 00184, Italy
- IMT School For Advanced Studies Lucca, Piazza San Francesco 19, Lucca 55100, Italy
- Institute for Applied Computing “Mauro Picone”, CNR, Via dei Taurini 19, Rome 00185, Italy
- Marinella Petrocchi
- Istituto di Informatica e Telematica, CNR, via G. Moruzzi 1, Pisa 56124, Italy
- IMT School For Advanced Studies Lucca, Piazza San Francesco 19, Lucca 55100, Italy
|
25
|
Wang Z, Huang R, Yang D, Peng Y, Zhou B, Chen Z. Identifying influential nodes based on the disassortativity and community structure of complex network. Sci Rep 2024; 14:8453. [PMID: 38605134 PMCID: PMC11009344 DOI: 10.1038/s41598-024-59071-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Accepted: 04/07/2024] [Indexed: 04/13/2024] Open
Abstract
Complex networks exhibit significant heterogeneity in node connections, resulting in a few nodes playing critical roles in various scenarios, including decision-making, disease control, and population immunity. Therefore, accurately identifying these influential nodes is very important. Many methods have been proposed in different fields to solve this issue. This paper focuses on the different types of disassortativity existing in networks and introduces the concept of disassortativity of the node, namely, the inconsistency between the degree of a node and the degrees of its neighboring nodes, and proposes a measure of disassortativity of the node (DoN) based on a step function. Furthermore, the paper shows that in many real-world network applications, such as online social networks, the influence of nodes within the network is often associated with the disassortativity of the node and the community boundary structure of the network. Thus, an influence metric for nodes based on disassortativity and community structure (mDC) is proposed. Extensive experiments are conducted on synthetic and real networks, and the performance of DoN and mDC is validated through network robustness experiments and immunization experiments for disease infection. Experimental and analytical results demonstrate that, compared to other state-of-the-art centrality measures, the proposed methods (DoN and mDC) exhibit superior identification performance and efficiency, particularly in non-disassortative networks and networks with clear community structures. Furthermore, we find that DoN and mDC exhibit high stability to network noise and inaccuracies in the network data.
Affiliation(s)
- Zuxi Wang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, People's Republic of China
- National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, People's Republic of China
- Key Laboratory of Image Information Processing and Intelligent Control, Ministry of Education of China, Wuhan, 430074, People's Republic of China
- Ruixiang Huang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, People's Republic of China
- National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, People's Republic of China
- Key Laboratory of Image Information Processing and Intelligent Control, Ministry of Education of China, Wuhan, 430074, People's Republic of China
- Dian Yang
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, People's Republic of China
- National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, People's Republic of China
- Key Laboratory of Image Information Processing and Intelligent Control, Ministry of Education of China, Wuhan, 430074, People's Republic of China
- Yuqiang Peng
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, People's Republic of China
- Boyun Zhou
- School of Information and Electronic Engineering, Zhejiang Gongshang University, Hangzhou, 310018, People's Republic of China
- Zhong Chen
- School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, People's Republic of China.
- National Key Laboratory of Multispectral Information Intelligent Processing Technology, Wuhan, 430074, People's Republic of China.
- Key Laboratory of Image Information Processing and Intelligent Control, Ministry of Education of China, Wuhan, 430074, People's Republic of China.
|
26
|
Loo EPI, Durán P, Pang TY, Westhoff P, Deng C, Durán C, Lercher M, Garrido-Oter R, Frommer WB. Sugar transporters spatially organize microbiota colonization along the longitudinal root axis of Arabidopsis. Cell Host Microbe 2024; 32:543-556.e6. [PMID: 38479394 DOI: 10.1016/j.chom.2024.02.014] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 06/22/2023] [Revised: 02/01/2024] [Accepted: 02/21/2024] [Indexed: 04/13/2024]
Abstract
Plant roots are functionally heterogeneous in cellular architecture, transcriptome profile, metabolic state, and microbial immunity. We hypothesized that axial differentiation may also impact spatial colonization by root microbiota along the root axis. We developed two growth systems, ArtSoil and CD-Rhizotron, to grow and then dissect Arabidopsis thaliana roots into three segments. We demonstrate that distinct endospheric and rhizosphere bacterial communities colonize the segments, supporting the hypothesis of microbiota differentiation along the axis. Root metabolite profiling of each segment reveals differential metabolite enrichment and specificity. Bioinformatic analyses and GUS histochemistry indicate microbe-induced accumulation of the SWEET2, SWEET4, and SWEET12 sugar uniporters. Profiling of root segments from sweet mutants shows altered spatial metabolic profiles and reorganization of the endospheric root microbiota. This work reveals the interdependency between root metabolites and microbial colonization and the contribution of SWEETs to the spatial diversity and stability of the microbial ecosystem.
Affiliation(s)
- Eliza P-I Loo
- Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Institute for Molecular Physiology, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, 40225 Düsseldorf, Germany.
- Paloma Durán
- Department of Plant-Microbe Interactions, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany; Cluster of Excellence on Plant Sciences, 40225 Düsseldorf, Germany
- Tin Yau Pang
- Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Institute for Computer Science and Department of Biology, 40225 Düsseldorf, Germany; Heinrich Heine University Düsseldorf, Medical Faculty and University Hospital Düsseldorf, Division of Cardiology, Pulmonology and Vascular Medicine, 40225 Düsseldorf, Germany
- Philipp Westhoff
- Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Plant Metabolism and Metabolomics Laboratory, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, 40225 Düsseldorf, Germany
- Chen Deng
- Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Institute for Molecular Physiology, 40225 Düsseldorf, Germany
- Carlos Durán
- Department of Plant-Microbe Interactions, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany
- Martin Lercher
- Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Institute for Computer Science and Department of Biology, 40225 Düsseldorf, Germany; Heinrich Heine University Düsseldorf, Medical Faculty and University Hospital Düsseldorf, Division of Cardiology, Pulmonology and Vascular Medicine, 40225 Düsseldorf, Germany
- Ruben Garrido-Oter
- Department of Plant-Microbe Interactions, Max Planck Institute for Plant Breeding Research, 50829 Cologne, Germany; Cluster of Excellence on Plant Sciences, 40225 Düsseldorf, Germany; Earlham Institute, Norwich NR4 7UZ, UK
- Wolf B Frommer
- Heinrich Heine University Düsseldorf, Faculty of Mathematics and Natural Sciences, Institute for Molecular Physiology, 40225 Düsseldorf, Germany; Cluster of Excellence on Plant Sciences, 40225 Düsseldorf, Germany; Institute of Transformative Bio-Molecules (WPI-ITbM), Nagoya University, 464-8601 Nagoya, Japan.
|
27
|
Su X, Xue S, Liu F, Wu J, Yang J, Zhou C, Hu W, Paris C, Nepal S, Jin D, Sheng QZ, Yu PS. A Comprehensive Survey on Community Detection With Deep Learning. IEEE Trans Neural Netw Learn Syst 2024; 35:4682-4702. [PMID: 35263257 DOI: 10.1109/tnnls.2021.3137396] [Citation(s) in RCA: 37] [Impact Index Per Article: 37.0] [Indexed: 06/14/2023]
Abstract
Detecting a community in a network is a matter of discerning the distinct features and connections of a group of members that are different from those in other communities. The ability to do this is of great significance in network analysis. However, beyond the classic spectral clustering and statistical inference methods, there have been significant developments with deep learning techniques for community detection in recent years, particularly when it comes to handling high-dimensional network data. Hence, a comprehensive review of the latest progress in community detection through deep learning is timely. To frame the survey, we have devised a new taxonomy covering different state-of-the-art methods, including deep learning models based on deep neural networks (DNNs), deep nonnegative matrix factorization, and deep sparse filtering. The main category, i.e., DNNs, is further divided into convolutional networks, graph attention networks, generative adversarial networks, and autoencoders. The popular benchmark datasets, evaluation metrics, and open-source implementations to address experimentation settings are also summarized. This is followed by a discussion on the practical applications of community detection in various domains. The survey concludes with suggestions of challenging topics that would make for fruitful future research directions in this fast-growing deep learning field.
|
28
|
Christensen AP, Garrido LE, Guerra-Peña K, Golino H. Comparing community detection algorithms in psychometric networks: A Monte Carlo simulation. Behav Res Methods 2024; 56:1485-1505. [PMID: 37326769 DOI: 10.3758/s13428-023-02106-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Accepted: 03/03/2023] [Indexed: 06/17/2023]
Abstract
Identifying the correct number of factors in multivariate data is fundamental to psychological measurement. Factor analysis has a long tradition in the field, but it has been challenged recently by exploratory graph analysis (EGA), an approach based on network psychometrics. EGA first estimates a network and then applies the Walktrap community detection algorithm. Simulation studies have demonstrated that, compared with factor analytic methods, EGA has comparable or better accuracy in recovering as many communities as there are factors in the simulated data. Despite EGA's effectiveness, there has yet to be an investigation into whether other sparsity induction methods or community detection algorithms could achieve equivalent or better performance. Furthermore, unidimensional structures are fundamental to psychological measurement, yet they have been sparsely studied in simulations using community detection algorithms. In the present study, we performed a Monte Carlo simulation using the zero-order correlation matrix, GLASSO, and two variants of a non-regularized partial correlation sparsity induction method with several community detection algorithms. We examined the performance of these method-algorithm combinations in both continuous and polytomous data across a variety of conditions. The results indicate that the Fast-greedy, Louvain, and Walktrap algorithms paired with the GLASSO method were consistently among the most accurate and least-biased overall.
Affiliation(s)
- Alexander P Christensen
- Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, 37203, USA.
- Luis Eduardo Garrido
- Pontificia Universidad Católica Madre y Maestra, Santiago De Los Caballeros, Dominican Republic
- Kiero Guerra-Peña
- Pontificia Universidad Católica Madre y Maestra, Santiago De Los Caballeros, Dominican Republic
|
29
|
Guan J, Chen B, Huang X. Community Detection via Autoencoder-Like Nonnegative Tensor Decomposition. IEEE Trans Neural Netw Learn Syst 2024; 35:4179-4191. [PMID: 36170387 DOI: 10.1109/tnnls.2022.3201906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Community detection aims at partitioning a network into several densely connected subgraphs. Recently, nonnegative matrix factorization (NMF) has been widely adopted in many successful community detection applications. However, most existing NMF-based community detection algorithms neglect the multihop network topology and the extreme sparsity of adjacency matrices. To address these issues, we propose the novel concept of an adjacency tensor, which extends the adjacency matrix to multihop cases. Then, leveraging the constructed adjacency tensor, we develop a novel community detection method based on tensor Tucker decomposition: autoencoder-like nonnegative tensor decomposition (ANTD). Distinct from simply applying tensor decomposition to the constructed adjacency tensor, which only works as a decoder, ANTD also introduces an encoder component to constitute an autoencoder-like architecture, which can further enhance the quality of the detected communities. We also develop an efficient alternating updating algorithm with a convergence guarantee to optimize ANTD, and theoretically analyze the algorithm's complexity. Moreover, we study a graph-regularized variant of ANTD. Extensive experiments on real-world benchmark networks, with comparisons against 27 state-of-the-art methods, validate the effectiveness, efficiency, and robustness of our proposed methods.
|
30
|
Palukuri MV, Marcotte EM. DeepSLICEM: Clustering CryoEM particles using deep image and similarity graph representations. bioRxiv 2024:2024.02.04.578778. [PMID: 38370702 PMCID: PMC10871265 DOI: 10.1101/2024.02.04.578778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Finding the 3D structure of proteins and their complexes has several applications, such as developing vaccines that target viral proteins effectively. Methods such as cryogenic electron microscopy (cryo-EM) have improved in their ability to capture high-resolution images, and when applied to a purified sample containing copies of a macromolecule, they can be used to produce a high-quality snapshot of different 2D orientations of the macromolecule, which can be combined to reconstruct its 3D structure. Instead of purifying a sample so that it contains only one macromolecule, a process that can be difficult, time-consuming, and expensive, a cell sample containing multiple particles can be photographed directly and separated into its constituent particles using computational methods. Previous work, SLICEM, has separated 2D projection images of different particles into their respective groups using 2 methods, clustering a graph with edges weighted by pairwise similarities of common lines of the 2D projections. In this work, we develop DeepSLICEM, a pipeline that clusters rich representations of 2D projections, obtained by combining graphical features from a similarity graph based on common lines, with additional image features extracted from a convolutional neural network. DeepSLICEM explores 6 pretrained convolutional neural networks and one supervised Siamese CNN for image representation, 10 pretrained deep graph neural networks for similarity graph node representations, and 4 methods for clustering, along with 8 methods for directly clustering the similarity graph. On 6 synthetic and experimental datasets, the DeepSLICEM pipeline finds 92 method combinations achieving better clustering accuracy than previous methods from SLICEM. Thus, in this paper, we demonstrate that deep neural networks have great potential for accurately separating mixtures of 2D projections of different macromolecules computationally.
Affiliation(s)
- Meghana V Palukuri
- Oden Institute for Computational Engineering and Sciences, University of Texas, Austin, TX 78712, USA
- Department of Molecular Biosciences, University of Texas, Austin, TX 78712, USA
- Edward M Marcotte
- Oden Institute for Computational Engineering and Sciences, University of Texas, Austin, TX 78712, USA
- Department of Molecular Biosciences, University of Texas, Austin, TX 78712, USA
|
31
|
Leeney W, McConville R. Uncertainty in GNN Learning Evaluations: A Comparison between Measures for Quantifying Randomness in GNN Community Detection. Entropy (Basel) 2024; 26:78. [PMID: 38248203 PMCID: PMC10813847 DOI: 10.3390/e26010078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 01/04/2024] [Accepted: 01/15/2024] [Indexed: 01/23/2024]
Abstract
(1) The enhanced capability of graph neural networks (GNNs) in unsupervised community detection of clustered nodes is attributed to their capacity to encode both the connectivity and feature information spaces of graphs. The identification of latent communities holds practical significance in various domains, from social networks to genomics. Current real-world performance benchmarks are perplexing due to the multitude of decisions influencing GNN evaluations for this task. (2) Three metrics are compared to assess the consistency of algorithm rankings in the presence of randomness. The consistency and quality of performance are evaluated between results obtained under hyperparameter optimisation and results obtained with the default hyperparameters. (3) The results compare hyperparameter optimisation with default hyperparameters, revealing a significant performance loss when neglecting hyperparameter investigation. A comparison of metrics indicates that ties in ranks can substantially alter the quantification of randomness. (4) Ensuring adherence to the same evaluation criteria may result in notable differences in the reported performance of methods for this task. The W randomness coefficient, based on the Wasserstein distance, is identified as providing the most robust assessment of randomness.
Affiliation(s)
- William Leeney
- School of Engineering Mathematics and Technology, University of Bristol, Bristol BS8 1TR, UK
|
32
|
Malekzadeh M, Long JA. A network community structure similarity index for weighted networks. PLoS One 2023; 18:e0292018. [PMID: 38019878 PMCID: PMC10686481 DOI: 10.1371/journal.pone.0292018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 09/10/2023] [Indexed: 12/01/2023] Open
Abstract
Identification of communities in complex systems is an essential part of network analysis. Accordingly, measuring similarities between communities is a fundamental part of analysing community structure in different, yet related, networks. Commonly used methods for quantifying network community similarity fail to consider the effects of edge weights. Existing methods also remain limited when the two networks being compared have different numbers of nodes. In this study, we address these issues by proposing a novel network community structure similarity index (NCSSI) based on the edit distance concept. The NCSSI incorporates both community labels and edge weights, and it can also be employed to assess the similarity between two community structures with different numbers of nodes. We test the proposed method using simulated data and a case-study analysis of New York Yellow Taxi flows and compare the results with those of other commonly used methods (i.e., mutual information and the Jaccard index). Our results highlight how NCSSI effectively captures the effects of both label changes and edge weight changes on community structure, which are not captured by existing approaches. In conclusion, NCSSI offers a new approach that incorporates both label and weight variations for community similarity measurement in complex networks.
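To make the limitation named in this abstract concrete, here is a minimal sketch of one common label-only baseline, a pair-counting Jaccard index between two community assignments. It is not the NCSSI, whose formula the abstract does not give. The function names, the toy labels, and the choice to restrict the comparison to nodes present in both networks are assumptions made for illustration.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of node pairs that share a community label."""
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(labels), 2)
        if labels[u] == labels[v]
    }


def partition_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two community assignments.

    Only nodes present in both assignments are compared (an assumption),
    and edge weights never enter the computation.
    """
    shared = set(labels_a) & set(labels_b)
    pairs_a = co_clustered_pairs({n: labels_a[n] for n in shared})
    pairs_b = co_clustered_pairs({n: labels_b[n] for n in shared})
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 1.0


# Toy example (assumed data): node 3 moves from community "y" to "x".
before = {1: "x", 2: "x", 3: "y", 4: "y"}
after = {1: "x", 2: "x", 3: "x", 4: "y"}
similarity = partition_jaccard(before, after)
print(similarity)  # prints 0.25
```

Because the function takes only labels, two networks with identical communities but very different edge weights score 1.0, and nodes missing from either network are simply dropped. These are the two gaps the abstract says the NCSSI is meant to address.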
Affiliation(s)
- Milad Malekzadeh
- Department of Geography and Environment, Western University, London, ON, Canada
- Jed A. Long
- Department of Geography and Environment, Western University, London, ON, Canada
|
33
|
Tohalino JAV, Silva TC, Amancio DR. Using citation networks to evaluate the impact of text length on keyword extraction. PLoS One 2023; 18:e0294500. [PMID: 38011182 PMCID: PMC10681196 DOI: 10.1371/journal.pone.0294500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 11/02/2023] [Indexed: 11/29/2023] Open
Abstract
The identification of key concepts within unstructured data is of paramount importance in practical applications. Despite the abundance of proposed methods for extracting primary topics, only a few works have investigated the influence of text length on the performance of keyword extraction (KE) methods. Specifically, many studies rely on abstracts and titles for content extraction from papers, leaving it uncertain whether leveraging the complete content of papers can yield consistent results. Hence, in this study, we employ a network-based approach to evaluate the concordance between keywords extracted from abstracts and those from the entire papers. Community detection methods are utilized to identify interconnected papers in citation networks. Subsequently, paper clusters are formed to identify salient terms within each cluster, employing a methodology akin to the term frequency-inverse document frequency (tf-idf) approach. Once each cluster has been assigned its distinctive set of key terms, these terms serve as representative keywords at the paper level: the top-ranked words at the cluster level that also appear in the abstract are chosen as keywords for the paper. Our findings indicate that the various community detection methods used in KE yield similar levels of accuracy. Notably, text clustering approaches outperform all citation-based methods, while all approaches yield relatively low accuracy values. We also identified a lack of concordance between keywords extracted from the abstracts and those extracted from the corresponding full-text source. Considering that citations and text clustering yield distinct outcomes, combining them in hybrid approaches could offer improved performance.
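The cluster-level scoring described in this abstract can be sketched roughly as follows. The abstract only says the method is "akin to" tf-idf, so the exact weighting below (term frequency within a cluster times the log inverse of the number of clusters containing the term) is an assumption, as are the function names `cluster_keywords` and `paper_keywords` and the toy token lists.

```python
import math
from collections import Counter


def cluster_keywords(clusters, top_k=3):
    """Rank terms per cluster with a tf-idf-like score.

    clusters: dict mapping cluster id -> list of token lists, one per paper
    (hypothetical input format).
    """
    term_freq = {
        cid: Counter(token for paper in papers for token in paper)
        for cid, papers in clusters.items()
    }
    # Number of clusters in which each term occurs at least once.
    cluster_count = Counter(term for counts in term_freq.values() for term in counts)
    n_clusters = len(clusters)

    keywords = {}
    for cid, counts in term_freq.items():
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_clusters / cluster_count[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[cid] = ranked[:top_k]
    return keywords


def paper_keywords(cluster_terms, abstract_tokens):
    """Keep the cluster's top-ranked terms that also occur in the paper's abstract."""
    present = set(abstract_tokens)
    return [term for term in cluster_terms if term in present]


# Toy clusters (assumed data), e.g. as produced by community detection on citations.
clusters = {
    "c1": [["graph", "community", "detection"], ["graph", "modularity"]],
    "c2": [["graph", "keyword", "extraction"], ["keyword", "text"]],
}
keywords = cluster_keywords(clusters)
print(paper_keywords(keywords["c2"], ["keyword", "extraction", "graph"]))
```

In this toy case "graph" occurs in both clusters, so its inverse-cluster-frequency factor is zero and it is never selected, even though it is the most frequent term in the first cluster.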
Affiliation(s)
- Diego R. Amancio
- Institute of Mathematics and Computer Science – USP, São Carlos, SP, Brazil
|
34
|
Naghizadeh MM, Osati S, Homayounfar R, Masoudi-Nejad A. Food co-consumption network as a new approach to dietary pattern in non-alcoholic fatty liver disease. Sci Rep 2023; 13:20703. [PMID: 38001137 PMCID: PMC10673913 DOI: 10.1038/s41598-023-47752-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/09/2023] [Accepted: 11/17/2023] [Indexed: 11/26/2023] Open
Abstract
Dietary patterns strongly correlate with non-alcoholic fatty liver disease (NAFLD), which is a leading cause of chronic liver disease in developed societies. In this study, we introduce a new concept, the co-consumption network (CCN), which depicts the common consumption patterns of food groups through network analysis. We then examine the relationship between dietary patterns and NAFLD by analyzing this network. In a cross-sectional design, we selected 1500 individuals living in Tehran, Iran. They completed a food frequency questionnaire and underwent scanning via the FibroScan for liver stiffness, using the CAP score. The food items were categorized into 40 food groups. We reconstructed the CCN using Spearman correlation-based connections. We then created healthy and unhealthy clusters using the label propagation algorithm. Participants were assigned to the two clusters using the hypergeometric distribution. Finally, we classified participants into healthy and NAFLD networks, and reconstructed the gender and disease differential CCNs. We found that the sweet food group was the hub of the proposed CCN, with the largest cliques, of size 5, associated with the unhealthy cluster. The unhealthy module members had a significantly higher CAP score (253.7 ± 47.8) compared to the healthy module members (218.0 ± 46.4) (P < 0.001). The disease differential CCN showed that in the case of NAFLD, processed meat had been co-consumed with mayonnaise and soft drinks, in contrast to the healthy participants, who had co-consumed fruits with green leafy and yellow vegetables. The CCN is a powerful method for presenting food groups, their consumption quantity, and their interactions efficiently. Moreover, it facilitates the examination of the relationship between dietary patterns and NAFLD.
Affiliation(s)
- Mohammad Mehdi Naghizadeh
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Noncommunicable Diseases Research Center, Fasa University of Medical Science, Fasa, Iran
- Saeed Osati
- National Nutrition and Food Technology Research Institute, Faculty of Nutrition Sciences and Food Technology, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Reza Homayounfar
- National Nutrition and Food Technology Research Institute, Faculty of Nutrition Sciences and Food Technology, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran

35
Plamper P, Lechtenfeld OJ, Herzsprung P, Groß A. A Temporal Graph Model to Predict Chemical Transformations in Complex Dissolved Organic Matter. Environmental Science & Technology 2023; 57:18116-18126. [PMID: 37159837 PMCID: PMC10666529 DOI: 10.1021/acs.est.3c00351] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/20/2023] [Revised: 04/25/2023] [Accepted: 04/25/2023] [Indexed: 05/11/2023]
Abstract
Dissolved organic matter (DOM) is a complex mixture of thousands of natural molecules that undergo constant transformation in the environment, such as sunlight-induced photochemical reactions. Despite molecular-level resolution from ultrahigh resolution mass spectrometry (UHRMS), trends of mass peak intensities are currently the only way to follow photochemically induced molecular changes in DOM. Many real-world relationships and temporal processes can be intuitively modeled using graph data structures (networks). Graphs enhance the potential and value of AI applications by adding context and interconnections, allowing the uncovering of hidden or unknown relationships in data sets. We use a temporal graph model and link prediction to identify transformations of DOM molecules in a photo-oxidation experiment. Our link prediction algorithm simultaneously considers educt removal and product formation for molecules linked by predefined transformation units (oxidation, decarboxylation, etc.). The transformations are further weighted by the extent of intensity change and clustered on the graph structure to identify groups of similar reactivity. The temporal graph is capable of identifying relevant molecules subject to similar reactions and enables the study of their time course. Our approach overcomes previous data evaluation limitations for mechanistic studies of DOM and leverages the potential of temporal graphs to study DOM reactivity by UHRMS.
Affiliation(s)
- Philipp Plamper
- Anhalt University of Applied Sciences, Department Computer Science and Languages, Lohmannstraße 23, Köthen 06366, Germany
- Oliver J. Lechtenfeld
- Helmholtz Centre for Environmental Research - UFZ, Department of Analytical Chemistry, Research Group BioGeoOmics, Permoserstraße 15, Leipzig 04318, Germany
- ProVIS - Centre for Chemical Microscopy, Helmholtz Centre for Environmental Research - UFZ, Permoserstraße 15, Leipzig 04318, Germany
- Peter Herzsprung
- Helmholtz Centre for Environmental Research - UFZ, Department of Lake Research, Brückstraße 3a, Magdeburg 39114, Germany
- Anika Groß
- Anhalt University of Applied Sciences, Department Computer Science and Languages, Lohmannstraße 23, Köthen 06366, Germany

36
Qi K, Zhang H, Zhou Y, Liu Y, Li Q. A community partitioning algorithm for cyberspace. Sci Rep 2023; 13:19021. [PMID: 37923794 PMCID: PMC10624825 DOI: 10.1038/s41598-023-46556-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 11/02/2023] [Indexed: 11/06/2023] Open
Abstract
Community partitioning is an effective technique for cyberspace mapping. However, existing community partitioning algorithms use only the topological structure of the network to divide communities and disregard factors such as real hierarchy, overlap, and directionality of information transmission between communities in cyberspace. Consequently, traditional community division algorithms are not suitable for dividing cyberspace resources effectively. Based on cyberspace community structure characteristics, this study introduces an algorithm that combines an improved local fitness maximization (LFM) algorithm with the PageRank (PR) algorithm for community partitioning on cyberspace resources, called PR-LFM. First, seed nodes are determined using degree centrality, followed by local community expansion. Nodes belonging to multiple communities undergo further partitioning so that they are retained in the community where they are most important, thus preserving the community's original structure. The experimental data demonstrate good results in the resource division of cyberspace.
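To make the "seed by degree centrality, then expand locally" idea concrete, here is a simplified sketch. It uses the standard LFM community fitness, f = k_in / (k_in + k_out)^alpha, and a greedy add-only expansion. It is not the improved PR-LFM algorithm described in the abstract: the PageRank step, overlap handling, and node removal are omitted, and the function names, `alpha=1.0`, and the toy graph are assumptions made for illustration.

```python
def degree_seed(adj):
    """Pick the highest-degree node; ties go to the smallest label."""
    return max(sorted(adj), key=lambda n: len(adj[n]))


def community_fitness(adj, community, alpha=1.0):
    """Standard LFM fitness: internal degree over total degree**alpha."""
    k_in = sum(1 for n in community for nb in adj[n] if nb in community)
    k_out = sum(1 for n in community for nb in adj[n] if nb not in community)
    total = k_in + k_out
    return k_in / total ** alpha if total else 0.0


def expand_from_seed(adj, seed, alpha=1.0):
    """Greedily add the neighbor that most increases the fitness."""
    community = {seed}
    while True:
        frontier = {nb for n in community for nb in adj[n]} - community
        current = community_fitness(adj, community, alpha)
        best, best_gain = None, 0.0
        for cand in sorted(frontier):
            gain = community_fitness(adj, community | {cand}, alpha) - current
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:
            return community  # no neighbor improves the fitness
        community.add(best)


# Hypothetical graph: two triangles joined by the bridge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
seed = degree_seed(adj)
community = expand_from_seed(adj, seed)
print(seed, sorted(community))
```

On this toy graph the expansion stops at the first triangle, because adding the node across the bridge would lower the fitness.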
Affiliation(s)
- Kai Qi
- Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou, 450001, Henan, China
- Heng Zhang
- Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou, 450001, Henan, China
- Yang Zhou
- Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou, 450001, Henan, China
- Yifan Liu
- Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou, 450001, Henan, China
- Qingxiang Li
- Institute of Geospatial Information, PLA Strategic Support Force Information Engineering University, Zhengzhou, 450001, Henan, China

37
Tian M, Moriano P. Robustness of community structure under edge addition. Phys Rev E 2023; 108:054302. [PMID: 38115408 DOI: 10.1103/physreve.108.054302] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/17/2023] [Accepted: 09/08/2023] [Indexed: 12/21/2023]
Abstract
Communities often represent key structural and functional clusters in networks. To preserve such communities, it is important to understand their robustness under network perturbations. Previous work in community robustness analysis has focused on studying changes in the community structure as a response of edge rewiring and node or edge removal. However, the impact of increasing connectivity on the robustness of communities in networked systems is relatively unexplored. Studying the limits of community robustness under edge addition is crucial to better understanding the cases in which density expands or false edges erroneously appear. In this paper, we analyze the effect of edge addition on community robustness in synthetic and empirical temporal networks. We study two scenarios of edge addition: random and targeted. We use four community detection algorithms, Infomap, Label Propagation, Leiden, and Louvain, and demonstrate the results in community similarity metrics. The experiments on synthetic networks show that communities are more robust when the initial partition is stronger or the edge addition is random, and the experiments on empirical data also indicate that robustness performance can be affected by the community similarity metric. Overall, our results suggest that the communities identified by the different types of community detection algorithms exhibit different levels of robustness, and so the robustness of communities depends strongly on the choice of detection method.
Affiliation(s)
- Moyi Tian
- Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912, USA
- Pablo Moriano
- Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37830, USA

38
Zhang C, Li X, Pei H, Zhang Z, Liu B, Yang B. LaenNet: Learning robust GCNs by propagating labels. Neural Netw 2023; 168:652-664. [PMID: 37847949 DOI: 10.1016/j.neunet.2023.09.035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 03/03/2023] [Accepted: 09/20/2023] [Indexed: 10/19/2023]
Abstract
Graph Convolutional Networks (GCNs) are acknowledged as one of the most significant methodologies for graph representation learning, and the family of GCNs has recently achieved great success in the community. However, in real-world scenarios, the graph data may be imperfect, e.g., with noisy and sparse features or labels, which poses a great challenge to the robustness of GCNs. To meet this challenge, we propose a simple-yet-effective LAbel-ENhanced Networks (LaenNet) architecture for GCNs, where the basic spirit is to propagate labels together with features. Specifically, we add an extra LaenNet module at one hidden layer of GCNs, which propagates labels along the graph and then integrates them with the hidden representations as the inputs to the deeper layer. The proposed LaenNet can be directly generalized to the variants of GCNs. We conduct extensive experiments to verify LaenNet on semi-supervised node classification tasks under four noisy and sparse graph data scenarios, including the graphs with noisy features, sparse features, noisy labels, and sparse labels. Empirical results indicate the superiority and robustness of LaenNet compared to the state-of-the-art baseline models. The implementation code is available to ease reproducibility.
Affiliation(s)
- Chunxu Zhang
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Ximing Li
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Zijian Zhang
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Bing Liu
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Bo Yang
- College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China

39
Gaiteri C, Connell DR, Sultan FA, Iatrou A, Ng B, Szymanski BK, Zhang A, Tasaki S. Robust, scalable, and informative clustering for diverse biological networks. Genome Biol 2023; 24:228. [PMID: 37828545 PMCID: PMC10571258 DOI: 10.1186/s13059-023-03062-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 09/19/2023] [Indexed: 10/14/2023] Open
Abstract
Clustering molecular data into informative groups is a primary step in extracting robust conclusions from big data. However, due to foundational issues in how they are defined and detected, such clusters are not always reliable, leading to unstable conclusions. We compare popular clustering algorithms across thousands of synthetic and real biological datasets, including a new consensus clustering algorithm, SpeakEasy2: Champagne. These tests identify trends in performance, show no single method is universally optimal, and allow us to examine factors behind variation in performance. Multiple metrics indicate SpeakEasy2 generally provides robust, scalable, and informative clusters for a range of applications.
Affiliation(s)
- Chris Gaiteri
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
- David R Connell
- Rush University Graduate College, Rush University Medical Center, Chicago, IL, USA
- Faraz A Sultan
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Artemis Iatrou
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Harvard University, Belmont, MA, USA
- Bernard Ng
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Boleslaw K Szymanski
- Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
- Network Science and Technology Center, Rensselaer Polytechnic Institute, Troy, NY, USA
- Academy of Social Sciences, Łódź, Poland
- Ada Zhang
- Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Shinya Tasaki
- Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA

40
Alves CL, Toutain TGLDO, Porto JAM, Aguiar PMDC, de Sena EP, Rodrigues FA, Pineda AM, Thielemann C. Analysis of functional connectivity using machine learning and deep learning in different data modalities from individuals with schizophrenia. J Neural Eng 2023; 20:056025. [PMID: 37673060 DOI: 10.1088/1741-2552/acf734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 09/06/2023] [Indexed: 09/08/2023]
Abstract
Objective. Schizophrenia (SCZ) is a severe mental disorder associated with persistent or recurrent psychosis, hallucinations, delusions, and thought disorders that affects approximately 26 million people worldwide, according to the World Health Organization. Several studies encompass machine learning (ML) and deep learning algorithms to automate the diagnosis of this mental disorder. Others study SCZ brain networks to get new insights into the dynamics of information processing in individuals suffering from the condition. In this paper, we offer a rigorous approach with ML and deep learning techniques for evaluating connectivity matrices and measures of complex networks to establish an automated diagnosis and comprehend the topology and dynamics of brain networks in SCZ individuals. Approach. For this purpose, we employed a functional magnetic resonance imaging (fMRI) and electroencephalogram (EEG) dataset. In addition, we combined EEG measures, i.e. Hjorth mobility and complexity, with complex network measurements to be analyzed in our model for the first time in the literature. Main results. When comparing the SCZ group to the control group, we found a high positive correlation between the left superior parietal lobe and the left motor cortex and a positive correlation between the left dorsal posterior cingulate cortex and the left primary motor. Regarding complex network measures, the diameter, which corresponds to the longest shortest path length in a network, may be regarded as a biomarker because it is the most crucial measure in different data modalities. Furthermore, the SCZ brain networks exhibit less segregation and a lower distribution of information. As a result, EEG measures outperformed complex networks in capturing the brain alterations associated with SCZ. Significance. Our model achieved an area under receiver operating characteristic curve (AUC) of 100% and an accuracy of 98.5% for the fMRI, an AUC of 95%, and an accuracy of 95.4% for the EEG data set. These are excellent classification results. Furthermore, we investigated the impact of specific brain connections and network measures on these results, which helped us better describe changes in the diseased brain.
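The abstract defines the diameter as the longest shortest path length in a network. A minimal sketch of that definition follows, assuming an unweighted, undirected, connected graph stored as an adjacency dictionary. The function names and the two toy graphs are hypothetical; the abstract does not say how the authors computed the measure on their connectivity matrices.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest hop counts from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def diameter(adj):
    """Largest shortest-path length over all node pairs."""
    longest = 0
    for node in adj:
        dist = bfs_distances(adj, node)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected; diameter is undefined")
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical four-node examples: a chain and a ring.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ring_graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diameter(path_graph), diameter(ring_graph))
```

Closing the chain into a ring shortens the longest route between any two nodes, so the ring has the smaller diameter.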
Affiliation(s)
- Caroline L Alves
- University of São Paulo (USP), Institute of Mathematical and Computer Sciences (ICMC), São Paulo, Brazil
- BioMEMS Lab, Aschaffenburg University of Applied Sciences, Aschaffenburg, Germany
- Patrícia Maria de Carvalho Aguiar
- Hospital Israelita Albert Einstein, São Paulo, Brazil
- Federal University of São Paulo, Department of Neurology and Neurosurgery, São Paulo, Brazil
- Francisco A Rodrigues
- University of São Paulo (USP), Institute of Mathematical and Computer Sciences (ICMC), São Paulo, Brazil
- Aruane M Pineda
- University of São Paulo (USP), Institute of Mathematical and Computer Sciences (ICMC), São Paulo, Brazil

41
Ho H, Chovatia M, Egan R, He G, Yoshinaga Y, Liachko I, O’Malley R, Wang Z. Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning. PeerJ 2023; 11:e16129. [PMID: 37753177 PMCID: PMC10519199 DOI: 10.7717/peerj.16129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 08/28/2023] [Indexed: 09/28/2023] Open
Abstract
Metagenome binning is a key step, downstream of metagenome assembly, to group scaffolds by their genome of origin. Although accurate binning has been achieved on datasets containing multiple samples from the same community, the completeness of binning is often low in datasets with a small number of samples due to a lack of robust species co-abundance information. In this study, we exploited the chromatin conformation information obtained from Hi-C sequencing and developed a new reference-independent algorithm, Metagenome Binning with Abundance and Tetra-nucleotide frequencies-Long Range (metaBAT-LR), to improve the binning completeness of these datasets. This self-supervised algorithm builds a model from a set of high-quality genome bins to predict scaffold pairs that are likely to be derived from the same genome. Then, it applies these predictions to merge incomplete genome bins, as well as recruit unbinned scaffolds. We validated metaBAT-LR's ability to bin-merge and recruit scaffolds on both synthetic and real-world metagenome datasets of varying complexity. Benchmarking against similar software tools suggests that metaBAT-LR uncovers unique bins that were missed by all other methods. MetaBAT-LR is open-source and is available at https://bitbucket.org/project-metabat/metabat-lr.
Affiliation(s)
- Harrison Ho
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, United States
- School of Natural Sciences, University of California, Merced, CA, United States
- Mansi Chovatia
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, United States
- Rob Egan
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, United States
- Guifen He
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, United States
- Yuko Yoshinaga
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, United States
- Ronan O’Malley
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Lab, Berkeley, CA, United States
- Zhong Wang
- Department of Energy Joint Genome Institute, Lawrence Berkeley National Lab, Berkeley, CA, United States
- School of Natural Sciences, University of California, Merced, CA, United States
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Lab, Berkeley, CA, United States

42
Tokala S, Enduri MK, Lakshmi TJ, Sharma H. Community-Based Matrix Factorization (CBMF) Approach for Enhancing Quality of Recommendations. Entropy (Basel, Switzerland) 2023; 25:1360. [PMID: 37761659 PMCID: PMC10528144 DOI: 10.3390/e25091360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Revised: 09/08/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023]
Abstract
Matrix factorization is a long-established method employed for analyzing and extracting valuable insight recommendations from complex networks containing user ratings. The execution time and computational resources demanded by these algorithms pose limitations when confronted with large datasets. Community detection algorithms play a crucial role in identifying groups and communities within intricate networks. To overcome the challenge of extensive computing resources with matrix factorization techniques, we present a novel framework that utilizes the inherent community information of the rating network. Our proposed approach, named Community-Based Matrix Factorization (CBMF), has the following steps: (1) Model the rating network as a complex bipartite network. (2) Divide the network into communities. (3) Extract the rating matrices pertaining only to those communities and apply MF on these matrices in parallel. (4) Merge the predicted rating matrices belonging to communities and evaluate the root mean square error (RMSE). In our experimentation, we use basic MF, SVD++, and FANMF for matrix factorization, and the Louvain algorithm is used for community division. The experimental evaluation on six datasets shows that the proposed CBMF enhances the quality of recommendations in each case. In the MovieLens 100K dataset, RMSE has been reduced to 0.21 from 1.26 using SVD++ by dividing the network into 25 communities. A similar reduction in RMSE is observed for the datasets of FilmTrust, Jester, Wikilens, Good Books, and Cell Phone.
Affiliation(s)
- Srilatha Tokala
- Algorithms and Complexity Theory Lab, Department of Computer Science and Engineering, SRM University-AP, Amaravati 522502, India
- Murali Krishna Enduri
- Algorithms and Complexity Theory Lab, Department of Computer Science and Engineering, SRM University-AP, Amaravati 522502, India
- T. Jaya Lakshmi
- Algorithms and Complexity Theory Lab, Department of Computer Science and Engineering, SRM University-AP, Amaravati 522502, India
- Hemlata Sharma
- Department of Computing, Sheffield Hallam University, Howard Street, Sheffield S1 1WB, UK

43
Hallast P, Ebert P, Loftus M, Yilmaz F, Audano PA, Logsdon GA, Bonder MJ, Zhou W, Höps W, Kim K, Li C, Hoyt SJ, Dishuck PC, Porubsky D, Tsetsos F, Kwon JY, Zhu Q, Munson KM, Hasenfeld P, Harvey WT, Lewis AP, Kordosky J, Hoekzema K, O'Neill RJ, Korbel JO, Tyler-Smith C, Eichler EE, Shi X, Beck CR, Marschall T, Konkel MK, Lee C. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 2023; 621:355-364. [PMID: 37612510 PMCID: PMC10726138 DOI: 10.1038/s41586-023-06425-6] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 11/30/2022] [Accepted: 07/11/2023] [Indexed: 08/25/2023]
Abstract
The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.
Affiliation(s)
- Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Mark Loftus
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Feyza Yilmaz
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Peter A Audano
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marc Jan Bonder
- Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Weichen Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
- Wolfram Höps
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Kwondo Kim
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Chong Li
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Fotios Tsetsos
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Jee Young Kwon
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Qihui Zhu
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Patrick Hasenfeld
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
- Jan O Korbel
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Xinghua Shi
- Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
- Christine R Beck
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- The University of Connecticut Health Center, Farmington, CT, USA
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Miriam K Konkel
- Department of Genetics & Biochemistry, Clemson University, Clemson, SC, USA
- Center for Human Genetics, Clemson University, Greenwood, SC, USA
- Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA

44
Žalik KR, Žalik M. Density-Based Entropy Centrality for Community Detection in Complex Networks. Entropy (Basel, Switzerland) 2023; 25:1196. [PMID: 37628226 PMCID: PMC10453840 DOI: 10.3390/e25081196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Revised: 07/31/2023] [Accepted: 08/02/2023] [Indexed: 08/27/2023]
Abstract
One of the most important problems in complex networks is the location of nodes that are essential or play a main role in the network. Nodes with main local roles are the centers of real communities. Communities are sets of nodes of complex networks and are densely connected internally. Choosing the right nodes as seeds of the communities is crucial in determining real communities. We propose a new centrality measure named density-based entropy centrality for the local identification of the most important nodes. It measures the entropy of the sum of the sizes of the maximal cliques to which each node and its neighbor nodes belong. The proposed centrality is a local measure for explaining the local influence of each node, which provides an efficient way to locally identify the most important nodes and for community detection because communities are local structures. It can be computed independently for individual vertices, for large networks, and for not well-specified networks. The use of the proposed density-based entropy centrality for community seed selection and community detection outperforms other centrality measures.
Affiliation(s)
- Krista Rizman Žalik
- Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000 Maribor, Slovenia

45
Kates-Harbeck J, Desai MM. Social network structure and the spread of complex contagions from a population genetics perspective. Phys Rev E 2023; 108:024306. [PMID: 37723694 DOI: 10.1103/physreve.108.024306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 06/30/2023] [Indexed: 09/20/2023]
Abstract
Ideas, behaviors, and opinions spread through social networks. If the probability of spreading to a new individual is a nonlinear function of the fraction of the individuals' affected neighbors, such a spreading process becomes a "complex contagion." This nonlinearity does not typically appear with physically spreading infections, but instead can emerge when the concept that is spreading is subject to game theoretical considerations (e.g., for choices of strategy or behavior) or psychological effects such as social reinforcement and other forms of peer influence (e.g., for ideas, preferences, or opinions). Here we study how the stochastic dynamics of such complex contagions are affected by the underlying network structure. Motivated by simulations of complex contagions on real social networks, we present a framework for analyzing the statistics of contagions with arbitrary nonlinear adoption probabilities based on the mathematical tools of population genetics. The central idea is to use an effective lower-dimensional diffusion process to approximate the statistics of the contagion. This leads to a tradeoff between the effects of "selection" (microscopic tendencies for an idea to spread or die out), random drift, and network structure. Our framework illustrates intuitively several key properties of complex contagions: stronger community structure and network sparsity can significantly enhance the spread, while broad degree distributions dampen the effect of selection compared to random drift. Finally, we show that some structural features can exhibit critical values that demarcate regimes where global contagions become possible for networks of arbitrary size. Our results draw parallels between the competition of genes in a population and memes in a world of minds and ideas. Our tools provide insight into the spread of information, behaviors, and ideas via social influence, and highlight the role of macroscopic network structure in determining their fate.
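The defining property described in this abstract, an adoption probability that is a nonlinear function of the fraction of affected neighbors, can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' framework: the power-law form in `adoption_probability`, the adoption-only dynamics (no reversion or die-out), the synchronous updates, and the ring-lattice test network are all hypothetical choices.

```python
import random


def adoption_probability(fraction, exponent=2.0, scale=1.0):
    """Assumed nonlinear adoption rule: scale * fraction**exponent, capped at 1.

    exponent == 1 gives a simple (linear) contagion; exponent > 1 mimics
    social reinforcement, where several affected neighbors are needed.
    """
    return min(1.0, scale * fraction ** exponent)


def simulate(adjacency, seeds, steps, exponent=2.0, scale=1.0, rng_seed=0):
    """Run synchronous adoption-only updates and return the affected count per step."""
    rng = random.Random(rng_seed)
    affected = set(seeds)
    history = [len(affected)]
    for _ in range(steps):
        newly_affected = set()
        for node, neighbors in adjacency.items():
            if node in affected or not neighbors:
                continue
            fraction = sum(n in affected for n in neighbors) / len(neighbors)
            if rng.random() < adoption_probability(fraction, exponent, scale):
                newly_affected.add(node)
        affected |= newly_affected
        history.append(len(affected))
    return history


def ring_lattice(size, reach=2):
    """Each node is linked to `reach` neighbors on either side of a ring."""
    return {
        i: {(i + d) % size for d in range(-reach, reach + 1) if d != 0}
        for i in range(size)
    }


history = simulate(ring_lattice(20), seeds={0, 1}, steps=10)
```

Rerunning `simulate` with different `exponent` values on the same network is a quick way to see how the nonlinearity changes the speed of spread.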
Affiliation(s)
- Michael M Desai
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts 02138, USA
|
46
|
Peixoto TP, Kirkley A. Implicit models, latent compression, intrinsic biases, and cheap lunches in community detection. Phys Rev E 2023; 108:024309. [PMID: 37723811 DOI: 10.1103/physreve.108.024309] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 08/02/2023] [Indexed: 09/20/2023]
Abstract
The task of community detection, which aims to partition a network into clusters of nodes to summarize its large-scale structure, has spawned the development of many competing algorithms with varying objectives. Some community detection methods are inferential, explicitly deriving the clustering objective through a probabilistic generative model, while other methods are descriptive, dividing a network according to an objective motivated by a particular application, making it challenging to compare these methods on the same scale. Here we present a solution to this problem that associates any community detection objective, inferential or descriptive, with its corresponding implicit network generative model. This allows us to compute the description length of a network and its partition under arbitrary objectives, providing a principled measure to compare the performance of different algorithms without the need for "ground-truth" labels. Our approach also gives access to instances of the community detection problem that are optimal to any given algorithm and in this way reveals intrinsic biases in popular descriptive methods, explaining their tendency to overfit. Using our framework, we compare a number of community detection methods on artificial networks and on a corpus of over 500 structurally diverse empirical networks. We find that more expressive community detection methods exhibit consistently superior compression performance on structured data instances, without having degraded performance on a minority of situations where more specialized algorithms perform optimally. Our results undermine the implications of the "no free lunch" theorem for community detection, both conceptually and in practice, since it is confined to unstructured data instances, unlike relevant community detection problems which are structured by requirement.
Affiliation(s)
- Tiago P Peixoto
- Department of Network and Data Science, Central European University, 1100 Vienna, Austria
- Alec Kirkley
- Institute of Data Science, University of Hong Kong, Hong Kong; Department of Urban Planning and Design, University of Hong Kong, Hong Kong; and Urban Systems Institute, University of Hong Kong, Hong Kong
|
47
|
Kamuhanda D, Cui M, Tessone CJ. Illegal Community Detection in Bitcoin Transaction Networks. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1069. [PMID: 37510016 PMCID: PMC10378389 DOI: 10.3390/e25071069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 07/04/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023]
Abstract
Community detection is widely used in social networks to uncover groups of related vertices (nodes). In cryptocurrency transaction networks, community detection can help identify users that are most related to known illegal users. However, there are challenges in applying community detection in cryptocurrency transaction networks: (1) the use of pseudonymous addresses that are not directly linked to personal information makes it difficult to interpret the detected communities; (2) on Bitcoin, a user usually owns multiple Bitcoin addresses, and nodes in transaction networks do not always represent users. Existing works on cluster analysis on Bitcoin transaction networks focus on addressing the latter using different heuristics to cluster addresses that are controlled by the same user. This research focuses on illegal community detection containing one or more illegal Bitcoin addresses. We first investigate the structure of Bitcoin transaction networks and suitable community detection methods, then collect a set of illegal addresses and use them to label the detected communities. The results show that 0.06% of communities from daily transaction networks contain one or more illegal addresses when 2,313,344 illegal addresses are used to label the communities. The results also show that distance-based clustering methods and other methods depending on them, such as network representation learning, are not suitable for Bitcoin transaction networks while community quality optimization and label-propagation-based methods are the most suitable.
Affiliation(s)
- Dany Kamuhanda
- UZH Blockchain Center, University of Zurich, 8050 Zurich, Switzerland
- Blockchain & Distributed Ledger Technologies Group, Department of Informatics, University of Zurich, 8050 Zurich, Switzerland
- Department of Mathematics, Science and Physical Education, University of Rwanda-College of Education, Rwamagana P.O. Box 55, Rwanda
- Mengtian Cui
- College of Computer Science and Engineering, Southwest Minzu University, Chengdu 610040, China
- Claudio J Tessone
- UZH Blockchain Center, University of Zurich, 8050 Zurich, Switzerland
- Blockchain & Distributed Ledger Technologies Group, Department of Informatics, University of Zurich, 8050 Zurich, Switzerland
|
48
|
Picasso-Risso C, Vilalta C, Sanhueza JM, Kikuti M, Schwartz M, Corzo CA. Disentangling transport movement patterns of trucks either transporting pigs or while empty within a swine production system before and during the COVID-19 epidemic. Front Vet Sci 2023; 10:1201644. [PMID: 37519995 PMCID: PMC10376687 DOI: 10.3389/fvets.2023.1201644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 06/19/2023] [Indexed: 08/01/2023] Open
Abstract
Transport of pigs between sites occurs frequently as part of genetic improvement and age segregation. However, a lack of transport biosecurity could have catastrophic implications if not managed properly as disease spread would be imminent. Yet a comprehensive study of vehicle movement trends within swine systems in the Midwest is lacking. In this study, we aimed to describe and characterize vehicle movement patterns within one large Midwest swine system representative of modern pig production to understand movement trends and proxies for biosecurity compliance and identify potential risky behaviors that may result in a higher risk for infectious disease spread. Geolocation tracking devices recorded vehicle movements of a subset of trucks and trailers from a production system every 5 min and every time trucks entered a landmark between January 2019 and December 2020, before and during the COVID-19 pandemic. We described 6,213 transport records from 12 vehicles controlled by the company. In total, 114 predefined landmarks were included during the study period, representing 5 categories of farms and truck wash facilities. The results showed that trucks completed the majority (76.4%, 2,111/2,762) of the recorded movements. The seasonal distribution of incoming movements was similar across years (P > 0.05), while the 2019 winter and summer seasons showed higher incoming movements to sow farms than any other season, year, or production type (P < 0.05). More than half of the in-movements recorded occurred within the triad of sow farms, wean-to-market stage, and truck wash facilities. Overall, time spent at each landmark was 9.08% higher in 2020 than in 2019, without seasonal highlights, but with a notably higher time spent at truck wash facilities than any other type of landmark. Network analyses showed high connectivity among farms with identifiable clusters in the network. Furthermore, we observed a decrease in connectivity in 2020 compared with 2019, as indicated by the majority of network parameter values. Further network analysis will be needed to understand its impact on disease spread and control. However, the description and quantification of movement trends reported in this study provide findings that might be the basis for targeting infectious disease surveillance and control.
Affiliation(s)
- Catalina Picasso-Risso
- Department of Veterinary Population Medicine, University of Minnesota, Saint Paul, MN, United States
- Facultad de Veterinaria, Universidad de la Republica, Montevideo, Uruguay
- Department of Veterinary Preventive Medicine, College of Veterinary Medicine, The Ohio State University, Columbus, OH, United States
- Carles Vilalta
- Department of Veterinary Population Medicine, University of Minnesota, Saint Paul, MN, United States
- Unitat mixta d'Investigació IRTA-UAB en Sanitat Animal, Centre de Recerca en Sanitat Animal, Campus de la Universitat Autònoma de Barcelona, Bellaterra, Spain
- IRTA, Programa de Sanitat Animal, Centre de Recerca en Sanitat Animal, Campus de la Universitat Autònoma de Barcelona, Bellaterra, Spain
- Juan Manuel Sanhueza
- Department of Veterinary Population Medicine, University of Minnesota, Saint Paul, MN, United States
- Departamento de Ciencias Veterinarias y Salud Publica, Facultad de Recursos Naturales, Universidad Católica de Temuco, Temuco, Chile
- Mariana Kikuti
- Department of Veterinary Population Medicine, University of Minnesota, Saint Paul, MN, United States
- Mark Schwartz
- Department of Veterinary Population Medicine, University of Minnesota, Saint Paul, MN, United States
- Cesar A. Corzo
- Department of Veterinary Population Medicine, University of Minnesota, Saint Paul, MN, United States
|
49
|
Kokoli M, Karatzas E, Baltoumas FA, Schneider R, Pafilis E, Paragkamian S, Doncheva NT, Jensen L, Pavlopoulos G. Arena3D web: interactive 3D visualization of multilayered networks supporting multiple directional information channels, clustering analysis and application integration. NAR Genom Bioinform 2023; 5:lqad053. [PMID: 37260509 PMCID: PMC10227371 DOI: 10.1093/nargab/lqad053] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/15/2023] [Revised: 04/25/2023] [Accepted: 05/18/2023] [Indexed: 06/02/2023] Open
Abstract
Arena3Dweb is an interactive web tool that visualizes multi-layered networks in 3D space. In this update, Arena3Dweb supports directed networks as well as up to nine different types of connections between pairs of nodes with the use of Bézier curves. It comes with different color schemes (light/gray/dark mode), custom channel coloring, four node clustering algorithms which one can run on-the-fly, visualization in VR mode and predefined layer layouts (zig-zag, star and cube). This update also includes enhanced navigation controls (mouse orbit controls, layer dragging and layer/node selection), while its newly developed API allows integration with external applications as well as saving and loading of sessions in JSON format. Finally, a dedicated Cytoscape app has been developed, through which users can automatically send their 2D networks from Cytoscape to Arena3Dweb for 3D multi-layer visualization. Arena3Dweb is accessible at http://arena3d.pavlopouloslab.info or http://arena3d.org.
Affiliation(s)
- Fotis A Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari 16672, Greece
- Reinhard Schneider
- University of Luxembourg, Luxembourg Centre for Systems Biomedicine, Bioinformatics Core, Esch-sur-Alzette, Luxembourg
- Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, Heraklion 71003, Greece
- Savvas Paragkamian
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Former U.S. Base of Gournes, Heraklion 71003, Greece
- Department of Biology, University of Crete, Voutes University Campus, P.O. Box 2208, 70013 Heraklion, Crete, Greece
- Nadezhda T Doncheva
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen N DK-2200, Denmark
- Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen N DK-2200, Denmark
|
50
|
Fontanelli O, Guzmán P, Meneses-Viveros A, Hernández-Alvarez A, Flores-Garrido M, Olmedo-Alvarez G, Hernández-Rosales M, Anda-Jáuregui GD. Intermunicipal travel networks of Mexico during the COVID-19 pandemic. Sci Rep 2023; 13:8566. [PMID: 37237051 DOI: 10.1038/s41598-023-35542-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/23/2022] [Accepted: 05/19/2023] [Indexed: 05/28/2023] Open
Abstract
Human mobility networks are widely used for diverse studies in geography, sociology, and economics. In these networks, nodes usually represent places or regions and links refer to movement between them. They become essential when studying the spread of a virus, the planning of transit, or society's local and global structures. Therefore, the construction and analysis of human mobility networks are crucial for a vast number of real-life applications. This work presents a collection of networks that describe the human travel patterns between municipalities in Mexico in the 2020-2021 period. Using anonymized mobile location data, we constructed directed, weighted networks representing the volume of travel between municipalities. We analysed changes in global, local, and mesoscale network features. We observe that changes in these features are associated with factors such as COVID-19 restrictions and population size. In general, the implementation of restrictions at the start of the COVID-19 pandemic in early 2020 induced more intense changes in network features than later events, which had a less notable impact. These networks will be very useful for researchers and decision-makers in the areas of transportation, infrastructure planning, epidemic control and network science at large.
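The construction step described in this abstract, turning individual trips into a directed, weighted network, can be sketched in a few lines of Python. The input format (a list of origin and destination pairs), the choice to drop trips that stay within one municipality, and the helper names are assumptions for illustration; the abstract does not specify how the authors processed their mobile location data.

```python
from collections import Counter


def build_travel_network(trips):
    """Aggregate (origin, destination) trips into directed edge weights."""
    weights = Counter()
    for origin, destination in trips:
        if origin != destination:  # assumption: ignore within-municipality trips
            weights[(origin, destination)] += 1
    return weights


def out_strength(weights):
    """Total trip volume leaving each municipality."""
    strength = Counter()
    for (origin, _), volume in weights.items():
        strength[origin] += volume
    return strength


def in_strength(weights):
    """Total trip volume arriving at each municipality."""
    strength = Counter()
    for (_, destination), volume in weights.items():
        strength[destination] += volume
    return strength


# Hypothetical trip records between three municipalities A, B, and C.
trips = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("C", "C")]
network = build_travel_network(trips)
```

Building one such network per time window and comparing strengths across windows is one simple way to track the kind of change in local network features that the abstract reports.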
Affiliation(s)
- Guillermo de Anda-Jáuregui
- Computational Genomics Division, National Institute of Genomics Medicine, Mexico City, Mexico.
- Investigadores e Investigadoras por México, National Council of Humanities, Sciences and Technologies, Mexico City, Mexico.
- Centro de Ciencias de la Complejidad (C3), Universidad Nacional Autónoma de México, Mexico City, Mexico.
|