1
Toledo Junior TJDO, Amancio DR, Romero RAF. Complex networks applied to political analysis: Group voting behavior in the Brazilian congress. PLoS One 2025; 20:e0319643. [PMID: 40228180 PMCID: PMC11996218 DOI: 10.1371/journal.pone.0319643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2024] [Accepted: 01/23/2025] [Indexed: 04/16/2025] [Open Access]
Abstract
The Senate and the Chamber of Deputies constitute the Brazilian Congress and are responsible for Brazilian legislative management. Complex networks have been shown to be a suitable tool for analyzing this type of system. Several studies have explored party dynamics in the Chamber of Deputies; however, no attention has been given to the Senate. Previous works that stated the need for a backbone extraction methodology in these types of networks also failed to define an automatic methodology to uncover group structure in legislative networks, reverting to heuristics or subjective approaches. In this work, we explore both legislative houses and compare them to identify their differences and similarities. We also systematize an automatic backbone extraction methodology. Further, we expand on previous analyses by adding spectrum and government-versus-opposition analyses based on voting data. Our results show that the Senate and the Chamber of Deputies behaved differently during major events in Brazil over the second decade of the century. These results indicate that the dynamics of the two houses differ and that the best backbone extraction algorithm varies over time and differs between houses.
Affiliation(s)
- Diego Raphael Amancio
- Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, Brazil
2
Vital A, Silva FN, Amancio DR. Comparing random walks in graph embedding and link prediction. PLoS One 2024; 19:e0312863. [PMID: 39504339 PMCID: PMC11540220 DOI: 10.1371/journal.pone.0312863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 10/14/2024] [Indexed: 11/08/2024] [Open Access]
Abstract
Random walks find extensive applications across various complex network domains, including embedding generation and link prediction. Despite the widespread utilization of random walks, the precise impact of distinct biases on embedding generation from sequence data and their subsequent effects on link prediction remain elusive. We conduct a comparative analysis of several random walk strategies, including the true self-avoiding random walk and the traditional random walk. We also analyze walks biased towards node degree and those with inverse node degree bias. Diverse adaptations of the node2vec algorithm to induce distinct exploratory behaviors were also investigated. Our empirical findings demonstrate that despite the varied behaviors inherent in these embeddings, only slight performance differences manifest in the context of link prediction. This implies the resilient recovery of network structure, regardless of the specific walk heuristic employed to traverse the network. Consequently, the results suggest that data generated from sequences governed by unknown mechanisms can be successfully reconstructed.
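The walk biases named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `biased_walk`, the `bias` labels, and the toy graph are assumptions made for illustration, and the true self-avoiding walk and node2vec variants are not covered. Each step picks a neighbor uniformly, in proportion to its degree, or in inverse proportion to its degree.

```python
import random


def biased_walk(adj, start, length, bias="uniform", seed=None):
    """Return the node sequence of a walk with at most `length` steps.

    adj maps each node to a list of its neighbors (undirected graph).
    bias is a hypothetical label: "uniform", "degree", or "inverse_degree".
    """
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:
            break  # isolated node: the walk cannot continue
        if bias == "uniform":
            weights = [1.0] * len(neighbors)
        elif bias == "degree":
            # Prefer high-degree neighbors.
            weights = [float(len(adj[n])) for n in neighbors]
        elif bias == "inverse_degree":
            # Prefer low-degree neighbors.
            weights = [1.0 / len(adj[n]) for n in neighbors]
        else:
            raise ValueError(f"unknown bias: {bias}")
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected graph, invented for this example.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

for mode in ("uniform", "degree", "inverse_degree"):
    print(mode, biased_walk(toy_graph, "a", 8, bias=mode, seed=0))
```

Sequences produced this way could then be fed to a skip-gram style embedding model, which is the general pipeline the abstract refers to.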
Affiliation(s)
- Adilson Vital
- Institute of Mathematics and Computer Science, USP, São Carlos, SP, Brazil
- Filipi Nascimento Silva
- The Observatory on Social Media (OSoMe), Indiana University, Bloomington, Indiana, United States of America
3
Oliveira ON, Christino L, Oliveira MCF, Paulovich FV. Artificial Intelligence Agents for Materials Sciences. J Chem Inf Model 2023; 63:7605-7609. [PMID: 38084508 DOI: 10.1021/acs.jcim.3c01778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2023]
Abstract
Artificial intelligence (AI) tools based on large language models may serve as a demonstration that we are reaching a groundbreaking new paradigm in which machines themselves will generate knowledge autonomously. This statement is based on the assumption that the ability to master natural languages is the ultimate frontier for this new paradigm and perhaps an essential step to achieving the so-called general artificial intelligence. Autonomous knowledge generation implies that a machine will be able, for instance, to retrieve and understand the contents of the scientific literature and provide interpretations for existing data, allowing it to propose and address new scientific problems. While one may assume that the continued development of AI tools exploiting large language models, with more data used for training, may lead these systems to learn autonomously, this learning can be accelerated by devising human-assisted strategies to deal with specific tasks. For example, strategies may be implemented for AI tools to emulate the analysis of multivariate data by human experts or to identify and explain patterns in time series. In addition to generic AI tools, such as Chat AIs, one may conceive personal AI agents, potentially working together, that are likely to serve end users in the near future. In this perspective paper, we discuss the development of this type of agent, focusing on its architecture and requirements. As a proof of concept, we exemplify how such an AI agent could work to assist researchers in materials sciences.
Affiliation(s)
- O N Oliveira
- University of São Paulo, São Carlos 13560-970, SP, Brazil
- L Christino
- Dalhousie University, Halifax B3H 4R2, Canada
- Eindhoven University of Technology (TU/e), Eindhoven 5600 MB, Netherlands
- M C F Oliveira
- University of São Paulo, São Carlos 13560-970, SP, Brazil
- F V Paulovich
- Eindhoven University of Technology (TU/e), Eindhoven 5600 MB, Netherlands
4
Tohalino JAV, Silva TC, Amancio DR. Using citation networks to evaluate the impact of text length on keyword extraction. PLoS One 2023; 18:e0294500. [PMID: 38011182 PMCID: PMC10681196 DOI: 10.1371/journal.pone.0294500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Accepted: 11/02/2023] [Indexed: 11/29/2023] [Open Access]
Abstract
The identification of key concepts within unstructured data is of paramount importance in practical applications. Despite the abundance of proposed methods for extracting primary topics, only a few works have investigated the influence of text length on the performance of keyword extraction (KE) methods. Specifically, many studies rely on abstracts and titles for content extraction from papers, leaving it uncertain whether leveraging the complete content of papers can yield consistent results. Hence, in this study, we employ a network-based approach to evaluate the concordance between keywords extracted from abstracts and those from the entire papers. Community detection methods are utilized to identify interconnected papers in citation networks. Subsequently, paper clusters are formed to identify salient terms within each cluster, employing a methodology akin to the term frequency-inverse document frequency (tf-idf) approach. Once each cluster has been endowed with its distinctive set of key terms, these selected terms are employed to serve as representative keywords at the paper level. The top-ranked words at the cluster level, which also appear in the abstract, are chosen as keywords for the paper. Our findings indicate that the various community detection methods used in KE yield similar levels of accuracy. Notably, text clustering approaches outperform all citation-based methods, while all approaches yield relatively low accuracy values. We also identified a lack of concordance between keywords extracted from the abstracts and those extracted from the corresponding full-text source. Considering that citations and text clustering yield distinct outcomes, combining them in hybrid approaches could offer improved performance.
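The cluster-level scoring described in this abstract can be sketched as follows. This is an assumption-laden illustration rather than the paper's method: each cluster is treated as one "document" in a tf-idf-like score, the clusters are given directly instead of coming from community detection on a citation network, and the names `rank_cluster_terms`, `paper_keywords`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def rank_cluster_terms(clusters):
    """Rank terms per cluster with a tf-idf-like score.

    clusters maps a cluster id to a list of tokenized abstracts.
    tf is the term count inside the cluster; idf is
    log(number of clusters / number of clusters containing the term).
    Terms with a zero score (present in every cluster) are dropped.
    """
    counts = {cid: Counter(tok for doc in docs for tok in doc)
              for cid, docs in clusters.items()}
    n_clusters = len(clusters)
    cluster_freq = Counter(term for c in counts.values() for term in c)
    ranked = {}
    for cid, c in counts.items():
        scores = {t: tf * math.log(n_clusters / cluster_freq[t])
                  for t, tf in c.items()}
        # Sort by descending score, then alphabetically for stable ties.
        ordered = sorted(scores, key=lambda t: (-scores[t], t))
        ranked[cid] = [t for t in ordered if scores[t] > 0]
    return ranked


def paper_keywords(abstract_tokens, ranked_terms, top_k=2):
    """Keep the top-ranked cluster terms that also occur in the abstract."""
    present = set(abstract_tokens)
    return [t for t in ranked_terms if t in present][:top_k]


# Toy clusters, invented for this example.
toy_clusters = {
    "A": [["network", "community", "detection"],
          ["network", "citation", "community"]],
    "B": [["topic", "model", "network"],
          ["topic", "noise", "model"]],
}

ranking = rank_cluster_terms(toy_clusters)
print(paper_keywords(["network", "citation", "community"], ranking["A"]))
```

The term "network" occurs in both toy clusters, so its idf is zero and it is never selected, which mirrors the intent of keeping only terms that distinguish one cluster from the others.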
Affiliation(s)
- Diego R. Amancio
- Institute of Mathematics and Computer Science – USP, São Carlos, SP, Brazil
5
Churchill R, Singh L. Using topic-noise models to generate domain-specific topics across data sources. Knowl Inf Syst 2023; 65:2159-2186. [PMID: 36683608 PMCID: PMC9842404 DOI: 10.1007/s10115-022-01805-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Revised: 12/04/2022] [Accepted: 12/05/2022] [Indexed: 01/18/2023]
Abstract
Domain-specific document collections, such as data sets about the COVID-19 pandemic, politics, and sports, have become more common as platforms grow and develop better ways to connect people whose interests align. These data sets come from many different sources, ranging from traditional sources like open-ended surveys and newspaper articles to one of the dozens of online social media platforms. Most topic models are equipped to generate topics from one or more of these data sources, but models rarely work well across all types of documents. The main problem that many models face is the varying noise levels inherent in different types of documents. We propose topic-noise models, a new type of topic model that jointly models topic and noise distributions to produce a more accurate, flexible representation of documents regardless of their origin and varying qualities. Our topic-noise model, Topic Noise Discriminator (TND), approximates topic and noise distributions side by side with the help of word embedding spaces. While topic-noise models are important for the types of short, noisy documents that often originate on social media platforms, TND can also be used with more traditional data sources like newspapers. TND itself generates a noise distribution that, when ensembled with other generative topic models, can produce more coherent and diverse topic sets. We show the effectiveness of this approach using Latent Dirichlet Allocation (LDA), and demonstrate the ability of TND to improve the quality of LDA topics in noisy document collections. Finally, researchers are beginning to generate topics using multiple sources and finding that they need a way to identify a core set based on text from different sources. We propose using cross-source topic blending (CSTB), an approach that maps topic sets to an s-partite graph and identifies core topics that blend topics from across s sources by identifying subgraphs with certain linkage properties. We demonstrate the effectiveness of topic-noise models and CSTB empirically on large real-world data sets from multiple domains and data sources.
Affiliation(s)
- Rob Churchill
- Department of Computer Science, Georgetown University, 3700 O Street, Washington, D.C., 20007 USA
- Lisa Singh
- Department of Computer Science, Georgetown University, 3700 O Street, Washington, D.C., 20007 USA
6
Network-based prediction of the disclosure of ideation about self-harm and suicide in online counseling sessions. Communications Medicine 2022; 2:156. [PMID: 36474010 PMCID: PMC9723576 DOI: 10.1038/s43856-022-00222-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/09/2021] [Accepted: 11/23/2022] [Indexed: 12/12/2022] [Open Access]
Abstract
BACKGROUND In psychological services, the transition to the disclosure of ideation about self-harm and suicide (ISS) is a critical point warranting attention. This study developed and tested a succinct descriptor to predict such transitions in an online synchronous text-based counseling service. METHOD We analyzed two years' worth of counseling sessions (N = 49,770) from Open Up, a 24/7 service in Hong Kong. Sessions from Year 1 (N = 20,618) were used to construct a word affinity network (WAN), which depicts the semantic relationships between words. Sessions from Year 2 (N = 29,152), including 1168 with explicit ISS, were used to train and test the downstream ISS prediction model. We divided and classified these sessions into ISS blocks (ISSBs), blocks prior to ISSBs (PISSBs), and non-ISS blocks (NISSBs). To detect PISSB, we adopted complex network approaches to examine the distance among different types of blocks in WAN. RESULTS Our analyses find that words within a block tend to form a module in WAN and that network-based distance between modules is a reliable indicator of PISSB. The proposed model yields a c-statistic of 0.79 in identifying PISSB. CONCLUSIONS This simple yet robust network-based model could accurately predict the transition point of suicidal ideation prior to its explicit disclosure. It can potentially improve the preparedness and efficiency of help-providers in text-based counseling services for mitigating self-harm and suicide.
7
Identification of topic evolution: network analytics with piecewise linear representation and word embedding. Scientometrics 2022. [DOI: 10.1007/s11192-022-04273-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/01/2023]
8
Abstract
Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models. We present in detail a method that is especially designed with the requirements of domain experts in mind. Like similar methods, it employs community detection in term co-occurrence graphs, but it is enhanced by including a resolution parameter that can be used for changing the targeted topic granularity. We also establish a term ranking and use semantic word-embedding for presenting term communities in a way that facilitates their interpretation. We demonstrate the application of our method with a widely used corpus of general news articles and show the results of detailed social-sciences expert evaluations of detected topics at various resolutions. A comparison with topics detected by Latent Dirichlet Allocation is also included. Finally, we discuss factors that influence topic interpretation.
9
Jung S, Yoon WC. An alternative topic model based on Common Interest Authors for topic evolution analysis. J Informetr 2020. [DOI: 10.1016/j.joi.2020.101040] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 10/24/2022]
10
de Arruda HF, Marinho VQ, Costa LDF, Amancio DR. Paragraph-based representation of texts: A complex networks approach. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.12.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 10/27/2022]
11
Lima TS, de Arruda HF, Silva FN, Comin CH, Amancio DR, Costa LDF. The dynamics of knowledge acquisition via self-learning in complex networks. Chaos 2018; 28:083106. [PMID: 30180654 DOI: 10.1063/1.5027007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 02/26/2018] [Accepted: 07/16/2018] [Indexed: 06/08/2023]
Abstract
Studies regarding knowledge organization and acquisition are of great importance to understand areas related to science and technology. A common way to model the relationship between different concepts is through complex networks. In such representations, networks' nodes store knowledge and edges represent their relationships. Several studies that considered this type of structure and knowledge acquisition dynamics employed one or more agents to discover node concepts by walking on the network. In this study, we investigate a different type of dynamics adopting a single node as the "network brain." Such a brain represents a range of real systems such as the information about the environment that is acquired by a person and is stored in the brain. To store the discovered information in a specific node, the agents walk on the network and return to the brain. We propose three different dynamics and test them on several network models and on a real system, which is formed by journal articles and their respective citations. The results revealed that, according to the adopted walking models, the efficiency of self-knowledge acquisition has only a weak dependency on topology and search strategy.
Affiliation(s)
- Thales S Lima
- Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Henrique F de Arruda
- Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Filipi N Silva
- São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Cesar H Comin
- Department of Computer Science, Federal University of São Carlos, São Carlos, São Paulo 13565-905, Brazil
- Diego R Amancio
- Institute of Mathematics and Computer Science, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil
- Luciano da F Costa
- São Carlos Institute of Physics, University of São Paulo, São Carlos, São Paulo 13566-590, Brazil