1
Wagan AA, Talpur S, Narejo S. Clustering uncertain overlapping symptoms of multiple diseases in clinical diagnosis. PeerJ Comput Sci 2024; 10:e2315. [PMID: 39650487] [PMCID: PMC11623175] [DOI: 10.7717/peerj-cs.2315]
Abstract
In various fields, including medical science, datasets characterized by uncertainty are generated. Conventional clustering algorithms, designed for deterministic data, often prove inadequate when applied to uncertain data. Recent advances have introduced clustering algorithms based on a possible-world model, designed specifically to handle uncertainty, with promising outcomes. However, these algorithms face two primary issues: first, they treat all possible worlds equally, neglecting the relative importance of each world; second, they rely on time-consuming and inefficient post-processing techniques for world selection. This research aims to cluster the symptoms observed in patients, enabling exploration of the intricate relationships between symptoms. The symptoms dataset presents unique challenges: it is uncertain, and symptoms overlap across multiple diseases, making mutually exclusive clusters impractical. Conventional similarity measures, which assume mutually exclusive clusters, fail to address these challenges. The categorical nature of the dataset further complicates the analysis, as most similarity measures are optimized for numerical data. To overcome these obstacles, this research proposes a clustering algorithm that considers the precise weight of each symptom in every disease, generating overlapping clusters that accurately depict the associations between symptoms across diseases.
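The weight-based overlapping clustering described in the abstract can be sketched roughly as follows; the diseases, symptoms, weights, and threshold below are illustrative assumptions, not data or parameters from the paper:

```python
# Sketch: each disease defines a cluster; a symptom joins every cluster in
# which its weight (relevance to that disease) reaches a threshold, so
# clusters may overlap instead of being mutually exclusive.

def overlapping_clusters(weights, threshold):
    """weights: {disease: {symptom: weight}}; returns {disease: set of symptoms}."""
    clusters = {}
    for disease, symptom_weights in weights.items():
        clusters[disease] = {s for s, w in symptom_weights.items() if w >= threshold}
    return clusters

# Illustrative weights (not from the paper's dataset).
weights = {
    "flu":     {"fever": 0.9, "cough": 0.7, "fatigue": 0.5},
    "covid":   {"fever": 0.8, "cough": 0.8, "anosmia": 0.9},
    "allergy": {"cough": 0.4, "sneezing": 0.9},
}
clusters = overlapping_clusters(weights, threshold=0.6)
# "fever" and "cough" end up in more than one cluster -> overlapping clustering
```

Because membership is decided per disease, a symptom such as "cough" can belong to several clusters at once, which is exactly the property mutually exclusive clustering forbids.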
Affiliation(s)
- Asif Ali Wagan
- Computer Systems Engineering, Mehran University of Engineering & Technology Jamshoro, Jamshoro, Sindh, Pakistan
- Shahnawaz Talpur
- Computer Systems Engineering, Mehran University of Engineering & Technology Jamshoro, Jamshoro, Sindh, Pakistan
- Sanam Narejo
- Computer Systems Engineering, Mehran University of Engineering & Technology Jamshoro, Jamshoro, Sindh, Pakistan
2
Xu J, Li C, Peng L, Ren Y, Shi X, Shen HT, Zhu X. Adaptive Feature Projection With Distribution Alignment for Deep Incomplete Multi-View Clustering. IEEE Transactions on Image Processing 2023; 32:1354-1366. [PMID: 37022865] [DOI: 10.1109/tip.2023.3243521]
Abstract
Incomplete multi-view clustering (IMVC) analysis, where some views of multi-view data have missing entries, has attracted increasing attention. However, existing IMVC methods still have two issues: 1) they focus on imputing or recovering the missing data without considering that the imputed values might be inaccurate due to unknown label information; and 2) the common features of multiple views are always learned from the complete data, ignoring the feature-distribution discrepancy between the complete and incomplete data. To address these issues, we propose an imputation-free deep IMVC method that considers distribution alignment during feature learning. Concretely, the proposed method learns features for each view with autoencoders and uses an adaptive feature projection to avoid imputing missing data. All available data are projected into a common feature space, where common cluster information is explored by maximizing mutual information and distribution alignment is achieved by minimizing mean discrepancy. Additionally, we design a new mean discrepancy loss for incomplete multi-view learning that is applicable in mini-batch optimization. Extensive experiments demonstrate that our method achieves comparable or superior performance relative to state-of-the-art methods.
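One simple, imputation-free form of the mean-discrepancy idea can be sketched as below. The squared distance between mini-batch mean features is an illustrative stand-in for the authors' exact loss, and the feature vectors are made up:

```python
# Sketch: penalize the squared distance between the mean feature vectors of
# complete-view and incomplete-view samples in a mini-batch. This needs only
# the features that actually exist, so no missing values are imputed.

def mean_discrepancy(complete_feats, incomplete_feats):
    """Both args: non-empty lists of equal-length feature vectors (lists of floats)."""
    def mean_vec(rows):
        d = len(rows[0])
        return [sum(r[i] for r in rows) / len(rows) for i in range(d)]
    mu_c = mean_vec(complete_feats)
    mu_i = mean_vec(incomplete_feats)
    return sum((a - b) ** 2 for a, b in zip(mu_c, mu_i))

complete = [[1.0, 2.0], [3.0, 4.0]]    # features from fully observed samples
incomplete = [[2.0, 3.0], [2.0, 3.0]]  # features from partially observed samples
loss = mean_discrepancy(complete, incomplete)  # batch means coincide here, so 0.0
```

Minimizing such a term pulls the feature distributions of complete and incomplete samples toward each other, which is the alignment the abstract refers to; the paper's actual loss and mini-batch formulation may differ.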
3
Zuo L, Xu Y, Cheng C, Choo KKR. A Privacy-Preserving Semisupervised Algorithm Under Maximum Correntropy Criterion. IEEE Transactions on Neural Networks and Learning Systems 2022; 33:6817-6830. [PMID: 34101601] [DOI: 10.1109/tnnls.2021.3083535]
Abstract
Existing semisupervised learning approaches generally focus on the single-agent (centralized) setting and hence risk privacy leakage during joint data processing. At the same time, the mean square error criterion used in such approaches cannot efficiently deal with problems involving non-Gaussian distributions. Thus, in this article, we present a novel privacy-preserving semisupervised algorithm under the maximum correntropy criterion (MCC). The proposed algorithm allows data to be shared among different entities while effectively mitigating the risk of privacy leaks. In addition, under MCC, our approach works well for data with non-Gaussian noise. Experiments on three different learning tasks demonstrate that our method clearly outperforms related algorithms in common regression learning scenarios.
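The contrast between MCC and the mean square error can be sketched with a minimal correntropy computation. The kernel width and residual values are illustrative assumptions, and this is only the robustness idea, not the paper's privacy-preserving semisupervised algorithm:

```python
import math

# Sketch: correntropy scores each residual with a Gaussian kernel, so a single
# gross outlier barely moves the objective, whereas it dominates the MSE.

def correntropy(residuals, sigma=1.0):
    """Average Gaussian-kernel similarity of residuals to zero (higher = better fit)."""
    return sum(math.exp(-r * r / (2 * sigma ** 2)) for r in residuals) / len(residuals)

def mse(residuals):
    return sum(r * r for r in residuals) / len(residuals)

clean = [0.1, -0.2, 0.05, 0.0]
with_outlier = clean + [50.0]  # one gross, non-Gaussian error
# mse(with_outlier) explodes, while correntropy(with_outlier) degrades only
# slightly: the outlier's kernel value exp(-1250) is effectively zero.
```

This bounded per-sample contribution is what makes correntropy-based criteria attractive for non-Gaussian noise.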
4
Graph-Represented Broad Learning System for Landslide Susceptibility Mapping in Alpine-Canyon Region. Remote Sensing 2022. [DOI: 10.3390/rs14122773]
Abstract
Zhouqu County is located at the intersection of two active structural belts in the east of the Qinghai-Tibet Plateau, a rare, high-incidence area of landslides, debris flows, and earthquakes on a global scale. The complex regional geological background, the fragile ecological environment, and significant tectonic activity make dynamic landslide susceptibility assessment and prediction in the study area very difficult. Geomorphologically, Zhouqu is a typical alpine-canyon region, and landslide susceptibility assessment studies for this type of area are still lacking. Developing landslide susceptibility mapping (LSM) for this area is therefore of great significance for quickly grasping the regional landslide situation and formulating disaster-reduction strategies. In this article, we propose a graph-represented learning algorithm named GBLS within a broad learning framework, designed to better extract the spatially relevant characteristics of geographical data and to quickly capture changes in landslide susceptibility as the data frequently grow or shrink. On top of the broad structure, we construct a group of graph feature nodes through graph-represented learning to exploit the geometric correlation of the data and improve precision. The proposed method is efficient and effective owing to its broad structure and, better still, can incorporate incremental data without repeated retraining, avoiding wasted time and massive computation. Empirical results on the 407 landslides in the study area, inventoried by remote sensing interpretation and field investigation, verify the efficiency and generalization of GBLS. A landslide susceptibility map is then drawn to visualize the assessment produced by the configuration of GBLS with the highest AUC (0.982). The four most influential factors were rainfall, NDVI, aspect, and Terrain Ruggedness Index. Our research provides a selection criterion that future LSM studies of alpine-canyon regions can reference, and helps demonstrate and popularize research in the same type of landform environment. The LSM will help the government better prevent and mitigate landslide hazards in the alpine-canyon region of Zhouqu.
5
6
Tan Q, Ye M, Ma AJ, Yang B, Yip TCF, Wong GLH, Yuen PC. Explainable Uncertainty-Aware Convolutional Recurrent Neural Network for Irregular Medical Time Series. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:4665-4679. [PMID: 33055037] [DOI: 10.1109/tnnls.2020.3025813]
Abstract
Influenced by dynamic changes in the severity of illness, patients usually take examinations in hospitals irregularly, producing a large volume of irregular medical time-series data. Diagnosis prediction from irregular medical time series is challenging because the intervals between consecutive records vary significantly over time. Existing methods often handle this problem by generating regular time series from the irregular records, without considering the uncertainty that the varying intervals induce in the generated data. Thus, a novel Uncertainty-Aware Convolutional Recurrent Neural Network (UA-CRNN) is proposed in this article, which introduces the uncertainty information in the generated data to improve risk prediction. To handle complex medical time series whose subseries have different frequencies, the uncertainty information is incorporated at the subseries level rather than over the whole sequence, seamlessly accommodating the different time intervals. Specifically, a hierarchical uncertainty-aware decomposition layer (UADL) is designed to adaptively decompose a time series into subseries and assign them weights in accordance with their reliabilities. Meanwhile, an Explainable UA-CRNN (eUA-CRNN) is proposed that exploits filters with different passbands to ensure the unity of components within each subseries and the diversity of components across subseries. Furthermore, eUA-CRNN incorporates an uncertainty-aware attention module that learns attention weights from the uncertainty information, providing explainable prediction results. Extensive experimental results on three real-world medical datasets illustrate the superiority of the proposed method over state-of-the-art methods.
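The idea of learning attention weights from uncertainty can be sketched as a softmax over negated uncertainty scores; this is an assumed simplification, not the paper's exact attention module, and the uncertainty values are made up:

```python
import math

# Sketch: each subseries carries an uncertainty score; attention weights are a
# softmax over the negated scores, so more reliable (less uncertain) subseries
# receive larger weights in the final prediction.

def uncertainty_attention(uncertainties):
    """Map per-subseries uncertainty scores to attention weights summing to 1."""
    scores = [math.exp(-u) for u in uncertainties]
    total = sum(scores)
    return [s / total for s in scores]

# Illustrative uncertainties for three subseries (low, medium, high).
weights = uncertainty_attention([0.1, 1.0, 2.5])
# weights[0] is the largest: the least uncertain subseries dominates.
```

The weights themselves are also inspectable, which is one route to the kind of explainability the abstract claims.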
7
Kotary DK, Nanda SJ. Distributed clustering in peer to peer networks using multi-objective whale optimization. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106625]
8
Zhao W, Zhang F, Lian H. Debiasing and Distributed Estimation for High-Dimensional Quantile Regression. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:2569-2577. [PMID: 31484140] [DOI: 10.1109/tnnls.2019.2933467]
Abstract
Distributed and parallel computing is becoming more important with the availability of extremely large datasets. In this article, we consider this problem for high-dimensional linear quantile regression. We work under the assumption that the coefficients in the regression model are sparse, so a LASSO penalty is naturally used for estimation. We first extend the debiasing procedure, previously proposed for smooth parametric regression models, to quantile regression. The technical challenges include dealing with the nondifferentiability of the loss function and estimating the unknown conditional density. The main objective is to derive a divide-and-conquer estimation approach using the debiased estimator, which is useful in the big-data setting. The effectiveness of distributed estimation is demonstrated with numerical examples.
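The divide-and-conquer aggregation step can be sketched as follows, with ordinary least squares on a one-dimensional model standing in for the paper's debiased high-dimensional quantile-regression estimator (the full debiasing machinery is beyond a sketch); the data are illustrative:

```python
# Sketch of divide-and-conquer estimation: each machine computes a local
# estimate on its own data shard, and the final estimate is the average of the
# local estimates. Here the local estimator is no-intercept least squares.

def local_slope(xs, ys):
    # beta = sum(x*y) / sum(x*x) for the model y = beta * x
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def divide_and_conquer(shards):
    """shards: list of (xs, ys) pairs, one per machine."""
    betas = [local_slope(xs, ys) for xs, ys in shards]
    return sum(betas) / len(betas)

# Noiseless data from y = 2x, split across two "machines".
shards = [([1.0, 2.0], [2.0, 4.0]), ([3.0, 4.0], [6.0, 8.0])]
beta_hat = divide_and_conquer(shards)  # recovers 2.0 on this example
```

Averaging only works well when each local estimate has negligible bias, which is exactly why the paper debiases the LASSO-penalized quantile estimator before aggregating.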
9
Abstract
As big data technology matures, many colleges and universities have begun to use it to analyze campus operations. The campus network is present throughout daily life, including classes, study, and entertainment. The purpose of this article is to study users' online behavior: by analyzing how students use the campus network, administrators can gain a clear picture of students' internet access and obtain feedback for the network's operation and maintenance. Building on big data, this article applies a distributed clustering algorithm to users' online behavior, taking the online users of one college as the research object. The study found that second-year students' network usage is as high as 330,000, 60.98% more than that of seniors. In addition, most student users spend the bulk of their online time on weekends, with little difference across the remaining days, and session durations concentrate in three bands: under 1 h, 1-2 h, and 2-3 h. By studying users' online behavior, one can understand the utilization of campus network bandwidth and the distribution of network use, help prevent students from becoming absorbed in the virtual world, and ensure that users can access network resources reasonably while enjoying an improved online experience. The research provides a reference for network administrators to adjust bandwidth and optimize the network.
Affiliation(s)
- Yan Wang
- School of Accounting and Finance, Xi’an Peihua University, Xi’an, Shaanxi, People’s Republic of China
10
Wu X, Zhang J, Wang FY. Stability-Based Generalization Analysis of Distributed Learning Algorithms for Big Data. IEEE Transactions on Neural Networks and Learning Systems 2020; 31:801-812. [PMID: 31071054] [DOI: 10.1109/tnnls.2019.2910188]
Abstract
As one of the efficient approaches to dealing with big data, divide-and-conquer distributed algorithms, such as distributed kernel regression, bootstrap, and structured perceptron training algorithms, have been proposed and are broadly used in learning systems. Learning theories have been built to analyze the feasibility, approximation, and convergence bounds of these distributed learning algorithms, but less attention has been paid to their stability. In this paper, we discuss the generalization bounds of distributed learning algorithms from the view of algorithmic stability. First, we introduce a definition of uniform distributed stability for distributed algorithms and study their generalization risk bounds. Then, we analyze the stability properties and generalization risk bounds of a class of regularization-based distributed algorithms. The two generalization distributed risks obtained show that the bounds on the difference between the generalization distributed risk and the empirical distributed/leave-one-computer-out risk are closely related to the sample size n and the number of working computers m, as O(m/n^(1/2)). Furthermore, the results indicate that, for a regularized distributed kernel algorithm to generalize well, the regularization parameter λ should be adjusted as the term m/n^(1/2) changes. These theoretical findings provide useful guidance when deploying distributed algorithms on practical big-data platforms. We illustrate our theoretical analyses with two simulation experiments. Finally, we discuss some problems concerning the sufficient number of working computers, nonequivalence, and generalization for distributed learning, and show that rules that hold for computation on a single computer may not always hold for distributed learning.
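The tuning guidance implied by the O(m/n^(1/2)) bound can be sketched numerically; base_lambda and the sample sizes below are illustrative assumptions, not values from the paper:

```python
import math

# Sketch: the stability-based bound scales as m / sqrt(n), so a natural rule
# is to rescale the regularization parameter lambda by the same ratio as the
# number of machines m or the sample size n changes.

def bound_scale(m, n):
    """The m / sqrt(n) factor driving the generalization bound."""
    return m / math.sqrt(n)

def adjusted_lambda(base_lambda, m, n):
    """Illustrative tuning rule: scale a base lambda by the bound factor."""
    return base_lambda * bound_scale(m, n)

# Doubling the sample size at fixed m shrinks the factor by sqrt(2).
s1 = bound_scale(8, 10_000)   # 8 machines, 10k samples
s2 = bound_scale(8, 20_000)   # 8 machines, 20k samples
```

The concrete rescaling rule is an assumption for illustration; the paper's point is only that λ should track m/n^(1/2) rather than stay fixed as the deployment scales.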