1
|
Clustering mixed type data: a space structure-based approach. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01602-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
2
|
Ouadfel S, Abd Elaziz M. Efficient high-dimension feature selection based on enhanced equilibrium optimizer. EXPERT SYSTEMS WITH APPLICATIONS 2022; 187:115882. [DOI: 10.1016/j.eswa.2021.115882] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 09/02/2023]
|
3
|
Mau TN, Huynh VN. An LSH-based k-representatives clustering method for large categorical data. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.050] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
|
4
|
Li X, Wu Z, Zhao Z, Ding F, He D. A mixed data clustering algorithm with noise-filtered distribution centroid and iterative weight adjustment strategy. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.07.039] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/17/2023]
|
5
|
Outlier detection based on weighted neighbourhood information network for mixed-valued datasets. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.02.045] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/23/2022]
|
6
|
Tyler SR, Chun Y, Ribeiro VM, Grishina G, Grishin A, Hoffman GE, Do AN, Bunyavanich S. Merged Affinity Network Association Clustering: Joint multi-omic/clinical clustering to identify disease endotypes. Cell Rep 2021; 35:108975. [PMID: 33852839 PMCID: PMC8195153 DOI: 10.1016/j.celrep.2021.108975] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/11/2020] [Revised: 01/25/2021] [Accepted: 03/18/2021] [Indexed: 12/21/2022] Open
Abstract
Although clinical and laboratory data have long been used to guide medical practice, this information is rarely integrated with multi-omic data to identify endotypes. We present Merged Affinity Network Association Clustering (MANAclust), a coding-free, automated pipeline enabling integration of categorical and numeric data spanning clinical and multi-omic profiles for unsupervised clustering to identify disease subsets. Using simulations and real-world data from The Cancer Genome Atlas, we demonstrate that MANAclust’s feature selection algorithms are accurate and outperform competitors. We also apply MANAclust to a clinically and multi-omically phenotyped asthma cohort. MANAclust identifies clinically and molecularly distinct clusters, including heterogeneous groups of “healthy controls” and viral and allergy-driven subsets of asthmatic subjects. We also find that subjects with similar clinical presentations have disparate molecular profiles, highlighting the need for additional testing to uncover asthma endotypes. This work facilitates data-driven personalized medicine through integration of clinical parameters with multi-omics. MANAclust is freely available at https://bitbucket.org/scottyler892/manaclust/src/master/.
Clinical data commonly used in medical practice are underutilized in multi-omic analyses to identify disease endotypes. Tyler et al. present a Python package called Merged Affinity Network Association Clustering (MANAclust) that automatically processes and integrates categorical and numeric data types, facilitating the inclusion of clinical data in multi-omic endotyping efforts.
Affiliation(s)
- Scott R Tyler
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Yoojin Chun
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Victoria M Ribeiro
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Galina Grishina
  - Division of Allergy and Immunology, Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Alexander Grishin
  - Division of Allergy and Immunology, Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Gabriel E Hoffman
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Anh N Do
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Supinda Bunyavanich
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Division of Allergy and Immunology, Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
|
7
|
Ren M, Wang Z, Yang G. A Self-Adaptive Weighted Fuzzy c-Means for Mixed-Type Data. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS 2020. [DOI: 10.1142/s1469026820500303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Features do not influence every cluster equally in a mixed-type dataset. Based on the rough set and shadow set theories, the fuzzy distribution centroid was defined to represent the cluster center of the discrete features, so that the fuzzy c-means algorithm (FCM) could be extended to cluster data with both continuous and discrete features. Then, considering the different contributions of the features to each cluster, a new weighted objective function was constructed in accordance with the principles of fuzzy compactness and separation. Because learning the feature weights is the key step in feature-weighted FCM, this paper treated the feature weight as a variable optimized during clustering and put forward a self-adaptive mixed-type weighted FCM. The experimental results showed that the algorithm could be effectively applied to a heterogeneous mixed-type dataset.
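For orientation, the membership update of textbook fuzzy c-means, which the abstract above extends, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's weighted mixed-type variant: the function name `fcm_memberships` is hypothetical, the distances are taken as given, and no feature weights or fuzzy distribution centroids are modeled.

```python
def fcm_memberships(distances, m=2.0):
    """Textbook fuzzy c-means membership update for a single data point.

    distances: distance from the point to each cluster center.
    m: fuzzifier (> 1); m = 2 is a common default (an assumption here).
    Returns one membership degree per cluster; the degrees sum to 1.
    """
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]

    # u_i = d_i^(-2/(m-1)) / sum_k d_k^(-2/(m-1))
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


if __name__ == "__main__":
    # A point three times closer to the first center than to the second.
    print(fcm_memberships([1.0, 3.0]))
```

Closer centers receive larger memberships, and raising `m` flattens the degrees toward equal values.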
Affiliation(s)
- Min Ren
  - School of Mathematics and Quantitative Economics, Shandong University of Finance and Economics, Jinan, Shandong Province, P. R. China
- Zhihao Wang
  - School of Management and Engineering, Shandong University of Finance and Economics, Jinan, Shandong Province, P. R. China
- Guangfen Yang
  - School of Mathematics and Quantitative Economics, Shandong University of Finance and Economics, Jinan, Shandong Province, P. R. China
|
8
|
Gudin J, Mavroudi S, Korfiati A, Theofilatos K, Dietze D, Hurwitz P. Reducing Opioid Prescriptions by Identifying Responders on Topical Analgesic Treatment Using an Individualized Medicine and Predictive Analytics Approach. J Pain Res 2020; 13:1255-1266. [PMID: 32547186 PMCID: PMC7266406 DOI: 10.2147/jpr.s246503] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/19/2022] Open
Abstract
PURPOSE Chronic pain is a life-changing condition, and non-opioid treatments have recently been introduced to avoid the addictive nature of opioid therapies and their side effects. In the present study, we explore the potential of machine learning methods to separate chronic pain patients into those who will benefit from such a treatment and those who will not, with the aim of personalizing their treatment.
PATIENTS AND METHODS Data from the OPERA study were used, with 631 chronic pain patients answering the validated Brief Pain Inventory (BPI) questionnaire along with supplemental questions before and after a follow-up period. A novel machine learning approach combining multi-objective optimization and support vector regression was used to build models that predict, from baseline responses, the four outcomes of the study: total drugs change, total interference change, total severity change, and total complaints change. Data were split into training (504 patients) and testing (127 patients) sets, and all results are measured on the independent test set.
RESULTS The machine learning models developed in the present study significantly outperformed other state-of-the-art machine learning methods that were deployed for comparison. The experimental results indicated that the models can predict the outcomes of this study with considerably high accuracy (AUC 73.8-87.2%), which allowed their incorporation into a decision support system for selecting the treatment of chronic pain patients.
CONCLUSION The results of this study revealed the potential of machine learning for an individualized medicine application for chronic pain therapies. Topical analgesic treatment proved beneficial in general, but careful patient selection with the suggested individualized medicine decision support system reduced by approximately 10% the number of patients who would have been prescribed topical analgesics without benefiting from them.
Affiliation(s)
- Seferina Mavroudi
  - Department of Nursing, School of Health Rehabilitation Sciences, University of Patras, Pátrai, Greece
  - InSyBio Ltd, Winchester, UK
- Derek Dietze
  - Metrics for Learning LLC, Queen Creek, Arizona, USA
- Peter Hurwitz
  - Clarity Science LLC, Narragansett, Rhode Island, USA
|
9
|
|
10
|
|
11
|
Initial Seed Selection for Mixed Data Using Modified K-means Clustering Algorithm. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2019. [DOI: 10.1007/s13369-019-04121-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
|
12
|
Abstract
A simple and fast k-medoids algorithm that updates medoids by minimizing the total distance within clusters has been developed. Although it is simple and fast, as its name suggests, it neglects the local optima and empty clusters that may arise. Because the distance is an input to the algorithm, a generalized distance function is developed to increase the variation of the distances, especially for a mixed variable dataset. The variation of the distances is a crucial part of a partitioning algorithm because different distances produce different outcomes. In experiments, the simple k-medoids algorithm performs consistently well in various settings of mixed variable data. It also has a high cluster accuracy compared to other distance-based partitioning algorithms for mixed variable data.
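The medoid update described in this abstract, picking the cluster member with the smallest total distance to the others, can be sketched as below. This is an illustration under assumptions, not the paper's method: the Gower-style `mixed_distance` stands in for the generalized distance function, whose definition is not given here, and both function names are hypothetical.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style distance between two mixed-type records (an assumption).

    numeric_ranges maps the index of each numeric field to its value range;
    every other field is treated as categorical and compared by mismatch.
    """
    total = 0.0
    for j, (x, y) in enumerate(zip(a, b)):
        if j in numeric_ranges:
            r = numeric_ranges[j]
            total += abs(x - y) / r if r > 0 else 0.0  # range-scaled difference
        else:
            total += 0.0 if x == y else 1.0            # simple matching
    return total / len(a)


def update_medoid(cluster, numeric_ranges):
    """Return the member with the smallest total distance to all members."""
    return min(
        cluster,
        key=lambda c: sum(mixed_distance(c, o, numeric_ranges) for o in cluster),
    )


if __name__ == "__main__":
    cluster = [(1.0, "a"), (2.0, "a"), (10.0, "b")]
    print(update_medoid(cluster, {0: 9.0}))
```

Because only pairwise distances are needed, any other mixed-variable distance can be swapped in without changing `update_medoid`.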
|
13
|
Jain P, Dixit VS. Recommendations with context aware framework using particle swarm optimization and unsupervised learning. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-179001] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/15/2022]
Affiliation(s)
- Parul Jain
  - Department of Computer Science, Atma Ram Sanatan Dharam College, University of Delhi, India
- Veer Sain Dixit
  - Department of Computer Science, Atma Ram Sanatan Dharam College, University of Delhi, India
|
14
|
Ji J, Chen Y, Feng G, Zhao X, He F. Clustering mixed numeric and categorical data with artificial bee colony strategy. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-18146] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Affiliation(s)
- Jinchao Ji
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institute of Computational Biology, Northeast Normal University, Changchun, China
  - Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Yongbing Chen
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institute of Computational Biology, Northeast Normal University, Changchun, China
  - Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, China
- Guozhong Feng
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institute of Computational Biology, Northeast Normal University, Changchun, China
  - Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, China
- Xiaowei Zhao
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institute of Computational Biology, Northeast Normal University, Changchun, China
  - Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, China
- Fei He
  - School of Information Science and Technology, Northeast Normal University, Changchun, China
  - Institute of Computational Biology, Northeast Normal University, Changchun, China
  - Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
|
15
|
Chen J, Lin X, Xuan Q, Xiang Y. FGCH: a fast and grid based clustering algorithm for hybrid data stream. APPL INTELL 2018. [DOI: 10.1007/s10489-018-1324-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
|
16
|
|
17
|
An Efficient Grid-Based K-Prototypes Algorithm for Sustainable Decision-Making on Spatial Objects. SUSTAINABILITY 2018. [DOI: 10.3390/su10082614] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
Abstract
Data mining plays a critical role in sustainable decision-making. Although the k-prototypes algorithm is one of the best-known algorithms for clustering both numeric and categorical data, clustering a large number of spatial objects with mixed numeric and categorical attributes is still inefficient because of its computational cost. In this paper, we propose an efficient grid-based k-prototypes algorithm, GK-prototypes, which achieves high performance for clustering spatial objects. The first proposed algorithm uses both the maximum and the minimum distance between cluster centers and a cell, which reduces unnecessary distance calculations. The second proposed algorithm, an extension of the first, exploits spatial dependence: spatial objects that are close together tend to be similar. Each cell has a bitmap index that stores, for each attribute, the categorical values of all objects within the cell. This bitmap index can improve performance if the categorical data is skewed. Experimental results show that the proposed algorithms achieve better performance than the existing pruning techniques of the k-prototypes algorithm.
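The minimum/maximum distance idea in this abstract can be illustrated with a small sketch. It is a simplified assumption-laden version, not the GK-prototypes algorithm itself: it covers numeric attributes only, assumes axis-aligned rectangular cells and Euclidean distance, and omits the bitmap index for categorical values. The names `min_max_dist` and `candidate_centers` are hypothetical.

```python
import math


def min_max_dist(center, cell_lo, cell_hi):
    """Smallest and largest Euclidean distance from a cluster center to any
    point of the axis-aligned cell spanning cell_lo..cell_hi."""
    dmin2 = dmax2 = 0.0
    for c, lo, hi in zip(center, cell_lo, cell_hi):
        nearest = min(max(c, lo), hi)                   # clamp into the cell
        farthest = lo if abs(c - lo) > abs(c - hi) else hi
        dmin2 += (c - nearest) ** 2
        dmax2 += (c - farthest) ** 2
    return math.sqrt(dmin2), math.sqrt(dmax2)


def candidate_centers(centers, cell_lo, cell_hi):
    """Indices of centers that could be nearest to some object in the cell.

    A center is pruned when its minimum distance to the cell exceeds the
    smallest maximum distance achieved by any other center.
    """
    bounds = [min_max_dist(c, cell_lo, cell_hi) for c in centers]
    best_max = min(dmax for _, dmax in bounds)
    return [i for i, (dmin, _) in enumerate(bounds) if dmin <= best_max]


if __name__ == "__main__":
    centers = [(0.5, 0.5), (10.0, 10.0)]
    print(candidate_centers(centers, (0.0, 0.0), (1.0, 1.0)))
```

When a single candidate survives, every object in the cell can be assigned to it without computing per-object distances, which is where the saving comes from.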
|
18
|
Gorzalczany MB, Rudzinski F. Generalized Self-Organizing Maps for Automatic Determination of the Number of Clusters and Their Multiprototypes in Cluster Analysis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2018; 29:2833-2845. [PMID: 28600264 DOI: 10.1109/tnnls.2017.2704779] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
This paper presents a generalization of self-organizing maps with 1-D neighborhoods (neuron chains) that can be effectively applied to complex cluster analysis problems. The essence of the generalization consists in introducing mechanisms that allow the neuron chain, during learning, to disconnect into subchains, to reconnect some of the subchains again, and to dynamically regulate the overall number of neurons in the system. These features enable the network, working in a fully unsupervised way (i.e., using unlabeled data without a predefined number of clusters), to automatically generate collections of multiprototypes that are able to represent a broad range of clusters in data sets. First, the operation of the proposed approach is illustrated on some synthetic data sets. Then, this technique is tested using several real-life, complex, and multidimensional benchmark data sets available from the University of California at Irvine (UCI) Machine Learning repository and the Knowledge Extraction based on Evolutionary Learning data set repository. A sensitivity analysis of our approach to changes in control parameters and a comparative analysis with an alternative approach are also performed.
|
19
|
Mortality prediction in intensive care units (ICUs) using a deep rule-based fuzzy classifier. J Biomed Inform 2018; 79:48-59. [DOI: 10.1016/j.jbi.2018.02.008] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 10/03/2017] [Revised: 02/13/2018] [Accepted: 02/16/2018] [Indexed: 11/23/2022]
|
20
|
Sangam RS, Om H. Equi-Clustream: a framework for clustering time evolving mixed data. ADV DATA ANAL CLASSI 2018. [DOI: 10.1007/s11634-018-0316-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022]
|
21
|
|
22
|
|
23
|
|
24
|
de Amorim RC, Makarenkov V. Applying subclustering and Lp distance in Weighted K-Means with distributed centroids. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.08.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/29/2022]
|
25
|
Clustering Heterogeneous Data with k-Means by Mutual Information-Based Unsupervised Feature Transformation. ENTROPY 2015. [DOI: 10.3390/e17031535] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
|
26
|
|