1
Mirzaei G. Constructing gene similarity networks using co-occurrence probabilities. BMC Genomics 2023; 24:697. [PMID: 37990157 PMCID: PMC10662556 DOI: 10.1186/s12864-023-09780-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 11/01/2023] [Indexed: 11/23/2023]
Abstract
Gene similarity networks play an important role in unraveling the intricate associations within diverse cancer types. Conventionally, gauging the similarity between genes has been approached through experimental methodologies involving chemical and molecular analyses, or through the lens of mathematical techniques. However, in our work, we have pioneered a distinctive mathematical framework, one rooted in the co-occurrence of attribute values and single point mutations, thereby establishing a novel approach for quantifying the dissimilarity or similarity among genes. Central to our approach is the recognition of mutations as key players in the evolutionary trajectory of cancer. Anchored in this understanding, our methodology hinges on the consideration of two categorical attributes: mutation type and nucleotide change. These attributes are pivotal, as they encapsulate the critical variations that can precipitate substantial changes in gene behavior and ultimately influence disease progression. Our study takes on the challenge of formulating similarity measures that are intrinsic to genes' categorical data. Taking into account the co-occurrence probability of attribute values within single point mutations, our innovative mathematical approach surpasses the boundaries of conventional methods. We thereby provide a robust and comprehensive means to assess gene similarity and take a significant step forward in refining the tools available for uncovering the subtle yet impactful associations within the complex realm of gene interactions in cancer.
Affiliation(s)
- Golrokh Mirzaei
  - Department of Computer Science and Engineering, The Ohio State University, Marion, USA.
2
Akyea RK, Ntaios G, Kontopantelis E, Georgiopoulos G, Soria D, Asselbergs FW, Kai J, Weng SF, Qureshi N. A population-based study exploring phenotypic clusters and clinical outcomes in stroke using unsupervised machine learning approach. PLOS DIGITAL HEALTH 2023; 2:e0000334. [PMID: 37703231 PMCID: PMC10499205 DOI: 10.1371/journal.pdig.0000334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Accepted: 07/19/2023] [Indexed: 09/15/2023]
Abstract
Individuals developing stroke have varying clinical characteristics, demographic, and biochemical profiles. This heterogeneity in phenotypic characteristics can impact on cardiovascular disease (CVD) morbidity and mortality outcomes. This study uses a novel clustering approach to stratify individuals with incident stroke into phenotypic clusters and evaluates the differential burden of recurrent stroke and other cardiovascular outcomes. We used linked clinical data from primary care, hospitalisations, and death records in the UK. A data-driven clustering analysis (kamila algorithm) was used in 48,114 patients aged ≥ 18 years with incident stroke, from 1-Jan-1998 to 31-Dec-2017 and no prior history of serious vascular events. Cox proportional hazards regression was used to estimate hazard ratios (HRs) for subsequent adverse outcomes, for each of the generated clusters. Adverse outcomes included coronary heart disease (CHD), recurrent stroke, peripheral vascular disease (PVD), heart failure, CVD-related and all-cause mortality. Four distinct phenotypes with varying underlying clinical characteristics were identified in patients with incident stroke. Compared with cluster 1 (n = 5,201, 10.8%), the risk of composite recurrent stroke and CVD-related mortality was higher in the other 3 clusters (cluster 2 [n = 18,655, 38.8%]: hazard ratio [HR], 1.07; 95% CI, 1.02-1.12; cluster 3 [n = 10,244, 21.3%]: HR, 1.20; 95% CI, 1.14-1.26; and cluster 4 [n = 14,014, 29.1%]: HR, 1.44; 95% CI: 1.37-1.50). Similar trends in risk were observed for composite recurrent stroke and all-cause mortality outcome, and subsequent recurrent stroke outcome. However, results were not consistent for subsequent risk in CHD, PVD, heart failure, CVD-related mortality, and all-cause mortality. 
In this proof-of-principle study, we demonstrated how a heterogeneous population of patients with incident stroke can be stratified into four relatively homogeneous phenotypes with differential risk of recurrent and major cardiovascular outcomes. This offers an opportunity to revisit the stratification of care for patients with incident stroke to improve patient outcomes.
Affiliation(s)
- Ralph K. Akyea
  - PRISM Research Group, Centre for Academic Primary Care, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- George Ntaios
  - Department of Internal Medicine, Faculty of Medicine, School of Health Sciences, University of Thessaly, Larissa, Greece
- Evangelos Kontopantelis
  - Division of Population Health, Health Services Research and Primary Care, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre (MAHSC), The University of Manchester, Manchester, United Kingdom
  - Division of Informatics, Imaging and Data Sciences, School of Health Sciences, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre (MAHSC), The University of Manchester, Manchester, United Kingdom
- Georgios Georgiopoulos
  - School of Biomedical Engineering and Imaging Sciences, St Thomas Hospital, King’s College London, London, United Kingdom
- Daniele Soria
  - School of Computing, University of Kent, Canterbury, United Kingdom
- Folkert W. Asselbergs
  - Amsterdam University Medical Centers, Department of Cardiology, University of Amsterdam, Amsterdam, The Netherlands
  - Health Data Research UK and Institute of Health Informatics, University College London, London, United Kingdom
- Joe Kai
  - PRISM Research Group, Centre for Academic Primary Care, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- Stephen F. Weng
  - PRISM Research Group, Centre for Academic Primary Care, School of Medicine, University of Nottingham, Nottingham, United Kingdom
- Nadeem Qureshi
  - PRISM Research Group, Centre for Academic Primary Care, School of Medicine, University of Nottingham, Nottingham, United Kingdom
3
The Lookup Table Regression Model for Histogram-Valued Symbolic Data. STATS 2022. [DOI: 10.3390/stats5040077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
This paper presents the Lookup Table Regression Model (LTRM) for histogram-valued symbolic data. We first transform the given symbolic data to a numerical data table by the quantile method. Then, under the selected response variable, we apply the Monotone Blocks Segmentation (MBS) to the obtained numerical data table. If the selected response variable and some of the remaining explanatory variable(s) form a monotone structure, the MBS generates a Lookup Table composed of interval values. For a given object, we search for the nearest value of an explanatory variable; the corresponding value of the response variable then becomes the estimated value. If the response variable and the explanatory variable(s) are covariate but follow a non-monotonic structure, we need to divide the given data into several monotone substructures. For this purpose, we apply hierarchical conceptual clustering to the given data, and we obtain Multiple Lookup Tables by applying the MBS to each of the substructures. We show the usefulness of the proposed method by using an artificial data set and real data sets.
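The estimation step described in this abstract (find the nearest explanatory value, return the paired response value) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `nearest_lookup` and the table layout are hypothetical, and the table holds plain scalar pairs rather than the interval values that the MBS is said to generate.

```python
from bisect import bisect_left


def nearest_lookup(table, x):
    """Return the response paired with the explanatory value nearest to x.

    `table` is a list of (explanatory_value, response_value) pairs sorted by
    explanatory value. Scalar entries are an assumption made for brevity.
    """
    keys = [k for k, _ in table]
    i = bisect_left(keys, x)
    if i == 0:                  # x lies at or below the smallest key
        return table[0][1]
    if i == len(table):         # x lies above the largest key
        return table[-1][1]
    left, right = table[i - 1], table[i]
    # Ties go to the left neighbour.
    return left[1] if x - left[0] <= right[0] - x else right[1]


# Hypothetical monotone lookup table: the response grows with the explanatory value.
table = [(1.0, 10.0), (2.0, 20.0), (4.0, 35.0)]
estimate = nearest_lookup(table, 2.9)
```

A sorted table with binary search keeps each lookup at logarithmic cost; with several monotone substructures, one such table per substructure would presumably be kept.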
4
Clustering mixed-type data using a probabilistic distance algorithm. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
5
Rodríguez SI, de Carvalho FDA. Soft subspace clustering of interval-valued data with regularizations. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107191] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/27/2022]
6
Unsupervised Feature Selection for Histogram-Valued Symbolic Data Using Hierarchical Conceptual Clustering. STATS 2021. [DOI: 10.3390/stats4020024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
This paper presents an unsupervised feature selection method for multi-dimensional histogram-valued data. We define a multi-role measure, called the compactness, based on the concept size of given objects and/or clusters described using a fixed number of equal probability bin-rectangles. In each step of clustering, we agglomerate objects and/or clusters so as to minimize the compactness for the generated cluster. This means that the compactness plays the role of a similarity measure between objects and/or clusters to be merged. Minimizing the compactness is equivalent to maximizing the dissimilarity of the generated cluster, i.e., concept, against the whole concept in each step. In this sense, the compactness plays the role of cluster quality. We also show that the average compactness of each feature with respect to objects and/or clusters in several clustering steps is useful as a feature effectiveness criterion. Features having small average compactness are mutually covariate and are able to detect a geometrically thin structure embedded in the given multi-dimensional histogram-valued data. We obtain a thorough understanding of the given data via visualization using dendrograms and scatter diagrams with respect to the selected informative features. We illustrate the effectiveness of the proposed method by using an artificial data set and real histogram-valued data sets.
7
Zhu X, Pedrycz W, Li Z. A Development of Granular Input Space in System Modeling. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:1639-1650. [PMID: 30892261 DOI: 10.1109/tcyb.2019.2899633] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/09/2023]
Abstract
In this paper, we elaborate on a new design approach to the development and analysis of granular input spaces and ensuing granular modeling. Given a numeric model (no matter what specific design methodology has been used to construct it and what architecture has been adopted), we form a granular input space through allocating a certain level of information granularity across the input variables. The formation of granular input space helps us gain a better insight into the ranking of input variables with respect to their precision (the variables with a lower level of information granularity need to be specified in a precise way when estimating the inputs). As a consequence, for granular inputs, the outputs of the granular model are also information granules (say, intervals, fuzzy sets, rough sets, etc.). It is shown that the process of forming granular input space can be sought as an optimization of allocation of information granularity across the input variables so that the specificity of the corresponding granular outputs of the granular model becomes the highest while coverage of data becomes maximized. The construction of granular input space dwells upon two fundamental principles of granular computing: the principle of justifiable granularity and the optimal allocation of information granularity. The quality of the granular input space is quantified in terms of the two conflicting criteria, that is, the specificity of the results produced by the granular model and the coverage of experimental data delivered by this model. In the ensuing optimization problem, one maximizes a product of specificity and coverage. Differential evolution is engaged in this optimization task. The experimental studies involve both a synthetic dataset and data coming from the machine learning repository.
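The objective named in this abstract, a product of specificity and coverage, can be illustrated for a single interval granule. The sketch below rests on assumed definitions that the abstract does not spell out: coverage is taken as the fraction of data inside the interval, and specificity as one minus the interval width relative to the data range. The function names are hypothetical, and the differential evolution search itself is not shown.

```python
def coverage(data, lo, hi):
    """Fraction of data points that fall inside the interval [lo, hi]."""
    return sum(lo <= x <= hi for x in data) / len(data)


def specificity(data, lo, hi):
    """One minus the interval width relative to the data range (assumed form)."""
    data_range = max(data) - min(data)
    if data_range == 0:
        return 1.0
    return max(0.0, 1.0 - (hi - lo) / data_range)


def quality(data, lo, hi):
    """Product of the two conflicting criteria; larger is better."""
    return coverage(data, lo, hi) * specificity(data, lo, hi)


# Hypothetical one-dimensional outputs with a single outlier at 10.
data = [1, 2, 3, 4, 10]
narrow = quality(data, 1, 4)   # covers most points with a short interval
wide = quality(data, 1, 10)    # covers everything, but has no specificity
```

The two calls show the conflict: widening the interval raises coverage and lowers specificity, so the product rewards a compromise between them.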
8
Ogasawara Y, Kon M. Two clustering methods based on the Ward's method and dendrograms with interval-valued dissimilarities for interval-valued data. Int J Approx Reason 2021. [DOI: 10.1016/j.ijar.2020.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
10
The Measurement of Social Cohesion at Province Level in Poland Using Metric and Interval-Valued Data. SUSTAINABILITY 2020. [DOI: 10.3390/su12187664] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
The notion of social cohesion is increasingly used in the political, economic and academic debate. Due to its multidimensionality, the assessment of social cohesion is not easy, especially if it is conducted at a lower than national level of aggregation. The aim of the study is to assess social cohesion in provinces of Poland in 2018 using the hybrid approach involving multidimensional scaling and linear ordering based on an aggregate measure. This type of study is usually conducted using classic metric data. However, the traditional approach does not account for the variation between lower level units (i.e., districts). The authors propose a methodology which makes this possible. Additionally, the results of assessment of the multidimensional phenomenon can be presented in a two-dimensional space. Classic metric data and symbolic interval-valued data (three data types: min-max, 1st decile and 9th decile, 2nd decile and 8th decile) are jointly represented in a single diagram. The consistency of the research method ensures comparability of results of linear ordering. Two criteria were used in the comparative analysis of four rankings of social cohesion. The results of the study clearly showed that the current level of social cohesion at the province level is geographically and historically dependent.
13
Hierarchical conceptual clustering based on quantile method for identifying microscopic details in distributional data. ADV DATA ANAL CLASSI 2020. [DOI: 10.1007/s11634-020-00411-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/26/2022]
14
A Recommendation Mechanism for Under-Emphasized Tourist Spots Using Topic Modeling and Sentiment Analysis. SUSTAINABILITY 2019. [DOI: 10.3390/su12010320] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/19/2022]
Abstract
With rapid advancements in internet applications, the growth rate of recommendation systems for tourists has skyrocketed. This has generated an enormous amount of travel-based data in the form of reviews, blogs, and ratings. However, most recommendation systems only recommend the top-rated places. Along with the top-ranked places, we aim to discover places that are often ignored by tourists owing to lack of promotion or effective advertising, referred to as under-emphasized locations. In this study, we use all relevant data, such as travel blogs, ratings, and reviews, in order to obtain optimal recommendations. We also aim to discover the latent factors that need to be addressed, such as food, cleanliness, and opening hours, and recommend a tourist place based on user history data. In this study, we propose a cross mapping table approach based on the location’s popularity, ratings, latent topics, and sentiments. An objective function for recommendation optimization is formulated based on these mappings. The baseline algorithms are latent Dirichlet allocation (LDA) and support vector machine (SVM). Our results show that the combined features of LDA, SVM, ratings, and cross mappings are conducive to enhanced performance. The main motivation of this study was to help tourist industries to direct more attention towards designing effective promotional activities for under-emphasized locations.
16
Waste Management Analysis in Developing Countries through Unsupervised Classification of Mixed Data. SOCIAL SCIENCES-BASEL 2019. [DOI: 10.3390/socsci8060186] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Indexed: 11/16/2022]
Abstract
The increase in global population and the improvement of living standards in developing countries has resulted in higher solid waste generation. Solid waste management increasingly represents a challenge, but it might also be an opportunity for the municipal authorities of these countries. To this end, the awareness of a variety of factors related to waste management and an efficacious in-depth analysis of them might prove to be particularly significant. For this purpose, and since data are both qualitative and quantitative, a cluster analysis specific for mixed data has been implemented on the dataset. The analysis allows us to distinguish two well-defined groups. The first one is poorer, less developed, and urbanized, with a consequent lower life expectancy of inhabitants. Consequently, it registers lower waste generation and lower CO2 emissions. Surprisingly, it is more engaged in recycling and in awareness campaigns related to it. Since the cluster discrimination between the two groups is well defined, the second cluster registers the opposite tendency for all the analyzed variables. In conclusion, this kind of analysis offers a potential pathway for academics to work with policy-makers in moving toward the realization of waste management policies tailored to the local context.
17
Akay Ö, Yüksel G. Hierarchical clustering of mixed variable panel data based on new distance. COMMUN STAT-SIMUL C 2019. [DOI: 10.1080/03610918.2019.1588306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Affiliation(s)
- Özlem Akay
  - Department of Statistics, The Faculty of Science and Letters, Cukurova University, Adana, Turkey
- Güzin Yüksel
  - Department of Statistics, The Faculty of Science and Letters, Cukurova University, Adana, Turkey
18
Castillo E, Morales DP, García A, Parrilla L, Ruiz VU, Álvarez-Bermejo JA. A clustering-based method for single-channel fetal heart rate monitoring. PLoS One 2018; 13:e0199308. [PMID: 29933366 PMCID: PMC6014640 DOI: 10.1371/journal.pone.0199308] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/06/2017] [Accepted: 06/05/2018] [Indexed: 11/29/2022]
Abstract
Non-invasive fetal electrocardiography (ECG) is based on the acquisition of signals from abdominal surface electrodes. The composite abdominal signal consists of the maternal electrocardiogram along with the fetal electrocardiogram and other electrical interferences. These recordings allow for the acquisition of valuable and reliable information that helps ensure fetal well-being during pregnancy. This paper introduces a procedure for fetal heart rate extraction from a single-channel abdominal ECG signal. The procedure is composed of three main stages: a wavelet-based method for signal denoising, a new clustering-based methodology for detecting fetal QRS complexes, and a final stage to correct false positives and false negatives. The novelty of the procedure thus relies on using clustering techniques to classify singularities from the abdominal ECG into three types: maternal QRS complexes, fetal QRS complexes, and noise. The amplitude and time distance of all the local maxima followed by a local minimum were selected as features for the clustering classification. A wide set of real abdominal ECG recordings from two different databases, providing a large range of different characteristics, was used to illustrate the efficiency of the proposed method. The accuracy achieved shows that the proposed technique exhibits a competitive performance when compared to other recent works in the literature and a better performance than threshold-based techniques.
Affiliation(s)
- Encarnación Castillo
  - Department of Electronics and Computer Technology, Campus Universitario Fuentenueva, University of Granada, Granada, Spain
- Diego P. Morales
  - Department of Electronics and Computer Technology, Campus Universitario Fuentenueva, University of Granada, Granada, Spain
- Antonio García
  - Department of Electronics and Computer Technology, Campus Universitario Fuentenueva, University of Granada, Granada, Spain
- Luis Parrilla
  - Department of Electronics and Computer Technology, Campus Universitario Fuentenueva, University of Granada, Granada, Spain
- Víctor U. Ruiz
  - Department of Electronics and Computer Technology, Campus Universitario Fuentenueva, University of Granada, Granada, Spain
19
Zhu X, Pedrycz W, Li Z. Granular Data Description: Designing Ellipsoidal Information Granules. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:4475-4484. [PMID: 28113415 DOI: 10.1109/tcyb.2016.2612226] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 06/06/2023]
Abstract
Granular computing (GrC) has emerged as a unified conceptual and processing framework. Information granules are fundamental constructs that permeate concepts and models of GrC. This paper is concerned with a design of a collection of meaningful, easily interpretable ellipsoidal information granules with the use of the principle of justifiable granularity by taking into consideration reconstruction abilities of the designed information granules. The principle of justifiable granularity supports designing of information granules based on numeric or granular evidence, and aims to achieve a compromise between justifiability and specificity of the information granules to be constructed. A two-stage development strategy behind the construction of justifiable information granules is considered. First, a collection of numeric prototypes is determined with the use of fuzzy clustering. Second, the lengths of the semi-axes of ellipsoidal information granules to be formed around such prototypes are optimized. Two optimization criteria are introduced and studied. Experimental studies involving synthetic data set and data sets coming from the machine learning repository are reported.
21
Ramos-Guajardo AB, Grzegorzewski P. Distance-based linear discriminant analysis for interval-valued data. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.08.068] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
23
Jia H, Cheung YM, Liu J. A New Distance Metric for Unsupervised Learning of Categorical Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:1065-79. [PMID: 26068881 DOI: 10.1109/tnnls.2015.2436432] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 05/20/2023]
Abstract
Distance metric is the basis of many learning algorithms, and its effectiveness usually has a significant influence on the learning results. In general, measuring distance for numerical data is a tractable task, but it could be a nontrivial problem for categorical data sets. This paper, therefore, presents a new distance metric for categorical data based on the characteristics of categorical values. In particular, the distance between two values from one attribute measured by this metric is determined by both the frequency probabilities of these two values and the values of other attributes that have high interdependence with the calculated one. Dynamic attribute weight is further designed to adjust the contribution of each attribute-distance to the distance between the whole data objects. Promising experimental results on different real data sets have shown the effectiveness of the proposed distance metric.
24
Jolliffe IT, Cadima J. Principal component analysis: a review and recent developments. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2016; 374:20150202. [PMID: 26953178 PMCID: PMC4792409 DOI: 10.1098/rsta.2015.0202] [Citation(s) in RCA: 1775] [Impact Index Per Article: 221.9] [Accepted: 01/19/2016] [Indexed: 05/19/2023]
Abstract
Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding such new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem, and the new variables are defined by the dataset at hand, not a priori, hence making PCA an adaptive data analysis technique. It is adaptive in another sense too, since variants of the technique have been developed that are tailored to various different data types and structures. This article will begin by introducing the basic ideas of PCA, discussing what it can and cannot do. It will then describe some variants of PCA and their application.
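The reduction to an eigenvalue/eigenvector problem mentioned in this abstract can be sketched in a few lines. This is an illustrative sketch, not code from the article: it uses power iteration with deflation on the sample covariance matrix, a technique chosen here only to keep the example dependency-free, and all function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(mat, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix.

    The start vector is an arbitrary choice; if it were exactly orthogonal to
    the dominant eigenvector, the iteration would not find that eigenvector.
    """
    d = len(mat)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate.
    eigval = sum(v[i] * sum(mat[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def principal_components(rows, k):
    """First k (variance, direction) pairs, found by repeated deflation."""
    cov = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(k):
        val, vec = leading_eigenpair(cov)
        components.append((val, vec))
        # Remove the component just found so the next-largest becomes dominant.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return components


# Hypothetical data lying exactly on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
variance, direction = principal_components(rows, 1)[0]
```

For these points all of the variance lies along the direction proportional to (1, 2), so one component describes the data without loss. In practice a library eigensolver would normally replace the power iteration.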
Affiliation(s)
- Ian T Jolliffe
  - College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, UK
- Jorge Cadima
  - Secção de Matemática (DCEB), Instituto Superior de Agronomia, Universidade de Lisboa, Tapada da Ajuda, Lisboa 1340-017, Portugal
  - Centro de Estatística e Aplicações da Universidade de Lisboa (CEAUL), Lisboa, Portugal
25
Carvalho FDAD, Bertrand P, Simões EC. Batch SOM algorithms for interval-valued data with automatic weighting of the variables. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.11.084] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/17/2022]
26
Kim J. A Divisive Clustering for Mixed Feature-Type Symbolic Data. KOREAN JOURNAL OF APPLIED STATISTICS 2015. [DOI: 10.5351/kjas.2015.28.6.1147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
27
Filiz M, Trumpower D. Exploring the Mobile Structural Assessment Tool: Concept Maps for Learning Website. REVISTA COLOMBIANA DE ESTADÍSTICA 2014. [DOI: 10.15446/rce.v37n2spe.47939] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
28
Principal component analysis for probabilistic symbolic data: a more generic and accurate algorithm. ADV DATA ANAL CLASSI 2014. [DOI: 10.1007/s11634-014-0178-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
30
D’Urso P, De Giovanni L, Massari R. Trimmed fuzzy clustering for interval-valued data. ADV DATA ANAL CLASSI 2014. [DOI: 10.1007/s11634-014-0169-3] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/28/2022]
31
Ferraro MB, Giordani P. On possibilistic clustering with repulsion constraints for imprecise data. Inf Sci (N Y) 2013. [DOI: 10.1016/j.ins.2013.04.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
33
Kim J, Billard L. Dissimilarity measures and divisive clustering for symbolic multimodal-valued data. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2012.03.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]
34
Trépos R, Salleb-Aouissi A, Cordier MO, Masson V, Gascuel-Odoux C. Building actions from classification rules. Knowl Inf Syst 2012. [DOI: 10.1007/s10115-011-0466-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
35
Su SF, Chuang CC, Tao CW, Jeng JT, Hsiao CC. Radial basis function networks with linear interval regression weights for symbolic interval data. IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS. PART B, CYBERNETICS : A PUBLICATION OF THE IEEE SYSTEMS, MAN, AND CYBERNETICS SOCIETY 2012; 42:69-80. [PMID: 21859627 DOI: 10.1109/tsmcb.2011.2161468] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]
Abstract
This paper introduces a new structure of radial basis function networks (RBFNs) that can successfully model symbolic interval-valued data. In the proposed structure, to handle symbolic interval data, the Gaussian functions required in the RBFNs are modified to consider interval distance measure, and the synaptic weights of the RBFNs are replaced by linear interval regression weights. In the linear interval regression weights, the lower and upper bounds of the interval-valued data as well as the center and range of the interval-valued data are considered. In addition, in the proposed approach, two stages of learning mechanisms are proposed. In stage 1, an initial structure (i.e., the number of hidden nodes and the adjustable parameters of radial basis functions) of the proposed structure is obtained by the interval competitive agglomeration clustering algorithm. In stage 2, a gradient-descent kind of learning algorithm is applied to fine-tune the parameters of the radial basis function and the coefficients of the linear interval regression weights. Various experiments are conducted, and the average behavior of the root mean square error and the square of the correlation coefficient in the framework of a Monte Carlo experiment are considered as the performance index. The results clearly show the effectiveness of the proposed structure.
Affiliation(s)
- Shun-Feng Su
  - Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan
36
D’Urso P, De Giovanni L. Midpoint radius self-organizing maps for interval-valued data with telecommunications application. Appl Soft Comput 2011. [DOI: 10.1016/j.asoc.2011.01.006] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/30/2022]
37
Kim J, Billard L. A polythetic clustering process and cluster validity indexes for histogram-valued objects. Comput Stat Data Anal 2011. [DOI: 10.1016/j.csda.2011.01.011] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022]
40
Zuccolotto P. Symbolic missing data imputation in principal component analysis. Stat Anal Data Min 2011. [DOI: 10.1002/sam.10101] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
41
Constructing Stable Clustering Structure for Uncertain Data Set. ACTA ELECTROTECHNICA ET INFORMATICA 2011. [DOI: 10.2478/v10198-011-0028-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
42
de Carvalho FDA, de Souza RM. Unsupervised pattern recognition models for mixed feature-type symbolic data. Pattern Recognit Lett 2010. [DOI: 10.1016/j.patrec.2009.11.007] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 10/20/2022]
44
Lima Neto EDA, de Carvalho FDA. Constrained linear regression models for symbolic interval-valued variables. Comput Stat Data Anal 2010. [DOI: 10.1016/j.csda.2009.08.010] [Citation(s) in RCA: 138] [Impact Index Per Article: 9.9] [Indexed: 11/27/2022]
46
Detection of chain structures embedded in multidimensional symbolic data. Pattern Recognit Lett 2009. [DOI: 10.1016/j.patrec.2009.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
48
Chavent M, Saracco J. On Central Tendency and Dispersion Measures for Intervals and Hypercubes. COMMUN STAT-THEOR M 2008. [DOI: 10.1080/03610920701678984] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
49
Lima Neto EDA, de Carvalho FDA. Centre and Range method for fitting a linear regression model to symbolic interval data. Comput Stat Data Anal 2008. [DOI: 10.1016/j.csda.2007.04.014] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.0] [Indexed: 10/23/2022]