1
|
Zaheer MZ, Lee JH, Mahmood A, Astrid M, Lee SI. Stabilizing Adversarially Learned One-Class Novelty Detection Using Pseudo Anomalies. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society 2022; 31:5963-5975. [PMID: 36094978 DOI: 10.1109/tip.2022.3204217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Recently, anomaly scores have been formulated using the reconstruction loss of adversarially learned generators and/or the classification loss of discriminators. The unavailability of anomaly examples in the training data makes optimization of such networks challenging. Owing to adversarial training, the performance of such models fluctuates drastically with each training step, making it difficult to halt the training at an optimal point. In the current study, we propose a robust anomaly detection framework that overcomes such instability by transforming the fundamental role of the discriminator from identifying real vs. fake data to distinguishing good vs. bad quality reconstructions. For this purpose, we propose a method that utilizes the current state as well as an old state of the same generator to create good and bad quality reconstruction examples. The discriminator is trained on these examples to detect the subtle distortions that are often present in the reconstructions of anomalous data. In addition, we propose an efficient generic criterion to stop the training of our model, ensuring elevated performance. Extensive experiments performed on six datasets across multiple domains, including image- and video-based anomaly detection, medical diagnosis, and network security, have demonstrated excellent performance of our approach.
|
2
|
ILRA: Novelty Detection in Face-Based Intervener Re-Identification. Symmetry (Basel) 2019. [DOI: 10.3390/sym11091154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] [Open Access]
Abstract
Transparency laws enable citizens to monitor the activities of political representatives. In this sense, automatic or manual diarization of parliamentary sessions is required, the latter being time-consuming. In the present work, this problem is addressed as a person re-identification problem. Re-identification is defined as the process of matching individuals under different camera views. This paper, in particular, deals with open world person re-identification scenarios, where the probe captured in one camera is not always present in the gallery collected in another one, so it must be determined whether the probe belongs to a novel identity or not. This procedure is mandatory before matching the identity. In most cases, novelty detection is tackled by applying a threshold based on a linear separation of the identities. We propose a threshold-less approach to solve the novelty detection problem, which is based on a one-class classifier and therefore does not need any user-defined threshold. Unlike other approaches that combine audio-visual features, an Isometric LogRatio transformation of a posteriori (ILRA) probabilities is applied to local and deep computed descriptors extracted from the face, which exhibits symmetry and can be exploited in the re-identification process, unlike audio streams. These features are used to train the one-class classifier to detect the novelty of the individual. The proposal is evaluated on real parliamentary session recordings that exhibit challenging variations in terms of pose and location of the interveners. The experimental evaluation explores different configuration sets where our system achieves significant improvement on the given scenario, obtaining an average F measure of 71.29% for online analyzed videos. In addition, ILRA performs better than face descriptors used in recent face-based closed world recognition approaches, achieving an average improvement of 1.6% with respect to a deep descriptor.
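As a rough illustration of the isometric log-ratio (ILR) transformation mentioned in this abstract, the sketch below maps a vector of strictly positive posterior probabilities of length D to D-1 real coordinates. The abstract does not state which ILR basis the authors use; pivot coordinates, one common choice, are assumed here, and the function names `clr` and `ilr` are hypothetical. It is not the authors' implementation.

```python
import math


def clr(p):
    """Centered log-ratio: log of each part minus the mean log (helper for checks)."""
    logs = [math.log(x) for x in p]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ilr(p):
    """ILR in pivot coordinates (an assumed basis choice).

    p must hold at least two strictly positive values, e.g. posterior
    probabilities; the result has len(p) - 1 unconstrained real coordinates.
    """
    if len(p) < 2 or any(x <= 0 for x in p):
        raise ValueError("ilr needs at least two strictly positive parts")
    logs = [math.log(x) for x in p]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]                   # parts that follow part i
        k = len(rest)
        mean_rest = sum(rest) / k             # log of their geometric mean
        coords.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return coords
```

Because the transformation is an isometry, the Euclidean norm of `ilr(p)` equals that of `clr(p)`, so distance-based classifiers can be applied to the transformed coordinates. Zero probabilities would need smoothing before the logarithm is taken; how the paper handles that case is not stated in the abstract.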
|
3
|
Irigoien I, Arenas C. Diagnosis using clinical/pathological and molecular information. Stat Methods Med Res 2016; 25:2878-2894. [DOI: 10.1177/0962280214534410] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
In the diagnosis and classification of diseases, multiple outcomes, both molecular and clinical/pathological, are routinely gathered on patients. In recent years, many approaches have been suggested for integrating gene expression (continuous data) with clinical/pathological data (usually categorical and ordinal data). This new area of research integrates both clinical and genomic data in order to improve our knowledge about diseases, and to capture the information which is lost in independent clinical or genomic studies. The related metric scaling distance is a little-known but very valuable distance for integrating clinical/pathological and molecular information. In this article, we present the use of the related metric scaling distance in biomedical research. We describe how this distance works, and we also explain why it may sometimes be preferred. We discuss the choice of the related metric scaling distance and compare it with other proximity measures that include both clinical and genetic information. Furthermore, we comment on the choice of the related metric scaling distance when classical clustering or discriminant analysis based on distances is performed, and compare the results with more complex cluster or discriminant procedures specially constructed for integrating clinical and molecular information. The use of the related metric scaling distance is illustrated on simulated experimental data and four real data sets: one heart disease study and three cancer studies. The results demonstrate the flexibility and availability of this distance, which gives competitive results.
Affiliation(s)
- Itziar Irigoien, Department of Computation and Artificial Intelligence, Euskal Herriko Unibertsitatea UPV-EHU, Donostia, Spain
- Concepción Arenas, Departament d’Estadística, Universitat de Barcelona, Barcelona, Spain
|
4
|
Towards application of one-class classification methods to medical data. ScientificWorldJournal 2014; 2014:730712. [PMID: 24778600 PMCID: PMC3980920 DOI: 10.1155/2014/730712] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 12/05/2013] [Accepted: 02/24/2014] [Indexed: 11/17/2022] [Open Access]
Abstract
In the problem of one-class classification (OCC), one of the classes, the target class, has to be distinguished from all other possible objects, considered as nontargets. This situation arises in many biomedical problems, for example in diagnosis, image-based tumor recognition, or analysis of electrocardiogram data. In this paper, an approach to OCC based on a typicality test is experimentally compared with reference state-of-the-art OCC techniques (Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description) using biomedical data sets. We evaluate the ability of the procedures using twelve experimental data sets with not necessarily continuous data. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is considered as the target class and the units in the other classes are considered as new units to be classified. The results of the comparison show the good performance of the typicality approach, which can be applied to high-dimensional data; it is worth mentioning that it can be used for any kind of data (continuous, discrete, or nominal), whereas the application of state-of-the-art approaches is not straightforward when nominal variables are present.
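To make the one-class setting concrete, here is a minimal sketch of a Gaussian one-class classifier, one of the reference techniques named in the abstract. It is fitted on target units only and rejects new units whose distance from the target mean exceeds a threshold. Several assumptions are made for brevity and are not taken from the paper: a diagonal covariance (independent features) replaces the full covariance matrix, the threshold is set so that roughly `reject_fraction` of the training targets fall outside it, and the function names are hypothetical.

```python
import math


def fit_gaussian_occ(targets, reject_fraction=0.1):
    """Fit a diagonal-covariance Gaussian to target units (list of numeric rows).

    Returns (distance, threshold): a function giving the variance-scaled
    squared distance to the target mean, and the acceptance threshold.
    """
    n, d = len(targets), len(targets[0])
    if n < 2:
        raise ValueError("need at least two target units")
    means = [sum(row[j] for row in targets) / n for j in range(d)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in targets) / (n - 1)
        for j in range(d)
    ]
    # Guard against constant features (assumption: use a tiny variance floor).
    variances = [v if v > 0 else 1e-12 for v in variances]

    def distance(x):
        return sum((x[j] - means[j]) ** 2 / variances[j] for j in range(d))

    # Threshold: the training distance below which about (1 - reject_fraction)
    # of the target units fall.
    train = sorted(distance(row) for row in targets)
    idx = min(n - 1, max(0, math.ceil((1 - reject_fraction) * n) - 1))
    return distance, train[idx]


def is_target(model, x):
    """True if x is accepted as belonging to the target class."""
    distance, threshold = model
    return distance(x) <= threshold
```

As the abstract notes, density-based models of this kind assume numeric features, which is why their application is not straightforward when nominal variables are present.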
|
5
|
Kaddi CD, Parry RM, Wang MD. Multivariate hypergeometric similarity measure. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1505-16. [PMID: 24407308 PMCID: PMC4983430 DOI: 10.1109/tcbb.2013.28] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
We propose a similarity measure based on the multivariate hypergeometric distribution for the pairwise comparison of images and data vectors. The formulation and performance of the proposed measure are compared with other similarity measures using synthetic data. A method of piecewise approximation is also implemented to facilitate application of the proposed measure to large samples. Example applications of the proposed similarity measure are presented using mass spectrometry imaging data and gene expression microarray data. Results from synthetic and biological data indicate that the proposed measure is capable of providing meaningful discrimination between samples, and that it can be a useful tool for identifying potentially related samples in large-scale biological data sets.
Affiliation(s)
- Chanchala D. Kaddi, Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332
- R. Mitchell Parry, Department of Computer Science, Appalachian State University, Boone, NC 28608
- May D. Wang, Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332
|
6
|
Hennig C, Liao TF. How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification. J R Stat Soc Ser C Appl Stat 2013. [DOI: 10.1111/j.1467-9876.2012.01066.x] [Citation(s) in RCA: 146] [Impact Index Per Article: 13.3] [Indexed: 11/29/2022]
|
7
|
Irigoien I, Mestres F, Arenas C. The depth problem: identifying the most representative units in a data group. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:161-172. [PMID: 23702552 DOI: 10.1109/tcbb.2012.147] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/02/2023]
Abstract
This paper presents a solution to the problem of how to identify the units in groups or clusters that have the greatest degree of centrality and best characterize each group. This problem frequently arises in the classification of data such as types of tumor, gene expression profiles or general biomedical data. It is particularly important in the common context that many units do not properly belong to any cluster. Furthermore, in gene expression data classification, good identification of the most central units in a cluster enables recognition of the most important samples in a particular pathological process. We propose a new depth function that allows us to identify central units. As our approach is based on a measure of distance or dissimilarity between any pair of units, it can be applied to any kind of multivariate data (continuous, binary or multiattribute data). Therefore, it is very valuable in many biomedical applications, which usually involve noncontinuous data, such as clinical, pathological, or biological data sources. We validate the approach using artificial examples and apply it to empirical data. The results show the good performance of our statistical approach.
Affiliation(s)
- Itziar Irigoien, Department of Computation Science and Artificial Intelligence, University of the Basque Country, Donostia, Spain.
|
8
|
Irigoien I, Sierra B, Arenas C. ICGE: an R package for detecting relevant clusters and atypical units in gene expression. BMC Bioinformatics 2012; 13:30. [PMID: 22330431 PMCID: PMC3364157 DOI: 10.1186/1471-2105-13-30] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 06/02/2011] [Accepted: 02/13/2012] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach with which to analyze genome expression data. They attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. An important problem associated with gene classification is to discern whether the clustering process can find a relevant partition, and to identify new gene classes. There are two key aspects to classification: the estimation of the number of clusters, and the decision as to whether a new unit (gene, tumor sample...) belongs to one of these previously identified clusters or to a new group. RESULTS: ICGE is a user-friendly R package which provides many functions related to this problem: identify the number of clusters using mixed variables, usually found by applied biomedical researchers; detect whether the data have a cluster structure; identify whether a new unit belongs to one of the pre-identified clusters or to a novel group; and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and easy examples to facilitate its use. CONCLUSIONS: We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.
Affiliation(s)
- Itziar Irigoien, Department of Computation Science and Artificial Intelligence, University of the Basque Country, Donostia, Spain
|
9
|
Irigoien I, Vives S, Arenas C. Microarray time course experiments: finding profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:464-475. [PMID: 21233526 DOI: 10.1109/tcbb.2009.79] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Time course studies with microarray techniques and experimental replicates are very useful in biomedical research. We present, in replicate experiments, an alternative approach to select and cluster genes according to a new measure for association between genes. First, the procedure normalizes and standardizes the expression profile of each gene, and then, identifies scaling parameters that will further minimize the distance between replicates of the same gene. Then, the procedure filters out genes with a flat profile, detects differences between replicates, and separates genes without significant differences from the rest. For this last group of genes, we define a mean profile for each gene and use it to compute the distance between two genes. Next, a hierarchical clustering procedure is proposed, a statistic is computed for each cluster to determine its compactness, and the total number of classes is determined. For the rest of the genes, those with significant differences between replicates, the procedure detects where the differences between replicates lie, and assigns each gene to the best fitting previously identified profile or defines a new profile. We illustrate this new procedure using simulated data and a representative data set arising from a microarray experiment with replication, and report interesting results.
Affiliation(s)
- Itziar Irigoien, Department of Computation Science and Artificial Intelligence, University of the Basque Country, Manuel de Lardizabal Pasealekua 1, 20080 Donostia, Spain.
|
10
|
Wu FX, Huan J. Guest editorial: Special focus on bioinformatics and systems biology. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:292-293. [PMID: 21298823 DOI: 10.1109/tcbb.2011.16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
|
11
|
|
12
|
Umek L, Zupan B, Toplak M, Morin A, Chauchat JH, Makovec G, Smrke D. Subgroup Discovery in Data Sets with Multi–dimensional Responses: A Method and a Case Study in Traumatology. Artif Intell Med 2009. [DOI: 10.1007/978-3-642-02976-9_39] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/24/2022]
|