1
Simeon A, Radovanović M, Lončar-Turukalo T, Ceci M, Brdar S, Pio G. Multi-class boosting for the analysis of multiple incomplete views on microbiome data. BMC Bioinformatics 2024; 25:188. [PMID: 38745112 PMCID: PMC11092168 DOI: 10.1186/s12859-024-05767-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 04/04/2024] [Indexed: 05/16/2024]
Abstract
BACKGROUND Microbiome dysbiosis has recently been associated with different diseases and disorders. In this context, machine learning (ML) approaches can be useful either to identify new patterns or to learn predictive models. However, the data fed to ML methods can be subject to different sampling, sequencing and preprocessing techniques. Each choice in the pipeline can lead to a different view (i.e., feature set) of the same individuals, which classical (single-view) ML approaches may fail to consider simultaneously. Moreover, some views may be incomplete, i.e., some individuals may be missing from some views, possibly because some measurements are absent or because some features are not available/applicable for all individuals. Multi-view learning methods are a possible solution for considering multiple feature sets for the same individuals, but most existing multi-view learning methods are limited to binary classification tasks or cannot work with incomplete views.
RESULTS We propose irBoost.SH, an extension of the multi-view boosting algorithm rBoost.SH, based on multi-armed bandits. irBoost.SH solves multi-class classification tasks and can analyze incomplete views. At each iteration, it identifies one winning view using adversarial multi-armed bandits and uses its predictions to update a shared instance weight distribution in a boosting-based learning process. In our experiments, performed on 5 multi-view microbiome datasets, the model learned by irBoost.SH always outperforms the best model learned from a single view, its closest competitor rBoost.SH, and the model learned by a multi-view approach based on feature concatenation, improving the F1-score by 11.8% in the prediction of autism spectrum disorder and by 114% in the prediction of colorectal cancer.
CONCLUSIONS The proposed method irBoost.SH exhibited outstanding performance in our experiments, also compared to competitor approaches. The obtained results confirm that irBoost.SH can fruitfully be adopted for the analysis of microbiome data, thanks to its capability to simultaneously exploit multiple feature sets obtained through different sequencing and preprocessing pipelines.
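The per-iteration loop described above (an adversarial bandit picks one view, whose weak learner then updates a shared instance weight distribution) can be illustrated with a minimal sketch. This is not the authors' irBoost.SH implementation. It assumes Exp3 as the adversarial bandit, a SAMME-style multi-class weight update, a weighted nearest-centroid weak learner, a bandit reward equal to the chosen view's weighted accuracy, and that instances missing from the chosen view simply keep their weight. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_centroids(X, y, w, idx):
    """Weighted nearest-centroid weak learner trained on the instances in idx."""
    sums, mass = {}, defaultdict(float)
    for i in idx:
        sums.setdefault(y[i], [0.0] * len(X[i]))
        for j, x in enumerate(X[i]):
            sums[y[i]][j] += w[i] * x
        mass[y[i]] += w[i]
    return {c: [s / mass[c] for s in sums[c]] for c in sums}


def predict_centroid(cents, x):
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))


def exp3_boost(views, y, n_classes, rounds=20, gamma=0.1, seed=0):
    """views[v][i] is instance i's feature vector in view v, or None if missing."""
    rng = random.Random(seed)
    n, V = len(y), len(views)
    D = [1.0 / n] * n   # shared instance weight distribution
    g = [1.0] * V       # Exp3 weight of each view (arm)
    ensemble = []
    for _ in range(rounds):
        total = sum(g)
        p = [(1 - gamma) * gv / total + gamma / V for gv in g]
        v = rng.choices(range(V), weights=p)[0]
        idx = [i for i in range(n) if views[v][i] is not None]
        model = fit_centroids(views[v], y, D, idx)
        wrong = [i for i in idx if predict_centroid(model, views[v][i]) != y[i]]
        err = sum(D[i] for i in wrong) / sum(D[i] for i in idx)
        err = min(max(err, 1e-10), 1 - 1e-10)
        # Exp3: importance-weighted reward in [0, 1] for the pulled arm only.
        g[v] *= math.exp(gamma * (1.0 - err) / (p[v] * V))
        alpha = math.log((1 - err) / err) + math.log(n_classes - 1)  # SAMME
        if alpha <= 0:
            continue  # no better than chance on this view: do not add it
        for i in wrong:  # instances absent from view v keep their weight
            D[i] *= math.exp(alpha)
        z = sum(D)
        D = [d / z for d in D]
        ensemble.append((v, model, alpha))
    return ensemble, D


def predict(ensemble, sample, n_classes):
    """sample[v] is a feature vector or None; only the available views vote."""
    votes = [0.0] * n_classes
    for v, model, alpha in ensemble:
        if sample[v] is not None:
            votes[predict_centroid(model, sample[v])] += alpha
    return max(range(n_classes), key=votes.__getitem__)
```

Labels are assumed to be the integers 0..K-1. Because `predict` only sums votes from views that are present for a sample, an individual missing from some views can still be classified from the others.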
Affiliation(s)
- Andrea Simeon
  - BioSense Institute, University of Novi Sad, dr Zorana Djindjića 1, Novi Sad, 21000, Serbia
- Miloš Radovanović
  - Faculty of Sciences, University of Novi Sad, Trg Dositeja Obradovića 3, Novi Sad, 21000, Serbia
- Tatjana Lončar-Turukalo
  - Faculty of Technical Sciences, University of Novi Sad, Trg Dositeja Obradovića 6, Novi Sad, 21000, Serbia
- Michelangelo Ceci
  - Department of Computer Science, University Bari Aldo Moro, Via Orabona 4, 70125, Bari, Italy
  - Big Data Laboratory, National Interuniversity Consortium for Informatics (CINI), Via Ariosto 25, 00185, Rome, Italy
  - Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Sanja Brdar
  - BioSense Institute, University of Novi Sad, dr Zorana Djindjića 1, Novi Sad, 21000, Serbia
- Gianvito Pio
  - Department of Computer Science, University Bari Aldo Moro, Via Orabona 4, 70125, Bari, Italy
  - Big Data Laboratory, National Interuniversity Consortium for Informatics (CINI), Via Ariosto 25, 00185, Rome, Italy
2
When Multi-view Classification Meets Ensemble Learning. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.02.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
3
Tang J, Xu W, Li J, Tian Y, Xu S. Multi-view learning methods with the LINEX loss for pattern classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107285] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
4
Zhao J, Liu N. A Safe Semi-supervised Classification Algorithm Using Multiple Classifiers Ensemble. Neural Process Lett 2021. [DOI: 10.1007/s11063-020-10191-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
5
Abstract
We introduce a novel boosting algorithm called ‘KTBoost’ which combines kernel boosting and tree boosting. In each boosting iteration, the algorithm adds either a regression tree or a reproducing kernel Hilbert space (RKHS) regression function to the ensemble of base learners. Intuitively, the idea is that discontinuous trees and continuous RKHS regression functions complement each other, and that this combination allows for better learning of functions that have parts with varying degrees of regularity, such as discontinuities and smooth parts. We empirically show that KTBoost significantly outperforms both tree and kernel boosting in terms of predictive accuracy in a comparison on a wide array of data sets.
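The "tree or RKHS function" choice in each iteration can be sketched for squared loss and one-dimensional inputs. This is a simplified illustration under stated assumptions, not the KTBoost reference implementation: a depth-1 regression stump stands in for the regression tree, kernel ridge regression with a Gaussian kernel stands in for the RKHS learner, and in each round the candidate giving the lower training loss after a damped step is added. The function names and hyperparameters (`length`, `lam`, `lr`, `rounds`) are hypothetical.

```python
import math


def fit_stump(x, r):
    """Depth-1 regression tree on 1-D inputs, minimizing squared error."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda z: ml if z <= t else mr


def solve(A, b):
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for k in range(col + 1, n):
            f = M[k][col] / M[col][col]
            for j in range(col, n + 1):
                M[k][j] -= f * M[col][j]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        sol[i] = (M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))) / M[i][i]
    return sol


def fit_kernel(x, r, length=0.5, lam=1.0):
    """Kernel ridge regression with a Gaussian kernel: solve (K + lam*I) c = r."""
    k = lambda a, b: math.exp(-((a - b) ** 2) / (2 * length ** 2))
    K = [[k(a, b) + (lam if i == j else 0.0) for j, b in enumerate(x)]
         for i, a in enumerate(x)]
    coef = solve(K, r)
    return lambda z: sum(c * k(z, xi) for c, xi in zip(coef, x))


def kt_boost(x, y, rounds=30, lr=0.3):
    f0 = sum(y) / len(y)
    pred = [f0] * len(y)
    learners, kinds = [], []
    for _ in range(rounds):
        r = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        cands = {"tree": fit_stump(x, r), "kernel": fit_kernel(x, r)}
        # Training loss after adding lr * h, for each candidate base learner.
        risk = {name: sum((ri - lr * h(xi)) ** 2 for xi, ri in zip(x, r))
                for name, h in cands.items()}
        kind = min(risk, key=risk.get)
        h = cands[kind]
        pred = [pi + lr * h(xi) for xi, pi in zip(x, pred)]
        learners.append(h)
        kinds.append(kind)
    return (lambda z: f0 + lr * sum(h(z) for h in learners)), kinds
```

On a target such as `sin(x)` plus a step at 0, the returned `kinds` list shows which learner type won each round.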
6
Zhao W, Xu C, Guan Z, Liu Y. Multiview Concept Learning Via Deep Matrix Factorization. IEEE Trans Neural Netw Learn Syst 2021; 32:814-825. [PMID: 32275617 DOI: 10.1109/tnnls.2020.2979532] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/11/2023]
Abstract
Multiview representation learning (MVRL) leverages information from multiple views to obtain a common representation summarizing the consistency and complementarity in multiview data. Most previous matrix factorization-based MVRL methods are shallow models that neglect complex hierarchical information. The recently proposed deep multiview factorization models cannot explicitly capture consistency and complementarity in multiview data. We present the deep multiview concept learning (DMCL) method, which hierarchically factorizes the multiview data and aims to explicitly model consistent and complementary information and capture semantic structures at the highest abstraction level. We explore two variants of the DMCL framework, DMCL-L and DMCL-N, which use linear and nonlinear transformations, respectively, between adjacent layers. We propose two block coordinate descent-based optimization methods for DMCL-L and DMCL-N. We verify the effectiveness of DMCL on three real-world data sets for both clustering and classification tasks.
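DMCL stacks several factorization layers. The sketch below covers only the shallow, single-layer building block that such deep models extend. Each view X_v (features × samples) is approximated as W_v H, where the nonnegative coefficient matrix H is shared across views. The factors are fitted with Lee and Seung multiplicative updates, which amounts to ordinary NMF on the vertically stacked views. This is a standard multi-view NMF, not DMCL itself: it has one layer, no explicit consistent/complementary split, and no block coordinate descent solver. All names are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def T(A):
    return [list(r) for r in zip(*A)]


def frob_err(Xs, Ws, H):
    """Sum over views of the squared Frobenius error ||X_v - W_v H||^2."""
    return sum((x - r) ** 2
               for X, W in zip(Xs, Ws)
               for xr, rr in zip(X, matmul(W, H))
               for x, r in zip(xr, rr))


def multiview_nmf(Xs, k, iters=200, seed=0, eps=1e-12):
    rng = random.Random(seed)
    n = len(Xs[0][0])
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    Ws = [[[rng.random() + 0.1 for _ in range(k)] for _ in X] for X in Xs]
    for _ in range(iters):
        # Shared H: numerator and denominator are summed over all views.
        num = [[0.0] * n for _ in range(k)]
        den = [[0.0] * n for _ in range(k)]
        for X, W in zip(Xs, Ws):
            Wt = T(W)
            a, b = matmul(Wt, X), matmul(matmul(Wt, W), H)
            for i in range(k):
                for j in range(n):
                    num[i][j] += a[i][j]
                    den[i][j] += b[i][j]
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        # View-specific W_v: the usual multiplicative update, one view at a time.
        Ht = T(H)
        for v, X in enumerate(Xs):
            W = Ws[v]
            a, b = matmul(X, Ht), matmul(W, matmul(H, Ht))
            Ws[v] = [[W[i][j] * a[i][j] / (b[i][j] + eps) for j in range(k)]
                     for i in range(len(W))]
    return Ws, H
```

The columns of the shared H act as the common representation of each sample. A deep variant would factorize H (or W_v) again at further layers.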
7
Abstract
In the proposed study, we examined a multimodal biometric system designed to be highly robust against spoof attacks. An enhanced anti-spoof capability is demonstrated by choosing hand-related intrinsic modalities. In the proposed system, pulse response, hand geometry, and finger–vein biometrics are the three modalities of focus. The three modalities are combined using a fuzzy rule-based system that provides an accuracy of 92% on near-infrared (NIR) images. In addition, we propose a new NIR hand image dataset containing a total of 111,000 images. In this research, hand geometry is treated as an intrinsic biometric modality by employing near-infrared imaging of human hands to locate the interphalangeal joints of the fingers. The L2 norm is calculated using the centroid of four pixel clusters obtained from the finger joint locations. This method produced an accuracy of 86% on the new NIR image dataset. We also propose finger–vein biometric identification using convolutional neural networks (CNNs). The CNN provided 90% accuracy on the new NIR image dataset. Moreover, we propose a robust system, known as the pulse response biometric, against spoof attacks involving fake or artificial human hands. The pulse response system identifies a live human body by applying a pulse of a specific frequency to the human hand. About 99% of the frequency response samples obtained from human and non-human subjects were correctly classified by the pulse response biometric. Finally, we combine all three modalities using the fuzzy inference system at the confidence-score level, yielding 92% accuracy on the new near-infrared hand image dataset.
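The abstract does not list the fuzzy rules, so the following is only a hypothetical illustration of score-level fuzzy fusion. It is a zero-order Sugeno-style combiner for the three per-modality confidence scores, with linear "low"/"high" memberships and min as the fuzzy AND. The rule base is invented: a failed liveness (pulse-response) check vetoes acceptance, and finger-vein evidence is weighted above hand geometry. The rule outputs and the 0.5 acceptance threshold are assumptions.

```python
def clip01(s):
    return min(max(s, 0.0), 1.0)


def fuzzy_fuse(pulse, vein, geometry):
    """Fuse three confidence scores in [0, 1] into one acceptance score.

    Hypothetical rule base; fuzzy AND = min; defuzzification = weighted
    average of constant rule outputs (zero-order Sugeno).
    """
    hi = {"pulse": clip01(pulse), "vein": clip01(vein), "geo": clip01(geometry)}
    lo = {k: 1.0 - v for k, v in hi.items()}
    rules = [
        (lo["pulse"], 0.0),                              # not live -> reject
        (min(hi["pulse"], hi["vein"], hi["geo"]), 1.0),  # all agree -> accept
        (min(hi["pulse"], hi["vein"], lo["geo"]), 0.7),  # vein outweighs geometry
        (min(hi["pulse"], lo["vein"], hi["geo"]), 0.4),
        (min(hi["pulse"], lo["vein"], lo["geo"]), 0.0),
    ]
    total = sum(w for w, _ in rules)  # always > 0 for scores in [0, 1]
    return sum(w * out for w, out in rules) / total


def accept(pulse, vein, geometry, threshold=0.5):
    return fuzzy_fuse(pulse, vein, geometry) >= threshold
```

For example, a high vein and geometry score with a low pulse-response score is still rejected, because the liveness rule dominates the weighted average.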
8
Marino S, Zhao Y, Zhou N, Zhou Y, Toga AW, Zhao L, Jian Y, Yang Y, Chen Y, Wu Q, Wild J, Cummings B, Dinov ID. Compressive Big Data Analytics: An ensemble meta-algorithm for high-dimensional multisource datasets. PLoS One 2020; 15:e0228520. [PMID: 32857775 PMCID: PMC7455041 DOI: 10.1371/journal.pone.0228520] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/16/2020] [Accepted: 08/11/2020] [Indexed: 11/18/2022]
Abstract
Health advances are contingent on continuous development of new methods and approaches to foster data-driven discovery in the biomedical and clinical sciences. Open-science and team-based scientific discovery offer hope for tackling some of the difficult challenges associated with managing, modeling, and interpreting large, complex, and multisource data. Translating raw observations into useful information and actionable knowledge depends on effective domain-independent reproducibility, area-specific replicability, data curation, analysis protocols, organization, management and sharing of health-related digital objects. This study expands the functionality and utility of an ensemble semi-supervised machine learning technique called Compressive Big Data Analytics (CBDA). Applied to high-dimensional data, CBDA (1) identifies salient features and key biomarkers enabling reliable and reproducible forecasting of binary, multinomial and continuous outcomes (i.e., feature mining); and (2) suggests the most accurate algorithms/models for predictive analytics of the observed data (i.e., model mining). The method relies on iterative subsampling, combines function optimization and statistical inference, and generates ensemble predictions for observed univariate outcomes. The novelty of this study is highlighted by a new and expanded set of CBDA features including (1) efficiently handling extremely large datasets (>100,000 cases and >1,000 features); (2) generalizing the internal and external validation steps; (3) expanding the set of base-learners for joint ensemble prediction; (4) introducing an automated selection of CBDA specifications; and (5) providing mechanisms to assess CBDA convergence, evaluate the prediction accuracy, and measure result consistency. To ground the mathematical model and the corresponding computational algorithm, CBDA 2.0 validation utilizes synthetic datasets as well as a population-wide census-like study.
Specifically, an empirical validation of the CBDA technique is based on translational health research using a large-scale clinical study (UK Biobank), which includes imaging, cognitive, and clinical assessment data. The UK Biobank archive presents several difficult challenges related to the aggregation, harmonization, modeling, and interrogation of the information. These problems are related to the complex longitudinal structure, variable heterogeneity, feature multicollinearity, incongruency, and missingness, as well as violations of classical parametric assumptions. Our results show the scalability, efficiency, and usability of CBDA for interrogating complex data and turning it into structural information, derived knowledge, and translational action. Applying CBDA 2.0 to the UK Biobank case study allows predicting various outcomes of interest, e.g., mood disorders and irritability, and suggests new and exciting avenues of evidence-based research in the context of identifying, tracking, and treating mental health and aging-related diseases. Following open-science principles, we share the entire end-to-end protocol, source code, and results. This facilitates independent validation, result reproducibility, and team-based collaborative discovery.
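CBDA's full pipeline is much richer than what is shown here: it uses several base learners, automated specification selection and convergence checks. The sketch below only illustrates the iterative-subsampling idea behind feature mining. It repeatedly draws random subsets of cases and features, scores a base learner on the held-out cases, and ranks features by how often they appear in the best-scoring subsamples. The nearest-centroid base learner, the default parameter values and all function names are assumptions for illustration.

```python
import random
from collections import Counter


def centroid_accuracy(X, y, feats, train, test):
    """Hold-out accuracy of a nearest-centroid classifier on the feature subset feats."""
    cents = {}
    for c in {y[i] for i in train}:
        rows = [X[i] for i in train if y[i] == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]

    def pred(x):
        return min(cents, key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, cents[c])))

    return sum(pred(X[i]) == y[i] for i in test) / len(test)


def subsample_feature_mining(X, y, n_iter=200, case_frac=0.5, n_feats=3,
                             top_frac=0.1, seed=0):
    """Rank features by how often they occur in the best-scoring random subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    results = []
    for _ in range(n_iter):
        cases = set(rng.sample(range(n), int(case_frac * n)))  # case subsample
        feats = rng.sample(range(p), n_feats)                   # feature subsample
        held_out = [i for i in range(n) if i not in cases]     # internal validation
        results.append((centroid_accuracy(X, y, feats, cases, held_out), feats))
    results.sort(key=lambda t: t[0], reverse=True)
    top = results[: max(1, int(top_frac * n_iter))]
    freq = Counter(f for _, feats in top for f in feats)
    return [f for f, _ in freq.most_common()]
```

The returned list puts the most frequently "useful" feature first. That frequency ranking is the feature-mining output; the best-scoring subsamples themselves correspond loosely to model mining.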
Affiliation(s)
- Simeone Marino
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America
- Yi Zhao
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Nina Zhou
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Yiwang Zhou
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Arthur W. Toga
  - Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, United States of America
- Lu Zhao
  - Laboratory of Neuro Imaging, USC Stevens Neuroimaging and Informatics Institute, Keck School of Medicine of USC, University of Southern California, Los Angeles, California, United States of America
- Yingsi Jian
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Yichen Yang
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Yehu Chen
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Qiucheng Wu
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Jessica Wild
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
- Brandon Cummings
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
  - Michigan Center for Integrative Research in Critical Care, University of Michigan, Ann Arbor, Michigan, United States of America
- Ivo D. Dinov
  - Statistics Online Computational Resource, Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Michigan Institute for Data Science, University of Michigan, Ann Arbor, Michigan, United States of America
  - Neuroscience Graduate Program, University of Michigan, Ann Arbor, Michigan, United States of America
9
Zhang C, Cheng J, Tian Q. Multi-View Image Classification With Visual, Semantic And View Consistency. IEEE Trans Image Process 2019; 29:617-627. [PMID: 31425078 DOI: 10.1109/tip.2019.2934576] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/10/2023]
Abstract
Multi-view visual classification methods have been widely applied to exploit the discriminative information of different views, and many researchers have shown this strategy to be very effective. However, two problems remain. First, images are often treated independently, without fully considering their visual and semantic correlations. Second, view consistency is often ignored. To solve these problems, in this paper, we propose a novel multi-view image classification method with visual, semantic and view consistency (VSVC). For each image, we linearly combine multi-view information for image classification. The combination parameters are determined by considering both the classification loss and the visual, semantic and view consistency. Visual consistency is imposed by ensuring that visually similar images of the same view are predicted to have similar values. For semantic consistency, we impose the locality constraint that nearby images should be predicted to have the same class by the multi-view combination. View consistency is also used to ensure that similar images have consistent multi-view combination parameters. An alternating optimization strategy is used to learn the combination parameters. To evaluate the effectiveness of VSVC, we perform image classification experiments on several public datasets. The experimental results on these datasets show the effectiveness of the proposed VSVC method.
10
11
Mohaghegh Neyshabouri M, Gokcesu K, Gokcesu H, Ozkan H, Kozat SS. Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures. IEEE Trans Neural Netw Learn Syst 2019; 30:923-937. [PMID: 30072350 DOI: 10.1109/tnnls.2018.2854796] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/08/2023]
Abstract
We propose an online algorithm for sequential learning in the contextual multiarmed bandit setting. Our approach is to partition the context space and then optimally combine all possible mappings between the partition regions and the set of bandit arms in a data-driven manner. We show that in our approach, the best mapping is able to approximate the best arm selection policy to any desired degree under mild Lipschitz conditions. Therefore, we design our algorithm based on the optimal adaptive combination and asymptotically achieve the performance of the best mapping as well as of the best arm selection policy. This optimality is guaranteed to hold even in adversarial environments, since we do not rely on any statistical assumptions about the contexts or the losses of the bandit arms. Moreover, we design an efficient implementation of our algorithm using various hierarchical partitioning structures, such as lexicographical or arbitrary position splitting and binary trees (BTs), among other partitioning schemes. For instance, in the case of BT partitioning, the computational complexity is only log-linear in the number of regions in the finest partition. In conclusion, we provide significant performance improvements by introducing upper bounds (with respect to the best arm selection policy) that are mathematically proven to vanish in the average loss per round sense at a faster rate than the state of the art. Our experimental work extensively covers various scenarios ranging from bandit settings to multiclass classification with real and synthetic data. In these experiments, we show that our algorithm substantially outperforms the state-of-the-art techniques while maintaining the introduced mathematical guarantees and decent computational scalability.
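The algorithm above combines all mappings from partition regions to arms. The sketch below is a much simpler illustration of the underlying region-to-arm idea. It fixes a single depth-d binary partition of a scalar context in [0, 1) and runs an independent Exp3 learner (an adversarial bandit algorithm) in each leaf. It does not implement the paper's optimal combination over partitions or carry its guarantees. The class names, the scalar-context assumption and the parameters are hypothetical.

```python
import math
import random


class Exp3:
    """Exp3 for adversarial bandits with rewards in [0, 1]."""

    def __init__(self, n_arms, gamma=0.1):
        self.w, self.gamma = [1.0] * n_arms, gamma

    def probs(self):
        total, k = sum(self.w), len(self.w)
        return [(1 - self.gamma) * wi / total + self.gamma / k for wi in self.w]

    def pull(self, rng):
        p = self.probs()
        arm = rng.choices(range(len(p)), weights=p)[0]
        return arm, p[arm]

    def update(self, arm, p_arm, reward):
        # Importance-weighted reward estimate; only the pulled arm changes.
        self.w[arm] *= math.exp(self.gamma * reward / (p_arm * len(self.w)))


class PartitionedExp3:
    """One Exp3 learner per leaf of a depth-d binary partition of contexts in [0, 1)."""

    def __init__(self, n_arms, depth=2, gamma=0.1):
        self.depth = depth
        self.leaves = [Exp3(n_arms, gamma) for _ in range(2 ** depth)]

    def leaf(self, context):
        return min(int(context * 2 ** self.depth), 2 ** self.depth - 1)

    def act(self, context, rng):
        return self.leaves[self.leaf(context)].pull(rng)

    def learn(self, context, arm, p_arm, reward):
        self.leaves[self.leaf(context)].update(arm, p_arm, reward)
```

A finer partition can approximate a context-dependent best arm more closely, but each leaf then sees fewer rounds. The hierarchical combination in the paper is designed to resolve exactly this trade-off.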
12
El-Manzalawy Y, Hsieh TY, Shivakumar M, Kim D, Honavar V. Min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data. BMC Med Genomics 2018; 11:71. [PMID: 30255801 PMCID: PMC6157248 DOI: 10.1186/s12920-018-0388-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 12/11/2022]
Abstract
BACKGROUND Large-scale collaborative precision medicine initiatives (e.g., The Cancer Genome Atlas (TCGA)) are yielding rich multi-omics data. Integrative analyses of the resulting multi-omics data, such as somatic mutation, copy number alteration (CNA), DNA methylation, miRNA, gene expression, and protein expression, offer tantalizing possibilities for realizing the promise and potential of precision medicine in cancer prevention, diagnosis, and treatment, by substantially improving our understanding of underlying mechanisms and by enabling the discovery of novel biomarkers for different types of cancers. However, such analyses present a number of challenges, including the heterogeneity and high dimensionality of omics data.
METHODS We propose a novel framework for multi-omics data integration using multi-view feature selection. We introduce a novel multi-view feature selection algorithm, MRMR-mv, an adaptation of the well-known Min-Redundancy and Maximum-Relevance (MRMR) single-view feature selection algorithm to the multi-view setting.
RESULTS We report results of experiments using an ovarian cancer multi-omics dataset derived from the TCGA database on the task of predicting ovarian cancer survival. Our results suggest that multi-view models outperform both view-specific models (i.e., models trained and tested using a single type of omics data) and models based on two baseline data fusion methods.
CONCLUSIONS Our results demonstrate the potential of multi-view feature selection in integrative analyses and predictive modeling from multi-omics data.
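The single-view MRMR criterion that MRMR-mv adapts can be sketched as a greedy loop. It first picks the feature with the highest mutual information with the label. It then repeatedly adds the feature that maximizes relevance minus mean redundancy with the features already chosen (the "difference" form of MRMR). The abstract does not describe how MRMR-mv handles views. Here, as an assumption, features from all views are simply pooled under view-prefixed names (e.g. the hypothetical `cna:g1`). Features are assumed to be discrete.

```python
import math
from collections import Counter


def mutual_info(a, b):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mrmr(features, target, k):
    """Greedy MRMR: maximize I(f; y) minus the mean I(f; s) over already selected s."""
    k = min(k, len(features))
    relevance = {name: mutual_info(col, target) for name, col in features.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < k:
        def score(name):
            redundancy = sum(mutual_info(features[name], features[s]) for s in selected)
            return relevance[name] - redundancy / len(selected)
        remaining = [f for f in features if f not in selected]
        selected.append(max(remaining, key=score))
    return selected
```

A duplicated feature has the same relevance as its original but maximal redundancy with it, so MRMR prefers a less relevant yet complementary feature from another view over the copy.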
Affiliation(s)
- Yasser El-Manzalawy
  - Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
  - The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
  - The Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA, 16802, USA
- Tsung-Yu Hsieh
  - Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
  - School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA, 16802, USA
  - The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
- Manu Shivakumar
  - Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- Dokyoon Kim
  - Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
- Vasant Honavar
  - Artificial Intelligence Research Laboratory, College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, 16802, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16802, USA
  - School of Electrical Engineering and Computer Science, Pennsylvania State University, University Park, PA, 16802, USA
  - The Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA, 16802, USA
  - The Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA, 16802, USA
13
Tang J, Tian Y, Liu X, Li D, Lv J, Kou G. Improved multi-view privileged support vector machine. Neural Netw 2018; 106:96-109. [PMID: 30048781 DOI: 10.1016/j.neunet.2018.06.017] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 12/30/2017] [Revised: 05/24/2018] [Accepted: 06/29/2018] [Indexed: 10/28/2022]
Abstract
Multi-view learning (MVL) concentrates on the problem of learning from data represented by multiple distinct feature sets. The consensus and complementarity principles play key roles in multi-view modeling. By exploiting the consensus principle or the complementarity principle among different views, various successful support vector machine (SVM)-based multi-view learning models have been proposed to improve performance. Recently, a framework of learning using privileged information (LUPI) has been proposed to model data with complementary information. By bridging the LUPI paradigm and multi-view learning, we previously presented a privileged SVM-based two-view classification model, named PSVM-2V, that satisfies both principles simultaneously. However, it can be further improved in three aspects: (1) fully exploiting the complementary information among different views; (2) extending it to the multi-view case; and (3) constructing a more efficient optimization solver. Therefore, in this paper, we propose an improved privileged SVM-based model for multi-view learning, termed IPSVM-MV. It directly follows the standard LUPI model to fully utilize the multi-view complementary information; it is also a general model for the multi-view scenario, and an alternating direction method of multipliers (ADMM) is employed to solve the corresponding optimization problem efficiently. Furthermore, we theoretically analyze the performance of IPSVM-MV from the viewpoints of the consensus principle and the generalization error bound. Experimental results on 75 binary data sets demonstrate the effectiveness of the proposed method; here we mainly concentrate on the two-view case to compare with state-of-the-art methods.
Affiliation(s)
- Jingjing Tang
  - School of Business Administration, Southwestern University of Finance and Economics, Chengdu 611130, China
- Yingjie Tian
  - Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China
  - School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
  - Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China
- Xiaohui Liu
  - Department of Computer Science, Brunel University London, Uxbridge, Middlesex, UB8 3PH, UK
- Dewei Li
  - School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jia Lv
  - College of Computer and Information Sciences, Chongqing Normal University, Chongqing, 401331, China
- Gang Kou
  - School of Business Administration, Southwestern University of Finance and Economics, Chengdu 611130, China