1
Jayashree SR, Dias G, Andrew JJ, Saha S, Maurel F, Ferrari S. Multimodal Web Page Segmentation Using Self-organized Multi-objective Clustering. ACM T INFORM SYST 2022. [DOI: 10.1145/3480966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Web page segmentation (WPS) aims to break a web page into different segments with coherent intra- and inter-semantics. By evidencing the morpho-dispositional semantics of a web page, WPS has traditionally been used to demarcate informative from non-informative content, but it has also proven to play a key role in non-linear access to web information for visually impaired people. For that purpose, a great many ad hoc solutions have been proposed that rely on visual, logical, and/or text cues. However, such methodologies depend heavily on manually tuned heuristics and are parameter-dependent. To overcome these drawbacks, principled frameworks have been proposed that provide the theoretical bases to achieve optimal solutions. However, existing methodologies combine only a few discriminant features and do not define strategies to automatically select the optimal number of segments. In this article, we present a multi-objective clustering technique called MCS that relies on \( K \)-means, in which (1) visual, logical, and text cues are all combined in an early fusion manner and (2) an evolutionary process automatically discovers the optimal number of clusters (segments) as well as the correct positioning of seeds. As such, our proposal is parameter-free, combines many different modalities, does not depend on manually tuned heuristics, and can be run on any web page without any constraint. An exhaustive evaluation over two different tasks, where (1) the number of segments must be discovered or (2) the number of clusters is fixed with respect to the task at hand, shows that MCS drastically improves over most competitive and up-to-date algorithms for a wide variety of external and internal validation indices. In particular, the results clearly show the impact of the visual and logical modalities on segmentation performance.
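The early-fusion idea can be illustrated with a minimal Python sketch. It assumes that each page element (for example, a DOM block) already has a visual feature vector and a text feature vector; both are z-scored and concatenated, and the fused vectors are grouped with plain Lloyd's \( K \)-means for a fixed K. This is not MCS itself: MCS evolves the number of clusters and the seed positions through multi-objective optimization, whereas here K, the random seeding, the toy features, and all function names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def standardize(rows: List[Vector]) -> List[Vector]:
    """Z-score each feature column so no modality dominates by scale."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [
        (sum((r[j] - means[j]) ** 2 for r in rows) / n) ** 0.5 or 1.0  # guard constant columns
        for j in range(d)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def early_fusion(*modalities: List[Vector]) -> List[Vector]:
    """Concatenate each element's standardized features across all modalities."""
    normed = [standardize(m) for m in modalities]
    return [[x for m in normed for x in m[i]] for i in range(len(normed[0]))]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's K-means with randomly sampled seeds; returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


visual = [[0, 0], [0, 1], [10, 10], [10, 11]]  # hypothetical (x, y) block positions
text = [[1.0], [1.1], [5.0], [5.2]]            # hypothetical text-feature scores
labels = kmeans(early_fusion(visual, text), k=2)
```

Standardizing before concatenation is a design choice: without it, a modality measured in pixels would swamp one measured on a unit scale.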
Affiliation(s)
- Gaël Dias
- Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, Caen, France
- Sriparna Saha
- Indian Institute of Technology Patna, Bihar, Patna, India
- Fabrice Maurel
- Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, Caen, France
2
Ouadfel S, Abd Elaziz M. A multi-objective gradient optimizer approach-based weighted multi-view clustering. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE 2021; 106:104480. [DOI: 10.1016/j.engappai.2021.104480] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 09/01/2023]
3
Song W, Wang W, Dai DQ. Subtype-WESLR: identifying cancer subtype with weighted ensemble sparse latent representation of multi-view data. Brief Bioinform 2021; 23:6381248. [PMID: 34607358 DOI: 10.1093/bib/bbab398] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/24/2021] [Revised: 08/30/2021] [Accepted: 09/01/2021] [Indexed: 12/13/2022]
Abstract
The discovery of cancer subtypes has become a much-researched topic in oncology. Dividing cancer patients into subtypes can enable personalized treatment of heterogeneous patients. High-throughput technologies provide multiple omics data for cancer subtyping. Many computational methods integrate multi-view data to identify cancer subtypes, yet they obtain different subtypes for the same cancer, even from the same multi-omics data. To a certain extent, the subtypes produced by distinct methods are related, which may offer useful guidance for cancer subtyping. It remains a challenge to effectively exploit the valuable information in these distinct subtypes to produce more accurate and reliable ones. A weighted ensemble sparse latent representation (subtype-WESLR) is proposed to detect cancer subtypes on heterogeneous omics data. Using a weighted ensemble strategy to fuse base clusterings obtained by distinct methods as prior knowledge, subtype-WESLR projects each sample's feature profile from each data type into a common latent subspace while maintaining the local structure of the original sample feature space and consistency with the weighted ensemble, and it optimizes the common subspace iteratively to identify cancer subtypes. We conduct experiments on various synthetic datasets and eight public multi-view datasets from The Cancer Genome Atlas. The results demonstrate that, by integrating the base clusterings of existing methods, subtype-WESLR identifies more precise subtypes than competing methods.
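A weighted ensemble of base clusterings is often summarized as a weighted co-association matrix, in which entry (i, j) is the weighted share of base clusterings that place samples i and j in the same group. The Python sketch below assumes that representation and user-supplied weights; it is a generic illustration, not the exact prior or weighting scheme used by subtype-WESLR, and the function name and toy labels are hypothetical.

```python
from typing import List, Sequence


def weighted_coassociation(
    base_labels: Sequence[Sequence[int]], weights: Sequence[float]
) -> List[List[float]]:
    """Entry (i, j) = weighted share of base clusterings that group samples i and j together."""
    if len(base_labels) != len(weights):
        raise ValueError("need exactly one weight per base clustering")
    total = float(sum(weights))
    n = len(base_labels[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_labels, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += w / total
    return matrix


# Two hypothetical base clusterings of four samples, e.g. from two subtyping methods.
base = [[0, 0, 1, 1], [0, 1, 1, 1]]
coassoc = weighted_coassociation(base, weights=[0.75, 0.25])
```

Because labels are only compared for equality, the matrix does not depend on how each method numbers its clusters, which is what makes base clusterings from different methods comparable.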
Affiliation(s)
- Wenjing Song
- Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
- Weiwen Wang
- Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
- Dao-Qing Dai
- Intelligent Data Center, School of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
4
Adossa N, Khan S, Rytkönen KT, Elo LL. Computational strategies for single-cell multi-omics integration. Comput Struct Biotechnol J 2021; 19:2588-2596. [PMID: 34025945 PMCID: PMC8114078 DOI: 10.1016/j.csbj.2021.04.060] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 02/19/2021] [Revised: 04/23/2021] [Accepted: 04/24/2021] [Indexed: 02/06/2023]
Abstract
Single-cell omics technologies are currently solving biological and medical problems that had previously remained elusive, such as the discovery of new cell types, cellular differentiation trajectories, and communication networks across cells and tissues. Current advances, especially in single-cell multi-omics, hold high potential for breakthroughs through the integration of multiple omics layers. To keep pace with these biotechnological developments, many computational approaches to process and analyze single-cell multi-omics data have been proposed. In this review, we first introduce recent developments in single-cell multi-omics in general and then focus on the available data integration strategies. The integration approaches are divided into three categories: early, intermediate, and late data integration. For each category, we describe the underlying conceptual principles and main characteristics, and we provide examples of currently available tools and how they have been applied to analyze single-cell multi-omics data. Finally, we explore the challenges and prospective future directions of single-cell multi-omics data integration, including examples of adapting multi-view analysis approaches from other disciplines to single-cell multi-omics.
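The contrast between early and late integration can be sketched in a few lines of Python. The sketch assumes that each omics layer is a list of per-cell feature vectors over the same cells in the same order, and that late integration simply averages per-cell scores computed independently on each layer; intermediate integration (a joint latent representation) is not shown, and the function names and toy values are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = cells, columns = features


def early_integration(layers: Sequence[Matrix]) -> List[List[float]]:
    """Early integration: concatenate each cell's features across omics layers before analysis."""
    n_cells = len(layers[0])
    if any(len(layer) != n_cells for layer in layers):
        raise ValueError("all layers must describe the same cells in the same order")
    return [[x for layer in layers for x in layer[c]] for c in range(n_cells)]


def late_integration(per_layer_scores: Sequence[Sequence[float]]) -> List[float]:
    """Late integration: average per-cell scores that were computed separately per layer."""
    n_layers = len(per_layer_scores)
    return [sum(cell_scores) / n_layers for cell_scores in zip(*per_layer_scores)]


rna = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical RNA features for 2 cells
atac = [[0.5], [0.7]]           # hypothetical chromatin-accessibility feature for the same cells
fused = early_integration([rna, atac])
combined = late_integration([[0.2, 0.8], [0.4, 0.6]])  # hypothetical per-layer scores
```

Early integration lets a single downstream model see cross-layer feature interactions, while late integration keeps each layer's analysis separate and only merges the results.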
Affiliation(s)
- Nigatu Adossa
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Sofia Khan
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Kalle T. Rytkönen
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Institute of Biomedicine, University of Turku, 20520 Turku, Finland
- Laura L. Elo
- Turku Bioscience Centre, University of Turku and Åbo Akademi University, 20520 Turku, Finland
- Institute of Biomedicine, University of Turku, 20520 Turku, Finland
5
Siebert JC, Saint-Cyr M, Borengasser SJ, Wagner BD, Lozupone CA, Görg C. CANTARE: finding and visualizing network-based multi-omic predictive models. BMC Bioinformatics 2021; 22:80. [PMID: 33607938 PMCID: PMC7896366 DOI: 10.1186/s12859-021-04016-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/14/2020] [Accepted: 02/05/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND One goal of multi-omic studies is to identify interpretable predictive models for outcomes of interest, with analytes drawn from multiple omes. Such findings could support refined biological insight and hypothesis generation. However, standard analytical approaches are not designed to be "ome aware." Thus, some researchers analyze data from one ome at a time and then combine predictions across omes. Others resort to correlation studies, cataloging pairwise relationships but lacking an obvious approach for cohesive and interpretable summaries of these catalogs. METHODS We present a novel workflow for building predictive regression models from network neighborhoods in multi-omic networks. First, we generate pairwise regression models across all pairs of analytes from all omes, encoding the resulting "top table" of relationships in a network. Then, we build predictive logistic regression models using the analytes in network neighborhoods of interest. We call this method CANTARE (Consolidated Analysis of Network Topology And Regression Elements). RESULTS We applied CANTARE to previously published data from healthy controls and patients with inflammatory bowel disease (IBD) consisting of three omes: gut microbiome, metabolomics, and microbial-derived enzymes. We identified 8 unique predictive models with AUC > 0.90. The number of predictors in these models ranged from 3 to 13. We compared the results of CANTARE to random forests and elastic-net penalized regressions, analyzing AUC, predictions, and predictors. CANTARE AUC values were competitive with those generated by random forests and penalized regressions. The top 3 CANTARE models had a greater dynamic range of predicted probabilities than did random forests and penalized regressions (p-value = 1.35 × 10⁻⁵). CANTARE models were significantly more likely to prioritize predictors from multiple omes than were the alternatives (p-value = 0.005). We also showed that predictive models from a network based on pairwise models with an interaction term for IBD have higher AUC than predictive models built from a correlation network (p-value = 0.016). R scripts and a CANTARE User's Guide are available at https://sourceforge.net/projects/cytomelodics/files/CANTARE/ . CONCLUSION CANTARE offers a flexible approach for building parsimonious, interpretable multi-omic models. These models yield quantitative and directional effect sizes for predictors and support the generation of hypotheses for follow-up investigation.
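The network-building step can be approximated in Python as follows. This simplified sketch fits an ordinary least-squares line for every pair of analytes, keeps pairs whose R² passes a threshold as network edges, and returns a node's neighbourhood, whose analytes would then be candidate predictors for a logistic model. It assumes a plain R² cutoff; CANTARE's actual pairwise models (for example, those with an interaction term for disease status), its significance filtering, and its logistic regression step are not reproduced here, and all names and values are hypothetical.

```python
from itertools import combinations
from typing import Dict, Sequence, Set, Tuple


def simple_regression(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y = a + b*x; returns (slope b, R squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # a constant analyte carries no pairwise signal
    return sxy / sxx, sxy * sxy / (sxx * syy)


def build_network(data: Dict[str, Sequence[float]], min_r2: float) -> Dict[str, Set[str]]:
    """Keep every analyte pair whose pairwise fit passes the R^2 cutoff as an undirected edge."""
    graph: Dict[str, Set[str]] = {name: set() for name in data}
    for a, b in combinations(data, 2):
        _, r2 = simple_regression(data[a], data[b])
        if r2 >= min_r2:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def neighbourhood(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """A node plus its direct neighbours: the candidate predictor set."""
    return {node} | graph[node]


# Hypothetical measurements of three analytes from three omes across five samples.
data = {
    "taxon_A": [1, 2, 3, 4, 5],
    "metabolite_1": [2, 4, 6, 8, 10.5],
    "enzyme_X": [5, 1, 4, 2, 3],
}
graph = build_network(data, min_r2=0.8)
predictors = neighbourhood(graph, "taxon_A")
```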
Affiliation(s)
- Janet C Siebert
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Martine Saint-Cyr
- Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Sarah J Borengasser
- Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Brandie D Wagner
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
- Catherine A Lozupone
- Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Carsten Görg
- Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
6
Simultaneous feature selection and clustering of micro-array and RNA-sequence gene expression data using multiobjective optimization. INT J MACH LEARN CYB 2020. [DOI: 10.1007/s13042-020-01139-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 10/24/2022]
7
Abstract
In this chapter we discuss the past, present and future of clinical biomarker development. We explore the advent of new technologies, which are changing the way health, medicine and disease are understood. This review covers the identification of physicochemical assays, current regulations, the development and reproducibility of clinical trials, as well as the revolution of omics technologies and state-of-the-art integration and analysis approaches.
8
Dutta P, Mishra P, Saha S. Incomplete multi-view gene clustering with data regeneration using Shape Boltzmann Machine. Comput Biol Med 2020; 125:103965. [PMID: 32931989 DOI: 10.1016/j.compbiomed.2020.103965] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/30/2020] [Revised: 08/08/2020] [Accepted: 08/08/2020] [Indexed: 11/17/2022]
Abstract
Deciphering patterns in the structural and functional anatomy of genes can be very helpful in understanding genetic biology and genomics. The availability of multiple omics data, together with the advent of machine learning techniques, also helps medical professionals gain insights into various biological regulations. Gene clustering is one such computational technique that can help in understanding gene behavior. More comprehensive and reliable insights can be gained, however, if different modalities/views of biomedical data are considered. Yet in most multi-view settings each view contains some missing data, leading to incomplete multi-view clustering. In this study, we present a deep Boltzmann machine-based incomplete multi-view clustering framework for gene clustering. We use Shape Boltzmann Machines to regenerate the data of three NCBI datasets in their incomplete modalities. The overall performance of the proposed multi-view clustering technique is evaluated using the Silhouette index and the Davies-Bouldin index, and the comparative analysis shows an improvement over state-of-the-art methods. Finally, to show that the improvement attained by the proposed incomplete multi-view clustering is statistically significant, we perform Welch's t-test. AVAILABILITY OF DATA AND MATERIALS: https://github.com/piyushmishra12/IMC.
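The two internal validation indices named above can be computed directly from the points and their cluster labels. The following minimal Python sketch uses Euclidean distance and the standard definitions: the silhouette compares each point's mean intra-cluster distance a with its mean distance b to the nearest other cluster, s = (b − a)/max(a, b), averaged over points (higher is better); Davies-Bouldin averages, over clusters, the worst ratio (S_i + S_j)/d(c_i, c_j) of within-cluster scatters to centroid distance (lower is better). Giving singleton points a silhouette of 0 is an assumed convention, and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]


def euclid(p: Point, q: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def group(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[Point]]:
    clusters: Dict[int, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        raise ValueError("both indices need at least two clusters")
    return clusters


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette width; higher is better."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        a = sum(euclid(p, q) for q in own) / (len(own) - 1)  # p's distance to itself adds 0
        b = min(
            sum(euclid(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Davies-Bouldin index; lower is better (assumes distinct centroids)."""
    clusters = group(points, labels)
    centroids = {k: [sum(col) / len(m) for col in zip(*m)] for k, m in clusters.items()}
    scatter = {k: sum(euclid(p, centroids[k]) for p in m) / len(m) for k, m in clusters.items()}
    worst = [
        max(
            (scatter[i] + scatter[j]) / euclid(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]  # two well-separated toy clusters
labs = [0, 0, 1, 1]
sil = silhouette(pts, labs)
dbi = davies_bouldin(pts, labs)
```

For these toy points both clusters have scatter 0.5 and centroids 10 apart, so the Davies-Bouldin index is 0.1, while the silhouette is about 0.90.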
Affiliation(s)
- Pratik Dutta
- Department of Computer Science and Engineering, Indian Institute of Technology, Patna, India
- Piyush Mishra
- Department of Computer Science and Engineering, IIIT, Bhubaneswar, India
- Sriparna Saha
- Department of Computer Science and Engineering, Indian Institute of Technology, Patna, India
9
Mitra S, Saha S, Hasanuzzaman M. Multi-view clustering for multi-omics data using unified embedding. Sci Rep 2020; 10:13654. [PMID: 32788601 PMCID: PMC7423957 DOI: 10.1038/s41598-020-70229-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/31/2019] [Accepted: 07/13/2020] [Indexed: 12/14/2022]
Abstract
In real-world applications, data sets often consist of multiple views, which provide consensus and complementary information to each other. Embedding learning is an effective strategy for nearest-neighbour search and dimensionality reduction in large data sets. This paper learns a unified probability distribution of the points across different views and generates a unified embedding in a low-dimensional space that optimally preserves neighbourhood identity. The probability distributions generated for each point in each view are combined by the conflation method into a single unified distribution. The goal is to approximate this unified distribution as closely as possible when a similar operation is performed in the embedded space. The cost function is the sum of Kullback-Leibler divergences over the samples, which leads to a simple gradient that adjusts the positions of the samples in the embedded space. The proposed methodology can generate embeddings from both complete and incomplete multi-view data sets. Finally, a multi-objective clustering technique (AMOSA) is applied to group the samples in the embedded space. The proposed methodology, Multi-view Neighbourhood Embedding (MvNE), shows an improvement of approximately 2–3% over state-of-the-art models when evaluated on 10 omics data sets.
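Conflation of discrete probability distributions defined on a shared support is their normalized pointwise product, and the cost described above is a sum of Kullback-Leibler divergences. The Python sketch below assumes that each view yields, for a given point, a neighbour distribution over the same candidate neighbours; it shows only these two quantities, not the MvNE gradient or the embedding update, and the function names and toy distributions are hypothetical.

```python
import math
from typing import List, Sequence


def conflate(distributions: Sequence[Sequence[float]]) -> List[float]:
    """Conflation of discrete distributions on a shared support: normalized pointwise product."""
    products = [math.prod(values) for values in zip(*distributions)]
    total = sum(products)
    if total == 0:
        raise ValueError("distributions share no support; conflation is undefined")
    return [v / total for v in products]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); terms with p_i = 0 contribute nothing, and q_i is floored at eps."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def embedding_cost(unified: Sequence[Sequence[float]], embedded: Sequence[Sequence[float]]) -> float:
    """Sum of per-sample KL divergences between unified and embedded neighbour distributions."""
    return sum(kl_divergence(p, q) for p, q in zip(unified, embedded))


# Hypothetical neighbour distributions of one point over two candidate neighbours, from two views.
unified = conflate([[0.5, 0.5], [0.9, 0.1]])
```

Note how an uninformative view ([0.5, 0.5]) leaves the other view's distribution unchanged under conflation, so views that agree sharpen the result while a flat view does not dilute it.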
Affiliation(s)
- Sayantan Mitra
- Department of Computer Science, Indian Institute of Technology Patna, Bihta, Bihar, 801103, India
- Sriparna Saha
- Department of Computer Science, Indian Institute of Technology Patna, Bihta, Bihar, 801103, India
10
Gal J, Bailleux C, Chardin D, Pourcher T, Gilhodes J, Jing L, Guigonis JM, Ferrero JM, Milano G, Mograbi B, Brest P, Chateau Y, Humbert O, Chamorey E. Comparison of unsupervised machine-learning methods to identify metabolomic signatures in patients with localized breast cancer. Comput Struct Biotechnol J 2020; 18:1509-1524. [PMID: 32637048 PMCID: PMC7327012 DOI: 10.1016/j.csbj.2020.05.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/11/2020] [Revised: 05/15/2020] [Accepted: 05/16/2020] [Indexed: 02/08/2023]
Abstract
Genomics and transcriptomics have led to the widely used molecular classification of breast cancer (BC). However, heterogeneous biological behaviors persist within breast cancer subtypes. Metabolomics is a rapidly expanding field of study dedicated to cellular metabolism as affected by the environment. The aim of this study was to compare metabolomic signatures of BC obtained by 5 different unsupervised machine learning (ML) methods. Fifty-two consecutive patients with BC with an indication for adjuvant chemotherapy between 2013 and 2016 were retrospectively included. We performed metabolomic profiling of tumor resection samples using liquid chromatography-mass spectrometry, and 449 identified metabolites were selected for further analysis. Clusters obtained using the 5 unsupervised ML methods (PCA k-means, sparse k-means, spectral clustering, SIMLR and k-sparse) were compared in terms of clinical and biological characteristics. With an optimal partitioning parameter k = 3, the five methods identified three prognostic groups of patients (favorable, intermediate, unfavorable) with different clinical and biological profiles. SIMLR and k-sparse were the most effective clustering techniques. In-silico survival analysis revealed a significant difference in 5-year predicted overall survival (OS) between the 3 clusters. Further pathway analysis using the 449 selected metabolites showed significant differences in amino acid and glucose metabolism between BC histologic subtypes. Our results provide proof-of-concept for the use of unsupervised ML metabolomics enabling stratification and personalized management of BC patients. The design of novel computational methods incorporating ML and bioinformatics techniques should yield tools particularly suited to improving the outcome of cancer treatment and reducing cancer-related mortality.
Affiliation(s)
- Jocelyn Gal
- University Côte d’Azur, Epidemiology and Biostatistics Department, Centre Antoine Lacassagne, Nice F-06189, France
- Caroline Bailleux
- University Côte d’Azur, Medical Oncology Department Centre Antoine Lacassagne, Nice F-06189, France
- David Chardin
- University Côte d’Azur, Nuclear Medicine Department, Centre Antoine Lacassagne, Nice F-06189, France
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
- Thierry Pourcher
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
- Julia Gilhodes
- Department of Biostatistics, Institut Claudius Regaud, IUCT-O Toulouse, France
- Lun Jing
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
- Jean-Marie Guigonis
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
- Jean-Marc Ferrero
- University Côte d’Azur, Medical Oncology Department Centre Antoine Lacassagne, Nice F-06189, France
- Gerard Milano
- University Côte d’Azur, Centre Antoine Lacassagne, Oncopharmacology Unit, Nice F-06189, France
- Baharia Mograbi
- University Côte d’Azur, CNRS UMR7284, INSERM U1081, IRCAN TEAM4 Centre Antoine Lacassagne FHU-Oncoage, Nice F-06189, France
- Patrick Brest
- University Côte d’Azur, CNRS UMR7284, INSERM U1081, IRCAN TEAM4 Centre Antoine Lacassagne FHU-Oncoage, Nice F-06189, France
- Yann Chateau
- University Côte d’Azur, Epidemiology and Biostatistics Department, Centre Antoine Lacassagne, Nice F-06189, France
- Olivier Humbert
- University Côte d’Azur, Nuclear Medicine Department, Centre Antoine Lacassagne, Nice F-06189, France
- University Côte d’Azur, Commissariat à l’Energie Atomique, Institut de Biosciences et Biotechnologies d'Aix-Marseille, Laboratory Transporters in Imaging and Radiotherapy in Oncology, Faculty of Medicine, Nice F-06100, France
- Emmanuel Chamorey
- University Côte d’Azur, Epidemiology and Biostatistics Department, Centre Antoine Lacassagne, Nice F-06189, France
11
Aziz F, Ahmad T, Malik AH, Uddin MI, Ahmad S, Sharaf M. Reversible data hiding techniques with high message embedding capacity in images. PLoS One 2020; 15:e0231602. [PMID: 32469877 PMCID: PMC7259517 DOI: 10.1371/journal.pone.0231602] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 01/16/2020] [Accepted: 03/26/2020] [Indexed: 11/24/2022]
Abstract
Reversible Data Hiding (RDH) techniques, in which data is embedded in an image in such a way that the original image can be restored, have gained popularity over the last two decades. Earlier work on RDH was based on Image Histogram Modification, which uses the peak point of the histogram to embed data in the image. More recent work focuses on Difference Image Histogram Modification, which exploits the fact that neighbouring pixels of an image are highly correlated, so the difference image leaves more room to embed large amounts of data. In this paper we propose a framework to increase the embedding capacity of reversible data hiding techniques that use a difference image to embed data. The main idea is that, instead of taking the difference of neighbouring pixels directly, we rearrange the columns (or rows) of the image in a way that enhances its smooth regions. Any difference-based embedding technique can then be applied to the transformed image. The proposed method is applied to different types of images, including textures, patterns and publicly available images. Experimental results demonstrate that the proposed method not only increases the message embedding capacity of a given image by more than 50% but also yields marked images of higher visual quality than the existing state-of-the-art reversible data hiding technique. The proposed technique is also verified by Pixel Difference Histogram (PDH) steganalysis, and the results demonstrate that marked images generated by the proposed method are undetectable by PDH analysis.
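The column-rearrangement idea can be sketched in Python as follows. This sketch assumes a greedy nearest-neighbour ordering (start from column 0 and repeatedly append the unused column with the smallest sum of absolute pixel differences to the last one), which may differ from the ordering strategy actually used in the paper. Because the permutation must be kept to restore the image, the sketch also inverts it, and it reports the difference-image histogram that a difference-based embedding scheme would then operate on. The function names and the toy image are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

Image = List[List[int]]  # rows of grayscale pixel values


def column_cost(img: Image, a: int, b: int) -> int:
    """Sum of absolute pixel differences between columns a and b."""
    return sum(abs(row[a] - row[b]) for row in img)


def greedy_column_order(img: Image) -> List[int]:
    """Greedy nearest-neighbour ordering of columns (ties broken by lower index)."""
    order = [0]
    remaining = set(range(1, len(img[0])))
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda c: (column_cost(img, last, c), c))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def rearrange(img: Image, order: Sequence[int]) -> Image:
    return [[row[c] for c in order] for row in img]


def restore(img: Image, order: Sequence[int]) -> Image:
    """Invert the permutation; the order must be stored alongside the payload."""
    out = [[0] * len(order) for _ in img]
    for r, row in enumerate(img):
        for pos, c in enumerate(order):
            out[r][c] = row[pos]
    return out


def difference_histogram(img: Image) -> Counter:
    """Histogram of horizontal neighbour differences (the difference image)."""
    return Counter(row[j + 1] - row[j] for row in img for j in range(len(row) - 1))


img = [[10, 50, 10, 50], [20, 60, 20, 60]]  # hypothetical 2x4 image with alternating columns
order = greedy_column_order(img)
smooth = rearrange(img, order)
```

In this toy image, the zero bin of the difference histogram grows from 0 to 4 after rearrangement, which is the kind of concentrated peak that histogram-shifting schemes exploit for embedding.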
Affiliation(s)
- Furqan Aziz
- Center of Excellence in IT, Institute of Management Sciences, Peshawar, Pakistan
- Centre for Computational Biology, University of Birmingham, Birmingham, England, United Kingdom
- Taeeb Ahmad
- Center of Excellence in IT, Institute of Management Sciences, Peshawar, Pakistan
- Abdul Haseeb Malik
- Department of Computer Science, University of Peshawar, Peshawar, Pakistan
- M. Irfan Uddin
- Institute of Computing, Kohat University of Science and Technology, Kohat, Pakistan
- Shafiq Ahmad
- Department of Industrial Engineering, College of Engineering, King Saud University, Riyadh, Kingdom of Saudi Arabia
- Mohamed Sharaf
- Department of Industrial Engineering, College of Engineering, King Saud University, Riyadh, Kingdom of Saudi Arabia