1
Gliozzo J, Soto-Gomez M, Guarino V, Bonometti A, Cabri A, Cavalleri E, Reese J, Robinson PN, Mesiti M, Valentini G, Casiraghi E. Intrinsic-dimension analysis for guiding dimensionality reduction and data fusion in multi-omics data processing. Artif Intell Med 2025; 160:103049. [PMID: 39673960 DOI: 10.1016/j.artmed.2024.103049] [Received: 10/24/2023] [Revised: 12/03/2024] [Accepted: 12/04/2024] [Indexed: 12/16/2024]
Abstract
Multi-omics data have revolutionized biomedical research by providing a comprehensive understanding of biological systems and the molecular mechanisms of disease development. However, analyzing multi-omics data is challenging due to high dimensionality and limited sample sizes, necessitating proper data-reduction pipelines to ensure reliable analyses. Additionally, their multimodal nature requires effective data-integration pipelines. While several dimensionality reduction and data fusion algorithms have been proposed, crucial aspects are often overlooked. Specifically, the dimension of the projection space is typically chosen heuristically and applied uniformly across all omics, neglecting the distinct high-dimension, small-sample-size challenges faced by each omics view. This paper introduces a novel multi-modal dimensionality reduction pipeline tailored to individual views. By leveraging intrinsic dimensionality estimators, we assess the impact of the curse of dimensionality on each view and propose a two-step reduction strategy for significantly affected views, combining feature selection with feature extraction. Compared to traditional uniform reduction pipelines in a crucial supervised multi-omics analysis setting, our approach shows significant improvement. Additionally, we explore three effective unsupervised multi-omics data fusion methods rooted in the main data fusion strategies to gain insights into their performance under crucial, yet overlooked, settings.
Affiliation(s)
- Jessica Gliozzo
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; European Commission, Joint Research Centre (JRC), Ispra, Italy
- Mauricio Soto-Gomez
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Valentina Guarino
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Arturo Bonometti
  - Department of Biomedical Sciences, Humanitas University, Milan, Italy; Department of Pathology, IRCCS Humanitas Clinical and Research Hospital, Milan, Italy
- Alberto Cabri
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Emanuele Cavalleri
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy
- Justin Reese
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Marco Mesiti
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Giorgio Valentini
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; CINI, Infolife National Laboratory, Roma, Italy
- Elena Casiraghi
  - AnacletoLab, Computer Science Department, Università degli Studi di Milano, Milan, Italy; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA; CINI, Infolife National Laboratory, Roma, Italy; Department of Computer Science, Aalto University, Espoo, Finland
2
Zhou X, Chen Y, Heidari AA, Chen H, Chen X. Rough hypervolume-driven feature selection with groupwise intelligent sampling for detecting clinical characterization of lupus nephritis. Artif Intell Med 2025; 160:103042. [PMID: 39673961 DOI: 10.1016/j.artmed.2024.103042] [Received: 12/01/2023] [Revised: 09/06/2024] [Accepted: 11/23/2024] [Indexed: 12/16/2024]
Abstract
Systemic lupus erythematosus (SLE) is an autoimmune inflammatory disease. Lupus nephritis (LN) is a major risk factor for morbidity and mortality in SLE. Proliferative and pure membranous LN have different prognoses and may require different treatments. This study proposes a binary rough hypervolume-driven spherical evolution algorithm with groupwise intelligent sampling (bRGSE). The dimensionality reduction capability of bRGSE is verified on twelve public datasets, with feature dimensions ranging from seven hundred to fifty thousand. The experimental results indicate that bRGSE performs better than seven high-performing alternatives. bRGSE was then combined with adaptive boosting (AdaBoost) to form a new model (bRGSE_AdaBoost), which was used to analyze clinical records collected from 110 patients with LN. Experimental results show that the proposed bRGSE_AdaBoost can identify the most critical indicators, including urine latent blood, white blood cells, endogenous creatinine clearing rate, and age. These indicators may help differentiate between proliferative LN and membranous LN. The proposed bRGSE algorithm is an efficient dimensionality reduction method. The developed bRGSE_AdaBoost model, a computer-aided model, achieved an accuracy of 96.687% and is expected to provide early warning for the treatment and diagnosis of LN.
Affiliation(s)
- Xinsen Zhou
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Yi Chen
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Ali Asghar Heidari
  - School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran
- Huiling Chen
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Xiaowei Chen
  - Department of Rheumatology and Immunology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China
3
Xia S, Wang C, Wang G, Gao X, Ding W, Yu J, Zhai Y, Chen Z. GBRS: A Unified Granular-Ball Learning Model of Pawlak Rough Set and Neighborhood Rough Set. IEEE Trans Neural Netw Learn Syst 2025; 36:1719-1733. [PMID: 37943647 DOI: 10.1109/tnnls.2023.3325199] [Indexed: 11/12/2023]
Abstract
Pawlak rough set (PRS) and neighborhood rough set (NRS) are the two most common rough set theoretical models. Although the PRS can use equivalence classes to represent knowledge, it is unable to process continuous data. The NRS, on the other hand, can process continuous data but loses the ability to use equivalence classes to represent knowledge. To remedy this deficit, this article presents a granular-ball rough set (GBRS) based on granular-ball computing, combining the robustness and the adaptability of granular-ball computing. The GBRS can simultaneously represent both the PRS and the NRS, enabling it both to deal with continuous data and to use equivalence classes for knowledge representation. In addition, we propose an implementation algorithm of the GBRS by introducing the positive region of GBRS into the PRS framework. The experimental results on benchmark datasets demonstrate that the learning accuracy of the GBRS is significantly improved compared with the PRS and the traditional NRS. The GBRS also outperforms nine popular or state-of-the-art feature selection methods. We have open-sourced all the source code of this article at https://www.cquptshuyinxia.com/GBRS.html, https://github.com/syxiaa/GBRS.
5
Rajeh TM, Li T, Li C, Javed MH, Luo Z, Alhaek F. Modeling multi-regional temporal correlation with gated recurrent unit and multiple linear regression for urban traffic flow prediction. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110237] [Indexed: 12/29/2022]