1
Giudice L, Mohamed A, Malm T. StellarPath: Hierarchical-vertical multi-omics classifier synergizes stable markers and interpretable similarity networks for patient profiling. PLoS Comput Biol 2024; 20:e1012022. [PMID: 38607982 DOI: 10.1371/journal.pcbi.1012022] [Received: 11/08/2023] [Revised: 04/24/2024] [Accepted: 03/25/2024] [Indexed: 04/14/2024] Open
Abstract
The Patient Similarity Network (PSN) paradigm models the similarity between patients based on specific data. The similarity can summarize patients' relationships from high-dimensional data, such as biological omics. The resulting PSN can undergo supervised or unsupervised learning tasks while being strongly interpretable, tailored for precision medicine, and ready to be analyzed with graph-theory methods. However, these benefits are not guaranteed and depend on the granularity of the summarized data, the clarity of the similarity measure, the complexity of the network's topology, and the implemented methods for analysis. To date, no patient classifier fully leverages the paradigm's inherent benefits. PSNs remain complex, unexploited, and meaningless. We present StellarPath, a hierarchical-vertical patient classifier that leverages pathway analysis and patient similarity concepts to find meaningful features for both classes and individuals. StellarPath processes omics data, hierarchically integrates them into pathways, and uses a novel similarity to measure how alike patients' pathway activities are. It selects biologically relevant molecules, pathways, and networks, considering molecule stability and topology. A graph convolutional neural network then predicts unknown patients based on known cases. StellarPath excels in classification performance and computational resource use across sixteen datasets. It demonstrates proficiency in inferring the class of new patients described in external independent studies, following its initial training and testing phases on a local dataset. It advances the PSN paradigm and provides new markers, insights, and tools for in-depth patient profiling.
Affiliation(s)
- Luca Giudice
  - A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Ahmed Mohamed
  - A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
- Tarja Malm
  - A.I. Virtanen Institute for Molecular Sciences, University of Eastern Finland, Kuopio, Finland
2
Sheng M, Qi Y, Gao Z, Lin X. Analyzing omics data based on sample network. J Bioinform Comput Biol 2024; 22:2450002. [PMID: 38567387 DOI: 10.1142/s0219720024500021] [Indexed: 04/04/2024]
Abstract
Identifying valuable features from complex omics data is of great significance for the study of disease diagnosis. This paper proposes a new feature selection algorithm based on sample network (FS-SN) to mine important information from omics data. The sample network is constructed according to the sample neighbor relationship at the molecular (feature) expression level, and the distinguishing ability of the feature is evaluated based on the topology of the sample network. The sample network established on a feature with a strong discriminating ability tends to have many edges between samples of the same group and few edges between samples of different groups. At the same time, FS-SN removes redundant features according to the gravitational interaction between features. To validate FS-SN, it was compared on ten public datasets with ERGS, mRMR, ReliefF, ATSD-DN, and INDEED, which are efficient in omics data analysis. Experimental results show that FS-SN performed better than the compared methods in accuracy, sensitivity, and specificity in most cases. Hence, FS-SN, which makes use of the topology of the sample network, is effective for analyzing omics data: it can identify key features that reflect the occurrence and development of diseases and reveal the underlying biological mechanism.
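The edge-counting idea in this abstract can be illustrated with a minimal sketch. It is not the FS-SN implementation: the abstract does not specify how neighbors are chosen or how the topology is scored, so the k-nearest-neighbor rule, the same-group edge fraction, and the function name `same_group_edge_fraction` are all assumptions made for illustration, and the redundancy-removal step is omitted.

```python
def same_group_edge_fraction(values, labels, k=2):
    """Score one feature (hypothetical scoring, not the published FS-SN score).

    Builds an undirected k-nearest-neighbour sample network from the values
    of a single feature and returns the fraction of edges that connect
    samples carrying the same group label.
    """
    n = len(values)
    edges = set()
    for i in range(n):
        # Rank the other samples by absolute difference in this feature;
        # ties are broken by sample index so the result is deterministic.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (abs(values[i] - values[j]), j),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges) if edges else 0.0


# Toy data (invented for illustration): six samples, two groups.
labels = ["case", "case", "case", "control", "control", "control"]
informative = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]  # separates the two groups
noisy = [0.1, 5.2, 0.3, 0.2, 5.1, 5.3]        # mixes the two groups

score_informative = same_group_edge_fraction(informative, labels)
score_noisy = same_group_edge_fraction(noisy, labels)
```

Under these assumptions, a feature that separates the groups yields a network whose edges stay within groups, so it scores higher than a feature whose values interleave the groups.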
Affiliation(s)
- Meizhen Sheng
  - School of Computer Science & Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning Province 116024, P. R. China
- Yanpeng Qi
  - School of Computer Science & Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning Province 116024, P. R. China
- Zhenbo Gao
  - School of Computer Science & Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning Province 116024, P. R. China
- Xiaohui Lin
  - School of Computer Science & Technology, Dalian University of Technology, No. 2 Linggong Road, Dalian, Liaoning Province 116024, P. R. China
3
Cao J, Liu Z, Yuan J, Luo Y, Wang J, Liu J, Bo H, Guo J. Subgrouping testicular germ cell tumors based on immunotherapy and chemotherapy associated lncRNAs. Heliyon 2024; 10:e24320. [PMID: 38298718 PMCID: PMC10827771 DOI: 10.1016/j.heliyon.2024.e24320] [Received: 05/08/2023] [Revised: 12/01/2023] [Accepted: 01/07/2024] [Indexed: 02/02/2024] Open
Abstract
Testicular germ cell tumors (TGCT) are the most common reproductive system malignancies in men aged 15-44 years, accounting for 95% of all testicular tumors. Our previous studies have shown that long non-coding RNAs (lncRNAs), such as LINC00313, TTTY14 and RFPL3S, are associated with the development of TGCT. Subgrouping TGCT according to differentially expressed lncRNAs and immunological characteristics helps to describe the characteristics of TGCT comprehensively and to implement precise treatment. In this study, the TGCT transcriptome data in The Cancer Genome Atlas Program (TCGA) database were used to perform consensus clustering analysis and to construct a prognostic model for TGCT. TGCT was divided into three subtypes, C1, C2, and C3, based on the differentially expressed lncRNAs. The C1 subtype was sensitive to chemotherapy drugs, the C2 subtype was not sensitive to chemotherapy drugs, and the C3 subtype may benefit from immunotherapy. We defined the C1 subtype as the epidermal progression subtype, the C2 subtype as the mesenchymal progression subtype, and the C3 subtype as the T cell activation subtype. Subgrouping based on differentially expressed genes (DEGs) and immunological characteristics is helpful for the precise treatment of TGCT.
Affiliation(s)
- Jian Cao
  - Hunan Cancer Hospital, Department of Urology, The Affiliated Cancer Hospital of Xiangya School of Medicine of Central South University, Changsha, 410013, Hunan, China
- Zhizhong Liu
  - Hunan Cancer Hospital, Department of Urology, The Affiliated Cancer Hospital of Xiangya School of Medicine of Central South University, Changsha, 410013, Hunan, China
- Junbin Yuan
  - Department of Urology, Xiangya Hospital, Central South University, Changsha, 410013, Hunan, China
- Yanwei Luo
  - Department of Blood Transfusion, the Third Xiangya Hospital of Central South University, Changsha, 410013, Hunan, China
- Jinrong Wang
  - Department of Urology, The Third Xiangya Hospital of Central South University, No.138, Tongzipo Road, Changsha, 410013, Hunan, China
- Jianye Liu
  - Department of Urology, The Third Xiangya Hospital of Central South University, No.138, Tongzipo Road, Changsha, 410013, Hunan, China
- Hao Bo
  - Clinical Research Center for Reproduction and Genetics in Hunan Province, Reproductive and Genetic Hospital of CITIC-Xiangya, Changsha, 410078, Hunan, China
  - NHC Key Laboratory of Human Stem Cell and Reproductive Engineering, Institute of Reproductive and Stem Cell Engineering, Central South University, Changsha, 410078, Hunan, China
- Jie Guo
  - National Institution of Drug Clinical Trial, Xiangya Hospital, Central South University, Changsha, Hunan, China
  - China National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
  - International Science and Technology Innovation Cooperation Base for Early Clinical Trials of Biological Agents in Hunan Province, Changsha, Hunan, China
4
Mirzaei G. Constructing gene similarity networks using co-occurrence probabilities. BMC Genomics 2023; 24:697. [PMID: 37990157 PMCID: PMC10662556 DOI: 10.1186/s12864-023-09780-w] [Received: 01/10/2023] [Accepted: 11/01/2023] [Indexed: 11/23/2023] Open
Abstract
Gene similarity networks play an important role in unraveling the intricate associations within diverse cancer types. Conventionally, gauging the similarity between genes has been approached through experimental methodologies involving chemical and molecular analyses, or through the lens of mathematical techniques. However, in our work, we have pioneered a distinctive mathematical framework, one rooted in the co-occurrence of attribute values and single point mutations, thereby establishing a novel approach for quantifying the dissimilarity or similarity among genes. Central to our approach is the recognition of mutations as key players in the evolutionary trajectory of cancer. Anchored in this understanding, our methodology hinges on the consideration of two categorical attributes: mutation type and nucleotide change. These attributes are pivotal, as they encapsulate the critical variations that can precipitate substantial changes in gene behavior and ultimately influence disease progression. Our study takes on the challenge of formulating similarity measures that are intrinsic to genes' categorical data. Taking into account the co-occurrence probability of attribute values within single point mutations, our innovative mathematical approach surpasses the boundaries of conventional methods. We thereby provide a robust and comprehensive means to assess gene similarity and take a significant step forward in refining the tools available for uncovering the subtle yet impactful associations within the complex realm of gene interactions in cancer.
Affiliation(s)
- Golrokh Mirzaei
  - Department of Computer Science and Engineering, The Ohio State University, Marion, USA.
5
Ferrè L, Clarelli F, Pignolet B, Mascia E, Frasca M, Santoro S, Sorosina M, Bucciarelli F, Moiola L, Martinelli V, Comi G, Liblau R, Filippi M, Valentini G, Esposito F. Combining Clinical and Genetic Data to Predict Response to Fingolimod Treatment in Relapsing Remitting Multiple Sclerosis Patients: A Precision Medicine Approach. J Pers Med 2023; 13:jpm13010122. [PMID: 36675783 PMCID: PMC9861774 DOI: 10.3390/jpm13010122] [Received: 11/14/2022] [Revised: 12/30/2022] [Accepted: 12/30/2022] [Indexed: 01/11/2023] Open
Abstract
A personalized approach is strongly advocated for treatment selection in Multiple Sclerosis patients due to the high number of available drugs. Machine learning methods have proved to be valuable tools in the context of precision medicine. In the present work, we applied machine learning methods to identify a combined clinical and genetic signature of response to fingolimod that could support the prediction of drug response. Two cohorts of fingolimod-treated patients from Italy and France were enrolled and divided into training, validation, and test sets. Random forest training and robust feature selection were performed in the first two sets respectively, and the independent test set was used to evaluate model performance. A genetic-only model and a combined clinical-genetic model were obtained. Overall, 381 patients were classified according to the NEDA-3 criterion at 2 years; we identified a genetic model, including 123 SNPs, that was able to predict fingolimod response with an AUROC = 0.65 in the independent test set. When clinical data were added, model performance increased to an AUROC = 0.71. Integrating clinical and genetic data by means of machine learning methods can help in the prediction of response to fingolimod, even though further studies are required to definitively extend this approach to clinical applications.
Affiliation(s)
- Laura Ferrè
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Hospital, 20132 Milan, Italy
  - Laboratory of Human Genetics of Neurological Disorders, IRCCS San Raffaele Hospital, 20132 Milan, Italy
  - Vita-Salute San Raffaele University, 20132 Milan, Italy
- Ferdinando Clarelli
  - Laboratory of Human Genetics of Neurological Disorders, IRCCS San Raffaele Hospital, 20132 Milan, Italy
- Beatrice Pignolet
  - Centre Hospitalier Universitaire de Toulouse, CEDEX 9, 31059 Toulouse, France
  - Institut Toulousain des Maladies Infectieuses et Inflammatoires (Infinity), INSERM UMR1291–CNRS UMR5051–Université Toulouse III, CEDEX 3, 31024 Toulouse, France
- Elisabetta Mascia
  - Laboratory of Human Genetics of Neurological Disorders, IRCCS San Raffaele Hospital, 20132 Milan, Italy
- Marco Frasca
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, 20133 Milan, Italy
  - Data Science Research Center, Università degli Studi di Milano, 20133 Milan, Italy
  - Infolife National Lab, CINI, 00185 Rome, Italy
- Silvia Santoro
  - Laboratory of Human Genetics of Neurological Disorders, IRCCS San Raffaele Hospital, 20132 Milan, Italy
- Melissa Sorosina
  - Laboratory of Human Genetics of Neurological Disorders, IRCCS San Raffaele Hospital, 20132 Milan, Italy
- Florence Bucciarelli
  - Centre Hospitalier Universitaire de Toulouse, CEDEX 9, 31059 Toulouse, France
  - Institut Toulousain des Maladies Infectieuses et Inflammatoires (Infinity), INSERM UMR1291–CNRS UMR5051–Université Toulouse III, CEDEX 3, 31024 Toulouse, France
- Lucia Moiola
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Hospital, 20132 Milan, Italy
- Vittorio Martinelli
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Hospital, 20132 Milan, Italy
- Roland Liblau
  - Institut Toulousain des Maladies Infectieuses et Inflammatoires (Infinity), INSERM UMR1291–CNRS UMR5051–Université Toulouse III, CEDEX 3, 31024 Toulouse, France
  - Department of Immunology, Toulouse University Hospitals, CEDEX 3, 31024 Toulouse, France
- Massimo Filippi
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Hospital, 20132 Milan, Italy
  - Vita-Salute San Raffaele University, 20132 Milan, Italy
  - Neuroimaging Research Unit, IRCCS San Raffaele Hospital, 20132 Milan, Italy
  - Neurophysiology Unit, IRCCS San Raffaele Hospital, 20132 Milan, Italy
- Giorgio Valentini
  - AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano, 20133 Milan, Italy
  - Data Science Research Center, Università degli Studi di Milano, 20133 Milan, Italy
  - Infolife National Lab, CINI, 00185 Rome, Italy
- Federica Esposito
  - Neurology and Neurorehabilitation Unit, IRCCS San Raffaele Hospital, 20132 Milan, Italy
  - Laboratory of Human Genetics of Neurological Disorders, IRCCS San Raffaele Hospital, 20132 Milan, Italy
6
Amaro A, Pfeffer M, Pfeffer U, Reggiani F. Evaluation and Comparison of Multi-Omics Data Integration Methods for Subtyping of Cutaneous Melanoma. Biomedicines 2022; 10. [PMID: 36551996 DOI: 10.3390/biomedicines10123240] [Received: 10/31/2022] [Revised: 11/29/2022] [Accepted: 12/07/2022] [Indexed: 12/15/2022] Open
Abstract
There is a growing number of multi-domain genomic datasets for human tumors. Multi-domain data are usually interpreted after separately analyzing single-domain data and integrating the results post hoc. Data fusion techniques allow for the real integration of multi-domain data to ideally improve the tumor classification results for the prognosis and prediction of response to therapy. We have previously described the joint singular value decomposition (jSVD) technique as a means of data fusion. Here, we report on the development of these methods in open source code based on R and Python and on the application of these data fusion methods. The Cancer Genome Atlas (TCGA) Skin Cutaneous Melanoma (SKCM) dataset was used as a benchmark to evaluate the potential of the data fusion approaches to improve molecular classification of cancers in a clinically relevant manner. Our data show that the data fusion approach does not generate classification results superior to those obtained using single-domain data. Data from different domains are not entirely independent from each other, and molecular classes are characterized by features that penetrate different domains. Data fusion techniques might be better suited for response prediction, where they could contribute to the identification of predictive features in a domain-independent manner to be used as biomarkers.