1
Cheng C, Messerschmidt L, Bravo I, Waldbauer M, Bhavikatti R, Schenk C, Grujic V, Model T, Kubinec R, Barceló J. A General Primer for Data Harmonization. Sci Data 2024; 11:152. [PMID: 38297013 PMCID: PMC10831085 DOI: 10.1038/s41597-024-02956-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 01/11/2024] [Indexed: 02/02/2024]
Affiliation(s)
- Cindy Cheng
- Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany.
- Luca Messerschmidt
- Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Isaac Bravo
- Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Marco Waldbauer
- Hochschule für Politik, Technical University of Munich, Richard-Wagner Str. 1, Munich, 80333, Bavaria, Germany
- Caress Schenk
- School of Humanities and Social Sciences, Nazarbayev University, Kabanbay Batry Ave., 53, Astana, 010000, Kazakhstan
- Vanja Grujic
- Faculty of Law, University of Brasilia, Campus Universitário Darcy Ribeiro Asa Norte, Brasília, 10587, Brazil
- Tim Model
- Delve, 2225 3rd St, San Francisco, 94107, California, USA
- Robert Kubinec
- Division of Social Science, New York University Abu Dhabi, Social Science Building (A5), Abu Dhabi, 129188, United Arab Emirates
- Joan Barceló
- Division of Social Science, New York University Abu Dhabi, Social Science Building (A5), Abu Dhabi, 129188, United Arab Emirates
2
Diao Y, Zhao Y, Li X, Li B, Huo R, Han X. A simplified machine learning model utilizing platelet-related genes for predicting poor prognosis in sepsis. Front Immunol 2023; 14:1286203. [PMID: 38054005 PMCID: PMC10694245 DOI: 10.3389/fimmu.2023.1286203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Accepted: 11/03/2023] [Indexed: 12/07/2023]
Abstract
Background: Thrombocytopenia is a known prognostic factor in sepsis, yet the relationship between platelet-related genes and sepsis outcomes remains elusive. We developed a machine learning (ML) model based on platelet-related genes to predict poor prognosis in sepsis. The model underwent rigorous evaluation on six diverse platforms, ensuring reliable and versatile findings. Methods: A retrospective analysis of platelet data from 365 sepsis patients confirmed the predictive role of platelet count in prognosis. We employed COX analysis, Least Absolute Shrinkage and Selection Operator (LASSO) and Support Vector Machine (SVM) techniques to identify platelet-related genes from the GSE65682 dataset. Subsequently, these genes were trained and validated on six distinct platforms comprising 719 patients, and compared against the Acute Physiology and Chronic Health Evaluation II (APACHE II) and Sequential Organ-Failure Assessment (SOFA) score. Results: A PLT count <100×10⁹/L independently increased the risk of death in sepsis patients (OR = 2.523; 95% CI: 1.084-5.872). The ML model, based on five platelet-related genes, demonstrated impressive area under the curve (AUC) values ranging from 0.5 to 0.795 across various validation platforms. On the GPL6947 platform, our ML model outperformed the APACHE II score with an AUC of 0.795 compared to 0.761. Additionally, by incorporating age, the model's performance was further improved to an AUC of 0.812. On the GPL4133 platform, the initial AUC of the machine learning model based on five platelet-related genes was 0.5. However, after including age, the AUC increased to 0.583. In comparison, the AUC of the APACHE II score was 0.604, and the AUC of the SOFA score was 0.542. Conclusion: Our findings highlight the broad applicability of this ML model, based on platelet-related genes, in facilitating early treatment decisions for sepsis patients with poor outcomes. Our study paves the way for advancements in personalized medicine and improved patient care.
Affiliation(s)
- Xiaoxu Han
- National Clinical Research Center for Laboratory Medicine, Department of Laboratory Medicine, The First Hospital of China Medical University, Shenyang, China
3
Borisov N, Tkachev V, Simonov A, Sorokin M, Kim E, Kuzmin D, Karademir-Yilmaz B, Buzdin A. Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns. Front Mol Biosci 2023; 10:1237129. [PMID: 37745690 PMCID: PMC10511763 DOI: 10.3389/fmolb.2023.1237129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2023] [Accepted: 08/28/2023] [Indexed: 09/26/2023]
Abstract
Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features such as differentially expressed genes associated with disease. Currently, most bioinformatic tools perform normalization in a flexible format that depends on the individual datasets under analysis. As a result, the outputs of such normalizations are poorly compatible with each other. We recently proposed a new approach to gene expression data normalization, termed Shambhala, which returns harmonized data in a uniform shape, where every expression profile is transformed into a pre-defined universal format. We previously showed that following shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced. Methods: Here, we tested Shambhala performance in retention of fold-change gene expression features and other functional characteristics of gene clusters such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria to different versions of Shambhala and to other methods of transcriptomic harmonization with flexible output data formats. These criteria dealt with the biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for using machine learning classifiers. Discussion: The Shambhala-2 harmonizer demonstrated the best results, with correlation and linear regression coefficients close to 1 for the comparison of training vs validation datasets and less than half the instability of other methods in the calculation of drug efficiency scores.
Affiliation(s)
- Nicolas Borisov
- Omicsway Corp, Walnut, CA, United States
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Alexander Simonov
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Oncobox Ltd., Moscow, Russia
- Maxim Sorokin
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Oncobox Ltd., Moscow, Russia
- World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
- Ella Kim
- Clinic for Neurosurgery, Laboratory of Experimental Neurooncology, Johannes Gutenberg University Medical Centre, Mainz, Germany
- Denis Kuzmin
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Betul Karademir-Yilmaz
- Department of Biochemistry, School of Medicine/Genetic and Metabolic Diseases Research and Investigation Center (GEMHAM) Marmara University, Istanbul, Türkiye
- Anton Buzdin
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- World-Class Research Center “Digital Biodesign and Personalized Healthcare”, Sechenov First Moscow State Medical University, Moscow, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia
- PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
4
Sorokin M, Buzdin AA, Guryanova A, Efimov V, Suntsova MV, Zolotovskaia MA, Koroleva EV, Sekacheva MI, Tkachev VS, Garazha A, Kremenchutckaya K, Drobyshev A, Seryakov A, Gudkov A, Alekseenko IV, Rakitina O, Kostina MB, Vladimirova U, Moisseev A, Bulgin D, Radomskaya E, Shestakov V, Baklaushev VP, Prassolov V, Shegay PV, Li X, Poddubskaya EV, Gaifullin N. Large-scale assessment of pros and cons of autopsy-derived or tumor-matched tissues as the norms for gene expression analysis in cancers. Comput Struct Biotechnol J 2023; 21:3964-3986. [PMID: 37635765 PMCID: PMC10448432 DOI: 10.1016/j.csbj.2023.07.040] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/05/2023] [Revised: 07/17/2023] [Accepted: 07/30/2023] [Indexed: 08/29/2023]
Abstract
Normal tissues are essential for studying disease-specific differential gene expression. However, healthy human controls are typically available only in postmortem/autopsy settings. In cancer research, fragments of pathologically normal tissue adjacent to the tumor site are frequently used as controls. However, it is largely underexplored how cancers can systematically influence gene expression of the neighboring tissues. Here we performed a comprehensive pan-cancer comparison of molecular profiles of solid tumor-adjacent and autopsy-derived "healthy" normal tissues. We found a number of systemic molecular differences related to activation of the immune cells, intracellular transport and autophagy, cellular respiration, telomerase activation, p38 signaling, cytoskeleton remodeling, and reorganization of the extracellular matrix. The tumor-adjacent tissues were deficient in apoptotic signaling and negative regulation of cell growth, including the G2/M cell cycle transition checkpoint. We also detected an extensive rearrangement of the chemical perception network. Molecular targets of 32 and 37 cancer drugs were over- or underexpressed, respectively, in the tumor-adjacent norms. These processes may be driven by molecular events that are correlated between the paired cancer and adjacent normal tissues and that mostly relate to inflammation and regulation of intracellular molecular pathways such as p38, MAPK, Notch, and IGF1 signaling. However, using a model of macaque postmortem tissues, we showed that over a time frame of 30 min to 24 h at 4 °C, an RNA degradation pattern in lung biosamples resulted in an artifactual "differential" expression profile for 1140 genes, although no differences could be detected in liver. Thus, such concerns should be addressed in practice.
Affiliation(s)
- Maksim Sorokin
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- Omicsway Corp., Walnut, CA 91789, USA
- I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
- Anton A. Buzdin
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
- World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, Moscow, Russia
- PathoBiology Group, European Organization for Research and Treatment of Cancer (EORTC), Brussels, Belgium
- Anastasia Guryanova
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- Victor Efimov
- World-Class Research Center "Digital biodesign and personalized healthcare", Sechenov First Moscow State Medical University, Moscow, Russia
- Maria V. Suntsova
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Marianna A. Zolotovskaia
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- Omicsway Corp., Walnut, CA 91789, USA
- Elena V. Koroleva
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- Marina I. Sekacheva
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region 141701, Russia
- I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Victor S. Tkachev
- Omicsway Corp., Walnut, CA 91789, USA
- Oncobox Ltd., Moscow 121205, Russia
- Andrew Garazha
- Omicsway Corp., Walnut, CA 91789, USA
- Oncobox Ltd., Moscow 121205, Russia
- Aleksey Drobyshev
- I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Alexander Gudkov
- I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Irina V. Alekseenko
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
- Institute of Molecular Genetics of National Research Centre "Kurchatov Institute", 2, Kurchatov Square, Moscow 123182, Russia
- FSBI "National Medical Research Center for Obstetrics, Gynecology and Perinatology named after Academician V.I. Kulakov" Ministry of Healthcare of the Russian Federation, Moscow 117198, Russia
- Olga Rakitina
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
- Maria B. Kostina
- Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow 117997, Russia
- Uliana Vladimirova
- I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Oncobox Ltd., Moscow 121205, Russia
- Aleksey Moisseev
- I.M. Sechenov First Moscow State Medical University, Moscow 119991, Russia
- Dmitry Bulgin
- Research Institute of Medical Primatology, 177 Mira str., Veseloye, Sochi 354376, Russia
- Elena Radomskaya
- Research Institute of Medical Primatology, 177 Mira str., Veseloye, Sochi 354376, Russia
- Viktor Shestakov
- Research Institute of Medical Primatology, 177 Mira str., Veseloye, Sochi 354376, Russia
- Vladimir Prassolov
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32 Vavilova str., Moscow 119991, Russia
- Petr V. Shegay
- National Medical Research Radiological Center of the Ministry of Health of the Russian Federation, 249036 Obninsk, Russia
- Xinmin Li
- UCLA Technology Center for Genomics & Bioinformatics, Department of Pathology & Laboratory Medicine, 650 Charles E Young Dr., Los Angeles, CA 90095, USA
- Nurshat Gaifullin
- Department of Physiology and General Pathology, Faculty of Medicine, Lomonosov Moscow State University, Moscow 119991, Russia