1
Yun J, Song JS, Yoo JJ, Kweon S, Choi YY, Lim D, Kuk JC, Kim HJ, Park SK. Microbial and Immune Landscape of Malignant Ascites: Insights from Gut, Bladder, and Ascitic Fluid Analyses. Cancers (Basel) 2025; 17:1280. [PMID: 40282458 PMCID: PMC12025743 DOI: 10.3390/cancers17081280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2025] [Revised: 04/01/2025] [Accepted: 04/07/2025] [Indexed: 04/29/2025] [Open access]
Abstract
BACKGROUND/OBJECTIVES Malignant ascites frequently arises in advanced cancers with peritoneal metastasis and is associated with poor outcomes. Known mechanisms include lymphatic obstruction by tumor cells, increased vascular permeability, and sodium retention via the renin-angiotensin-aldosterone system; however, the pathogenesis is not fully understood. We investigated whether gut and bladder microbiomes correlate with malignant ascites development or progression and whether the immune microenvironment in ascitic fluid is altered. METHODS We enrolled 66 histologically confirmed cancer patients, dividing them into malignant ascites (n = 20) and non-ascites (n = 46) groups. Stool, urine, and ascitic fluid samples were analyzed using 16S rRNA next-generation sequencing. Immune cell subsets in ascitic fluid were characterized using flow cytometry. RESULTS In 19 of the 20 malignant ascites samples, the bacterial load was too low for reliable 16S rRNA sequencing, suggesting that malignant ascites is largely sterile. The overall gut microbiome diversity did not differ significantly by ascites status, although a trend emerged in patients with peritoneal metastasis, including the enrichment of class Clostridia and Gammaproteobacteria. Bladder microbiome analysis also showed no significant differences by ascites or metastasis status. Flow cytometry revealed reduced T-cell (CD3+, CD4+, CD8+) and NK cell (CD56+) populations compared to data from cirrhotic ascites. CONCLUSIONS Malignant ascites exhibits minimal bacterial biomass, making comprehensive microbiome analysis challenging. Although no major global changes were noted in gut and bladder microbiomes, specific taxa were linked to peritoneal metastasis. These findings highlight an immunosuppressive ascitic environment and suggest that larger-scale or multi-omics approaches may help elucidate the role of microbiota in malignant ascites.
Affiliation(s)
- Jina Yun
  - Division of Hematology-Oncology, Department of Medicine, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon 14584, Republic of Korea
- Ju-Sun Song
  - GC Genome, Department of Laboratory Medicine, Green Cross Laboratories, Seoul 16924, Republic of Korea
- Jeong-Ju Yoo
  - Division of Hepatology, Department of Medicine, Soonchunhyang University Bucheon Hospital, Bucheon 14584, Republic of Korea
- Solbi Kweon
  - GC Genome, Department of Laboratory Medicine, Green Cross Laboratories, Seoul 16924, Republic of Korea
- Yoon-Young Choi
  - Department of Surgery, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon 14584, Republic of Korea
- Daero Lim
  - Department of Surgery, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon 14584, Republic of Korea
- Jung-Cheol Kuk
  - Department of Surgery, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon 14584, Republic of Korea
- Hyun-Jung Kim
  - Division of Hematology-Oncology, Department of Medicine, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon 14584, Republic of Korea
- Seong-Kyu Park
  - Division of Hematology-Oncology, Department of Medicine, Soonchunhyang University Bucheon Hospital, Soonchunhyang University College of Medicine, Bucheon 14584, Republic of Korea
2
Smyth J, Godet J, Choudhary A, Das A, Gkoutos GV, Acharjee A. Microbiome-Based Colon Cancer Patient Stratification and Survival Analysis. Cancer Med 2024; 13:e70434. [PMID: 39569620 PMCID: PMC11579663 DOI: 10.1002/cam4.70434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 06/23/2024] [Accepted: 10/30/2024] [Indexed: 11/22/2024] [Open access]
Abstract
BACKGROUND Colorectal cancer (CRC) is any cancer that starts in the colon or the rectum and presents a significant health concern. It is the third most diagnosed and the second deadliest cancer, with an estimated 153,020 new cases and 52,550 deaths in 2023. The severity of colon cancer may be attributed to its ability to evade the host immune system and growth suppressors, its asymptomatic nature in the early stages, and its association with a continually ageing population, unfavourable diet and obesity. The composition of the gut microbiome plays an important role in the development of CRC and is an important target for early detection and for predicting treatment outcomes in CRC. This study aims to identify microbiome-derived clusters in CRC patients and to conduct subsequent survival analysis using the specific microbiome features within clusters. METHODS Consensus clustering and feature selection, involving a Kruskal-Wallis test, a random forest and the least absolute shrinkage and selection operator (LASSO), were applied to identify microbial features that differed between clusters. Lastly, survival analysis was performed on the selected features using Kaplan-Meier curves and Cox regression. K-means clustering, as selected using consensus clustering interpretation, presented three distinct clusters with clear differences in alpha and beta diversity and baseline clinical variables. RESULTS A total of 1311 of the 1406 microbes were selected using the Kruskal-Wallis test and passed to the random forest and LASSO, which narrowed the dataset to 140 features. Following the survival analysis, eight microbial taxa, namely N4likevirus, Ambidensovirus, Synechococcus, Thermithiobacillus, Hydrocarboniphaga, Rhodovibrio, Gloeobacter and Candidatus Nitrosotenuis, were selected as significant in clustering and survival.
CONCLUSION This study reveals the heterogeneity of the CRC microbiome and its effect on disease prognosis. It calls for further exploration of the biological mechanisms of the selected microbes, as well as further investigation of whether the approach depicted here is applicable to other cancer types.
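The survival-analysis step named in the abstract above relies on Kaplan-Meier curves. As a rough illustration only (not the authors' code), the product-limit estimator can be sketched in plain Python. The function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # every subject leaves the risk set at their time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]     # remove events and censored subjects alike
    return curve

# Toy data (hypothetical): six subjects, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8, 8], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is the property that makes the estimator suitable for incomplete follow-up.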
Affiliation(s)
- Joshua Smyth
  - College of Medical and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Julien Godet
  - Faculty of Pharmacy, University of Strasbourg, Strasbourg, France
  - ICube UMR 7357, CNRS, FMTS, University of Strasbourg, Illkirch, France
  - Medical Information Department, Clinical Research Methods Group, University Hospitals of Strasbourg, Strasbourg, France
- Anisa Choudhary
  - College of Medical and Health, Institute of Clinical Sciences, Birmingham, UK
- Anubrata Das
  - College of Medical and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
- Georgios V. Gkoutos
  - College of Medical and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - MRC Health Data Research UK (HDR), Birmingham, UK
  - Centre for Health Data Research, University of Birmingham, Birmingham, UK
- Animesh Acharjee
  - College of Medical and Health, School of Medical Sciences, Cancer and Genomic Sciences, University of Birmingham, Birmingham, UK
  - Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
  - MRC Health Data Research UK (HDR), Birmingham, UK
  - Centre for Health Data Research, University of Birmingham, Birmingham, UK
3
Prasath ST, Navaneethan C. Colorectal cancer prognosis based on dietary pattern using synthetic minority oversampling technique with K-nearest neighbors approach. Sci Rep 2024; 14:17709. [PMID: 39085324 PMCID: PMC11292025 DOI: 10.1038/s41598-024-67848-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 07/16/2024] [Indexed: 08/02/2024] [Open access]
Abstract
Diet can influence a person's life span because some dietary patterns are associated with deadly diseases such as colorectal cancer (CRC). In 2020, colorectal cancer accounted for one million fatalities globally, representing 10% of all cancer deaths. A total of 76,679 men and 78,213 women over the age of 59 from ten states in the United States participated in this analysis. During follow-up, 1378 men and 981 women were diagnosed with colon cancer. This prospective cohort study used 231 food items and their variants as input features to identify CRC patients. Before labelling any food as CRC-causing, it is appropriate to analyse how many grams of it are consumed daily and how many times a week it is eaten. This research examines five classification algorithms on real-time datasets: K-Nearest Neighbour (KNN), Decision Tree (DT), Random Forest (RF), Logistic Regression with Classifier Chain (LRCC), and Logistic Regression with Label Powerset (LRLC). The SMOTE algorithm is then applied to address class imbalance in the data. Our study shows that eating more than 10 g/d of low-fat butter on bread (RR 1.99, CI 0.91-4.39), or eating it more than twice a week (RR 1.49, CI 0.93-2.38), is associated with increased CRC risk. Concerning beef, eating more than 74 g of beef steak daily (RR 0.88, CI 0.50-1.55), or having it more than once a week (RR 0.88, CI 0.62-1.23), is associated with decreased CRC risk. Beef and dairy products in the daily diet should therefore be eaten with attention to quantity; consuming them in moderation on a regular basis may protect against CRC. Meanwhile, a high intake of poultry (RR 0.2, CI 0.05-0.81), fish (RR 0.82, CI 0.31-2.16), and pork (RR 0.67, CI 0.17-2.65) correlates negatively with CRC risk.
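The abstract names SMOTE as its answer to class imbalance. The core idea, interpolating between a minority-class sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a simplified illustration and not the study's implementation: the function `smote`, its parameters, and the toy points are assumptions made here, and a real analysis would normally rely on an established library.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation
    between a random minority sample and one of its k nearest
    minority-class neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Toy minority class (hypothetical): the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=5)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority samples, oversampling stays inside the region the minority class already occupies rather than duplicating existing rows.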
Affiliation(s)
- S Thanga Prasath
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- C Navaneethan
  - School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, Tamil Nadu, India
4
Teixeira M, Silva F, Ferreira RM, Pereira T, Figueiredo C, Oliveira HP. A review of machine learning methods for cancer characterization from microbiome data. NPJ Precis Oncol 2024; 8:123. [PMID: 38816569 PMCID: PMC11139966 DOI: 10.1038/s41698-024-00617-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 05/17/2024] [Indexed: 06/01/2024] [Open access]
Abstract
Recent studies have shown that the microbiome can impact cancer development, progression, and response to therapies suggesting microbiome-based approaches for cancer characterization. As cancer-related signatures are complex and implicate many taxa, their discovery often requires Machine Learning approaches. This review discusses Machine Learning methods for cancer characterization from microbiome data. It focuses on the implications of choices undertaken during sample collection, feature selection and pre-processing. It also discusses ML model selection, guiding how to choose an ML model, and model validation. Finally, it enumerates current limitations and how these may be surpassed. Proposed methods, often based on Random Forests, show promising results, however insufficient for widespread clinical usage. Studies often report conflicting results mainly due to ML models with poor generalizability. We expect that evaluating models with expanded, hold-out datasets, removing technical artifacts, exploring representations of the microbiome other than taxonomical profiles, leveraging advances in deep learning, and developing ML models better adapted to the characteristics of microbiome data will improve the performance and generalizability of models and enable their usage in the clinic.
Affiliation(s)
- Marco Teixeira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Engineering, University of Porto, Porto, Portugal
- Francisco Silva
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Science, University of Porto, Porto, Portugal
- Rui M Ferreira
  - Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, University of Porto, Porto, Portugal
- Tania Pereira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Sciences and Technology, University of Coimbra, Coimbra, Portugal
- Ceu Figueiredo
  - Ipatimup - Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, University of Porto, Porto, Portugal
  - Faculty of Medicine, University of Porto, Porto, Portugal
- Hélder P Oliveira
  - Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
  - Faculty of Science, University of Porto, Porto, Portugal
5
Díez López C, Montiel González D, Vidaki A, Kayser M. Prediction of Smoking Habits From Class-Imbalanced Saliva Microbiome Data Using Data Augmentation and Machine Learning. Front Microbiol 2022; 13:886201. [PMID: 35928158 PMCID: PMC9343866 DOI: 10.3389/fmicb.2022.886201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 06/21/2022] [Indexed: 11/24/2022] [Open access]
Abstract
Human microbiome research is moving from characterization and association studies to translational applications in medical research, clinical diagnostics, and others. One of these applications is the prediction of human traits, where machine learning (ML) methods are often employed, but face practical challenges. Class imbalance in available microbiome data is one of the major problems, which, if unaccounted for, leads to spurious prediction accuracies and limits the classifier's generalization. Here, we investigated the predictability of smoking habits from class-imbalanced saliva microbiome data by combining data augmentation techniques to account for class imbalance with ML methods for prediction. We collected publicly available saliva 16S rRNA gene sequencing data and smoking habit metadata demonstrating a serious class imbalance problem, i.e., 175 current vs. 1,070 non-current smokers. Three data augmentation techniques (synthetic minority over-sampling technique, adaptive synthetic, and tree-based associative data augmentation) were applied together with seven ML methods: logistic regression, k-nearest neighbors, support vector machine with linear and radial kernels, decision trees, random forest, and extreme gradient boosting. K-fold nested cross-validation was used with the different augmented data types and baseline non-augmented data to validate the prediction outcome. Combining data augmentation with ML generally outperformed baseline methods in our dataset. The final prediction model combined tree-based associative data augmentation and support vector machine with linear kernel, and achieved a classification performance expressed as Matthews correlation coefficient of 0.36 and AUC of 0.81. Our method successfully addresses the problem of class imbalance in microbiome data for reliable prediction of smoking habits.
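The abstract reports performance as a Matthews correlation coefficient (MCC) and warns that class imbalance can produce spurious accuracies. A short sketch makes the contrast concrete, using the class sizes given in the abstract (175 current vs. 1,070 non-current smokers) and a hypothetical classifier that labels every sample "non-current". The function name and the classifier are illustrative assumptions, not the authors' code.

```python
import math

def matthews_corrcoef(tp, fp, fn, tn):
    """MCC from confusion-matrix counts; taken as 0 when the denominator is 0."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Class sizes from the abstract; "current smoker" is the positive class.
# Hypothetical trivial classifier: predicts "non-current" for everyone.
tp, fp, fn, tn = 0, 0, 175, 1070
accuracy = (tp + tn) / (tp + fp + fn + tn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
print(f"accuracy={accuracy:.3f}, MCC={mcc:.2f}")  # accuracy=0.859, MCC=0.00
```

The trivial classifier reaches about 86% accuracy without identifying a single smoker, while its MCC is 0, which is why a balance-aware metric is the more informative summary for data like these.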
Affiliation(s)
- Manfred Kayser
  - Department of Genetic Identification, Erasmus MC University Medical Center Rotterdam, Rotterdam, Netherlands
6
Dahan E, Martin VM, Yassour M. EasyMap - An Interactive Web Tool for Evaluating and Comparing Associations of Clinical Variables and Microbiome Composition. Front Cell Infect Microbiol 2022; 12:854164. [PMID: 35646745 PMCID: PMC9136407 DOI: 10.3389/fcimb.2022.854164] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/13/2022] [Accepted: 04/05/2022] [Indexed: 12/30/2022] [Open access]
Abstract
One of the most common tasks in microbiome studies is comparing microbial profiles across various groups of people (e.g., sick vs. healthy). Routinely, researchers use multivariate linear regression models to address these challenges, such as linear regression packages, MaAsLin2, LEfSe, etc. In many cases, it is unclear which metadata variables should be included in the linear model, as many human-associated variables are correlated with one another. Thus, multiple models are often tested, each including a different set of variables, however the challenge of selecting the metadata variables in the final model remains. Here, we present EasyMap, an interactive online tool allowing for (1) running multiple multivariate linear regression models, on the same features and metadata; (2) visualizing the associations between microbial features and clinical metadata found in each model; and (3) comparing across the various models to identify the critical metadata variables and select the optimal model. EasyMap provides a side-by-side visualization of association results across the various models, each with additional metadata variables, enabling us to evaluate the impact of each metadata variable on the associated feature. EasyMap’s interface enables filtering associations by significance, focusing on specific microbes and finding the robust associations that are found across multiple models. While EasyMap was designed to analyze microbiome data, it can handle any other tabular data with numeric features and metadata variables. EasyMap takes the common task of multivariate linear regression to the next level, with an intuitive and simple user interface, allowing for wide comparisons of multiple models to identify the robust microbial feature associations. EasyMap is available at http://yassour.rcs.huji.ac.il/easymap.
Affiliation(s)
- Ehud Dahan
  - Microbiology and Molecular Genetics, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
- Victoria M. Martin
  - Department of Pediatrics, Massachusetts General Hospital, Boston, MA, United States
- Moran Yassour
  - Microbiology and Molecular Genetics, Faculty of Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel
  - School of Computer Science & Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
  - *Correspondence: Moran Yassour,
7
You L, Zhou J, Xin Z, Hauck JS, Na F, Tang J, Zhou X, Lei Z, Ying B. Novel directions of precision oncology: circulating microbial DNA emerging in cancer-microbiome areas. Precis Clin Med 2022; 5:pbac005. [PMID: 35692444 PMCID: PMC9026200 DOI: 10.1093/pcmedi/pbac005] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/12/2021] [Revised: 01/26/2022] [Accepted: 01/27/2022] [Indexed: 02/05/2023] [Open access]
Abstract
Microbiome research has extended into the cancer area in the past decades. Microbes can affect oncogenesis, progression, and treatment response through various mechanisms, including direct regulation and indirect impacts. Microbiota-associated detection methods and agents have been developed to facilitate cancer diagnosis and therapy. Additionally, the cancer microbiome has recently been redefined. The identification of intra-tumoral microbes and cancer-related circulating microbial DNA (cmDNA) has promoted novel research in the cancer-microbiome area. In this review, we define the human system of commensal microbes and the cancer microbiome from a brand-new perspective and emphasize the potential value of cmDNA as a promising biomarker in cancer liquid biopsy. We outline all existing studies on the relationship between cmDNA and cancer and the outlook for potential preclinical and clinical applications of cmDNA in cancer precision medicine, as well as critical problems to be overcome in this burgeoning field.
Affiliation(s)
- Liting You
  - Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Juan Zhou
  - Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Zhaodan Xin
  - Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- J Spencer Hauck
  - Department of Pathology, Duke University School of Medicine, Durham, NC 27710, USA
- Feifei Na
  - Department of Thoracic Cancer, West China Hospital, Sichuan University, Chengdu 610041, China
- Jie Tang
  - Department of Clinical Laboratory, Mianyang Central Hospital, School of Medicine, University of Electronic Science and Technology of China, Mianyang 621000, China
- Xiaohan Zhou
  - Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Zichen Lei
  - Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Binwu Ying
  - Department of Laboratory Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
8
Li J, Liang K, Song X. Logistic regression with adaptive sparse group lasso penalty and its application in acute leukemia diagnosis. Comput Biol Med 2021; 141:105154. [PMID: 34952336 DOI: 10.1016/j.compbiomed.2021.105154] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/22/2021] [Revised: 12/14/2021] [Accepted: 12/15/2021] [Indexed: 01/15/2023]
Abstract
Cancer diagnosis based on gene expression profile data has attracted extensive attention in computational biology and medicine. It suffers from three challenges in practical applications: noise, gene grouping, and adaptive gene selection. This paper aims to solve the above problems by developing the logistic regression with adaptive sparse group lasso penalty (LR-ASGL). A noise information processing method for cancer gene expression profile data is first presented via robust principal component analysis. Genes are then divided into groups by performing weighted gene co-expression network analysis on the clean matrix. By approximating the relative value of the noise size, gene reliability criterion and robust evaluation criterion are proposed. Finally, LR-ASGL is presented for simultaneous cancer diagnosis and adaptive gene selection. The performance of the proposed method is compared with the other four methods in three simulation settings: Gaussian noise, uniformly distributed noise, and mixed noise. The acute leukemia data are adopted as an experimental example to demonstrate the advantages of LR-ASGL in prediction and gene selection.
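For orientation, a generic sparse group lasso objective for logistic regression can be written as below. This is the standard textbook form with per-group and per-coefficient weights added as an assumption to indicate where "adaptive" weighting would enter. It is not taken from the paper, whose exact weights and criteria the abstract does not state. Here $y_i \in \{0,1\}$ is the class label, $x_i$ the expression vector of sample $i$, $\beta^{(g)}$ the coefficients of gene group $g$ (of size $p_g$), $\alpha \in [0,1]$ the mixing parameter, and $w_g$, $v_j$ hypothetical adaptive weights.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Bigl[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr)\Bigr]
+ \lambda\Bigl[(1-\alpha)\sum_{g=1}^{G} w_g\,\sqrt{p_g}\,\lVert \beta^{(g)} \rVert_2
  + \alpha \sum_{j=1}^{p} v_j\,\lvert \beta_j \rvert\Bigr]
```

The group term can zero out whole gene groups, while the $\ell_1$ term selects individual genes inside the groups that remain, which matches the abstract's pairing of gene grouping with adaptive gene selection.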
Affiliation(s)
- Juntao Li
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Ke Liang
  - College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007, China
- Xuekun Song
  - College of Information Technology, Henan University of Chinese Medicine, Zhengzhou, 450046, China
9
Wang Y, Guo H, Gao X, Wang J. The Intratumor Microbiota Signatures Associate With Subtype, Tumor Stage, and Survival Status of Esophageal Carcinoma. Front Oncol 2021; 11:754788. [PMID: 34778069 PMCID: PMC8578860 DOI: 10.3389/fonc.2021.754788] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/24/2021] [Accepted: 10/11/2021] [Indexed: 12/23/2022] [Open access]
Abstract
Altered human microbiome characteristics have been linked with esophageal carcinoma (ESCA), and analysis of microbial profiles derived directly from ESCA tumor tissue is beneficial for studying microbial functions in the tumorigenesis and development of ESCA. In this study, we identified the intratumor microbiome signature and investigated the correlation between microbes and clinical characteristics of patients with ESCA, on the basis of data and information obtained from The Cancer Microbiome Atlas (TCMA) and The Cancer Genome Atlas (TCGA) databases. A total of 82 samples were analyzed for microbial composition at various taxonomic levels, including 40 tumor samples of esophageal squamous cell carcinoma (ESCC), 20 tumor samples of esophageal adenocarcinoma (EAD), and 22 adjacent normal samples. The results showed that the relative abundance of several microbes changed in tumors compared to their paired normal tissues; for example, Firmicutes increased significantly while Proteobacteria decreased in tumor samples. We also identified a microbial signature composed of ten microbes that may help in the classification of ESCC and EAD, the two subtypes of ESCA. Correlation analysis demonstrated that the abundances of Fusobacteria/Fusobacteriia/Fusobacteriales, Lactobacillales/Lactobacillaceae/Lactobacillus, Clostridia/Clostridiales, Proteobacteria, and Negativicutes were correlated with the clinical characteristics of ESCA patients. In summary, this study supports the feasibility of detecting intratumor microbial composition derived from tumor sequencing data, and it provides novel insights into the roles of microbiota in tumors. Ultimately, as the microbiome is the second genome of the human body, microbiome signature analysis may help to add more information to the blueprint of human biology.
Affiliation(s)
- Yangyang Wang
  - School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
- Hua Guo
  - Department of Nursing, Shaanxi Provincial People's Hospital, Xi'an, China
- Xiaoguang Gao
  - School of Electronics and Information, Northwestern Polytechnical University, Xi'an, China
- Jihan Wang
  - Institute of Medical Research, Northwestern Polytechnical University, Xi'an, China
10
Convolutional neural network for human cancer types prediction by integrating protein interaction networks and omics data. Sci Rep 2021; 11:20691. [PMID: 34667236 PMCID: PMC8526703 DOI: 10.1038/s41598-021-98814-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/22/2020] [Accepted: 09/14/2021] [Indexed: 02/07/2023] [Open access]
Abstract
Many studies have proven the power of gene expression profiles in cancer identification; however, the explosive growth of genomics data has increased the need for tools that support cancer diagnosis and prognosis with high accuracy in a short time. Here, we collected 6136 human samples from 11 cancer types, and integrated their gene expression profiles and protein-protein interaction (PPI) network to generate 2D images with a spectral clustering method. To predict normal samples and 11 tumor types, the images of these 6136 networks were split into training and validation datasets to develop a convolutional neural network (CNN). Our model showed 97.4% and 95.4% accuracy in distinguishing normal samples from tumors and in identifying the 11 cancer types, respectively. We also found that tumors located in neighboring tissues or arising from the same cell types tended to be misclassified because of their similar gene expression profiles. Furthermore, we observed that some patients may have a better prognosis if their tumors are often misclassified as normal samples. As far as we know, we are the first to generate thousands of cancer networks to predict and classify multiple cancer types with a CNN architecture. We believe that our model can not only be applied to cancer diagnosis and prognosis, but can also promote the discovery of multiple cancer biomarkers.
11
Yuan F, Li Z, Chen L, Zeng T, Zhang YH, Ding S, Huang T, Cai YD. Identifying the Signatures and Rules of Circulating Extracellular MicroRNA for Distinguishing Cancer Subtypes. Front Genet 2021; 12:651610. [PMID: 33767734 PMCID: PMC7985347 DOI: 10.3389/fgene.2021.651610] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/10/2021] [Accepted: 02/10/2021] [Indexed: 12/24/2022] [Open access]
Abstract
Cancer is one of the most threatening diseases to humans. It can invade multiple significant organs, including lung, liver, stomach, pancreas, and even brain. The identification of cancer biomarkers is one of the most significant components of cancer studies as the foundation of clinical cancer diagnosis and related drug development. During the large-scale screening for cancer prevention and early diagnosis, obtaining cancer-related tissues is impossible. Thus, the identification of cancer-associated circulating biomarkers from liquid biopsy targeting has been proposed and has become the most important direction for research on clinical cancer diagnosis. Here, we analyzed pan-cancer extracellular microRNA profiles by using multiple machine-learning models. The extracellular microRNA profiles on 11 cancer types and non-cancer were first analyzed by Boruta to extract important microRNAs. Selected microRNAs were then evaluated by the Max-Relevance and Min-Redundancy feature selection method, resulting in a feature list, which were fed into the incremental feature selection method to identify candidate circulating extracellular microRNA for cancer recognition and classification. A series of quantitative classification rules was also established for such cancer classification, thereby providing a solid research foundation for further biomarker exploration and functional analyses of tumorigenesis at the level of circulating extracellular microRNA.
Affiliation(s)
- Fei Yuan
  - School of Life Sciences, Shanghai University, Shanghai, China
  - Department of Science and Technology, Binzhou Medical University Hospital, Binzhou, China
- Zhandong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Tao Zeng
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Hang Zhang
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Shijian Ding
  - School of Life Sciences, Shanghai University, Shanghai, China
- Tao Huang
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
  - CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
12
|
Identifying the Immunological Gene Signatures of Immune Cell Subtypes. BioMed Research International 2021. [DOI: 10.1155/2021/6639698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Abstract
The immune system is a complicated defensive system that comprises multiple functional cells and molecules acting against endogenous and exogenous pathogenic factors. Identifying immune cell subtypes and recognizing their unique immunological functions are difficult because of the complicated cellular components and immunological functions of the immune system. With the development of transcriptomics and high-throughput sequencing, the gene expression profiling of immune cells can provide a new strategy for exploring immune cell subtyping. On the basis of the new profiling data of mouse immune cell gene expression from the Immunological Genome Project (ImmGen), a novel computational pipeline was applied to identify different immune cell subtypes, including αβ T cells, B cells, γδ T cells, and innate lymphocytes. First, the profiling data were analyzed by a powerful feature selection method, Monte-Carlo Feature Selection, resulting in a feature list and some informative features. To the list, the two-stage incremental feature selection method, incorporating random forest as the classification algorithm, was applied to extract essential gene signatures and build an efficient classifier. In parallel, a rule learning scheme was applied to the informative features to construct quantitative expression rules. A group of gene signatures was found to be qualitatively related to the biological processes of the four immune cell subtypes. The quantitative expression rules can efficiently cluster immune cells. This work provides a novel computational tool for immune cell quantitative subtyping and biomarker recognition.
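To make the idea of quantitative expression rules concrete, the sketch below shows one plausible way such rules could be represented and applied: each rule is a list of threshold conditions on gene expression plus the subtype it assigns, and the first rule whose conditions all hold wins. The gene names, thresholds, and rule format are hypothetical and are not taken from the study; only the four subtype names come from the abstract.

```python
def apply_rules(rules, expression, default="unassigned"):
    """Return the label of the first rule whose conditions all hold for `expression`.

    Each rule is (conditions, label); each condition is (gene, op, threshold)
    with op being ">" or "<=".
    """
    for conditions, label in rules:
        if all(
            (expression[gene] > threshold) if op == ">" else (expression[gene] <= threshold)
            for gene, op, threshold in conditions
        ):
            return label
    return default

# Hypothetical rules: gene names and thresholds are placeholders, not study results.
rules = [
    ([("gene_A", ">", 5.0), ("gene_B", "<=", 2.0)], "B cells"),
    ([("gene_A", "<=", 5.0), ("gene_C", ">", 3.5)], "γδ T cells"),
    ([("gene_C", "<=", 3.5), ("gene_B", ">", 2.0)], "αβ T cells"),
]

sample = {"gene_A": 6.1, "gene_B": 1.4, "gene_C": 0.2}
print(apply_rules(rules, sample))
```

Because rules are checked in order, an ordered list like this behaves as a decision list; a sample matching no rule falls through to the default label.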
|
13
|
Pan X, Li H, Zeng T, Li Z, Chen L, Huang T, Cai YD. Identification of Protein Subcellular Localization With Network and Functional Embeddings. Front Genet 2021; 11:626500. [PMID: 33584818 PMCID: PMC7873866 DOI: 10.3389/fgene.2020.626500] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 11/06/2020] [Accepted: 12/21/2020] [Indexed: 01/15/2023]
Abstract
The functions of proteins are mainly determined by their subcellular localizations in cells. Many computational methods for predicting the subcellular localization of proteins have been proposed. However, these methods require further improvement, especially in how proteins are represented. In this study, we present an embedding-based method for predicting the subcellular localization of proteins. We first learn the functional embeddings of KEGG/GO terms, which are then used to represent proteins. Next, we characterize the network embeddings of proteins on a protein–protein network. The functional and network embeddings are combined as novel representations of protein locations for the construction of the final classification model. On our collected benchmark dataset with 4,861 proteins from 16 locations, the best model achieves a Matthews correlation coefficient of 0.872 and is thus superior to multiple conventional methods.
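Since the reported performance is a Matthews correlation coefficient over 16 locations, a short sketch of the multiclass form of that statistic may help. It uses the standard generalization of MCC computed from counts of true and predicted labels. The location labels below are illustrative only, and returning 0.0 when the denominator is zero is a convention assumed here, not something stated in the abstract.

```python
from collections import Counter
from math import sqrt

def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from paired label sequences."""
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly predicted samples
    t_counts = Counter(y_true)                        # how often each class truly occurs
    p_counts = Counter(y_pred)                        # how often each class is predicted
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denom = sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # Assumed convention: report 0.0 when the coefficient is undefined.
    return numerator / denom if denom else 0.0

# Illustrative labels only.
y_true = ["nucleus", "nucleus", "cytoplasm", "membrane"]
y_pred = ["nucleus", "cytoplasm", "cytoplasm", "membrane"]
print(round(multiclass_mcc(y_true, y_pred), 3))
```

For two classes this expression reduces to the familiar binary MCC, which makes it a natural single-number summary when class sizes are unbalanced across locations.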
Affiliation(s)
- Xiaoyong Pan
  - School of Life Sciences, Shanghai University, Shanghai, China
  - Key Laboratory of System Control and Information Processing, Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Ministry of Education of China, Shanghai, China
- Hao Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Zeng
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Zhandong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Tao Huang
  - Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
|
14
|
Zhang YH, Li H, Zeng T, Chen L, Li Z, Huang T, Cai YD. Identifying Transcriptomic Signatures and Rules for SARS-CoV-2 Infection. Front Cell Dev Biol 2021; 8:627302. [PMID: 33505977 PMCID: PMC7829664 DOI: 10.3389/fcell.2020.627302] [Citation(s) in RCA: 51] [Impact Index Per Article: 12.8] [Received: 11/09/2020] [Accepted: 12/14/2020] [Indexed: 12/26/2022]
Abstract
The worldwide Coronavirus Disease 2019 (COVID-19) pandemic was triggered by the wide spread of a new strain of coronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Multiple studies on the pathogenesis of SARS-CoV-2 were conducted immediately after the spread of the disease. However, the molecular pathogenesis of the virus and related diseases has still not been fully revealed. In this study, we attempted to identify new transcriptomic signatures as candidate diagnostic models for clinical testing or as therapeutic targets for vaccine design. Using recently reported transcriptomics data of upper airway tissue from patients with acute respiratory illnesses, we integrated multiple machine learning methods to identify effective qualitative biomarkers and quantitative rules for distinguishing SARS-CoV-2 infection from other infectious diseases. The transcriptomics data were first analyzed by Boruta to select important features, which were further evaluated by the minimum redundancy maximum relevance method to produce a feature list. This list was fed into incremental feature selection, incorporating some classification algorithms, to extract qualitative biomarker genes and construct quantitative rules. An efficient classifier was also built to identify patients infected with SARS-CoV-2. The findings reported in this study may help in revealing the potential pathogenic mechanisms of COVID-19 and finding new targets for vaccine design.
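The minimum redundancy maximum relevance step can be sketched as a greedy ranking over mutual information. The version below assumes discrete (already binned) feature values and uses the difference form of the criterion, relevance minus mean redundancy; the abstract does not say which variant or discretization was used, and the gene names and toy data are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information in bits between two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def mrmr_rank(features, labels):
    """Greedily order feature names by relevance to `labels` minus mean redundancy
    with the features already selected (difference form, assumed here)."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], list(features)
    while remaining:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(
                mutual_information(features[f], features[s]) for s in selected
            ) / len(selected)
            return relevance[f] - redundancy
        best = max(remaining, key=score)  # ties resolve to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical binned expression values for four samples with four class labels.
features = {
    "gene_A": [0, 0, 1, 1],
    "gene_A_copy": [0, 0, 1, 1],   # fully redundant with gene_A
    "gene_C": [0, 1, 0, 1],        # equally relevant, independent of gene_A
}
labels = [0, 1, 2, 3]
print(mrmr_rank(features, labels))
```

In this toy case all three features are equally relevant, so the first pick follows input order; the redundant copy is then pushed to the end because it adds no information beyond `gene_A`.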
Affiliation(s)
- Yu-Hang Zhang
  - School of Life Sciences, Shanghai University, Shanghai, China
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, United States
- Hao Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Zeng
  - Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, China
- Zhandong Li
  - College of Food Engineering, Jilin Engineering Normal University, Changchun, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, China
|