For: Li Y, Umbach DM, Bingham A, Li QJ, Zhuang Y, Li L. Putative biomarkers for predicting tumor sample purity based on gene expression data. BMC Genomics 2019;20:1021. [PMID: 31881847 PMCID: PMC6933652 DOI: 10.1186/s12864-019-6412-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/28/2019] [Accepted: 12/18/2019] [Indexed: 12/29/2022] Open

Number	Cited by Other Article(s)
1	Modeling type 1 diabetes progression using machine learning and single-cell transcriptomic measurements in human islets. Cell Rep Med 2024;5:101535. [PMID: 38677282 DOI: 10.1016/j.xcrm.2024.101535] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 01/22/2024] [Accepted: 04/07/2024] [Indexed: 04/29/2024]
Abstract: Type 1 diabetes (T1D) is a chronic condition in which beta cells are destroyed by immune cells. Despite progress in immunotherapies that could delay T1D onset, early detection of autoimmunity remains challenging. Here, we evaluate the utility of machine learning for early prediction of T1D using single-cell analysis of islets. Using gradient-boosting algorithms, we model changes in gene expression of single cells from pancreatic tissues in T1D and non-diabetic organ donors. We assess if mathematical modeling could predict the likelihood of T1D development in non-diabetic autoantibody-positive donors. While most autoantibody-positive donors are predicted to be non-diabetic, select donors with unique gene signatures are classified as T1D. Our strategy also reveals a shared gene signature in distinct T1D-associated models across cell types, suggesting a common effect of the disease on transcriptional outputs of these cells. Our study establishes a precedent for using machine learning in early detection of T1D.
Key Words: autoantibody-positive; human islets; machine learning; single-cell RNA-seq; type 1 diabetes
MeSH Headings: Humans; Diabetes Mellitus, Type 1/genetics; Diabetes Mellitus, Type 1/immunology; Diabetes Mellitus, Type 1/pathology; Machine Learning; Single-Cell Analysis/methods; Islets of Langerhans/metabolism; Islets of Langerhans/immunology; Transcriptome/genetics; Disease Progression; Autoantibodies/immunology; Gene Expression Profiling/methods; Male; Female; Insulin-Secreting Cells/metabolism; Adult
2	Identification of cancer risk groups through multi-omics integration using autoencoder and tensor analysis. Sci Rep 2024;14:11263. [PMID: 38760420 PMCID: PMC11101416 DOI: 10.1038/s41598-024-59670-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/12/2024] [Indexed: 05/19/2024] Open
Abstract: Identifying cancer risk groups by multi-omics has attracted researchers in their quest to find biomarkers from diverse risk-related omics. Stratifying the patients into cancer risk groups using genomics is essential for clinicians for pre-prevention treatment to improve the survival time for patients and identify the appropriate therapy strategies. This study proposes a multi-omics framework that can extract the features from various omics simultaneously. The framework employs autoencoders to learn the non-linear representation of the data and applies tensor analysis for feature learning. Further, the clustering method is used to stratify the patients into multiple cancer risk groups. Several omics were included in the experiments, namely methylation, somatic copy-number variation (SCNV), micro RNA (miRNA) and RNA sequencing (RNAseq) from two cancer types, including Glioma and Breast Invasive Carcinoma from the TCGA dataset. The results of this study are promising, as evidenced by the survival analysis and classification models, which outperformed the state-of-the-art. The patients can be significantly (p-value < 0.05) divided into risk groups using extracted latent variables from the fused multi-omics data. The pipeline is open source to help researchers and clinicians identify the patients' risk groups using genomics.
Key Words: breast cancer; cancer genomics; cancer prevention; computational models; data mining; machine learning; predictive medicine
MeSH Headings: Humans; Genomics/methods; DNA Copy Number Variations; DNA Methylation; Neoplasms/genetics; MicroRNAs/genetics; Female; Biomarkers, Tumor/genetics; Glioma/genetics; Glioma/pathology; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Multiomics
3	High expression of CCDC69 is correlated with immunotherapy response and protective effects on breast cancer. BMC Cancer 2023;23:974. [PMID: 37828454 PMCID: PMC10571395 DOI: 10.1186/s12885-023-11411-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Accepted: 09/16/2023] [Indexed: 10/14/2023] Open
Abstract:
Background: As a molecule controlling the assembly of central spindles and recruitment of midzone components, coiled-coil domain-containing protein 69 (CCDC69) plays an important role in multiple cancers. Currently, the relationships between CCDC69 and immune infiltration or immunotherapy in breast cancer remain unclear.
Methods: The expression and prognostic significance of CCDC69 in breast cancer were comprehensively analyzed by quantitative real-time PCR, immunohistochemical staining and various databases. The data for differentially expressed genes, gene set enrichment analysis, and immune cell infiltration analysis came from The Cancer Genome Atlas (TCGA) database. Single-cell analysis based on the IMMUcan database was used. The protein-protein interaction network was developed using STRING, Cytoscape, CytoHubba, and GeneMANIA. TISIDB was employed to analyze the CCDC69 co-expressed immune-related genes. The correlations between CCDC69 and immunotherapy or immune-related scores were analyzed by CAMOIP and TISMO. Ctr-db was also used to conduct drug sensitivity analysis.
Results: The mRNA of CCDC69 was downregulated in breast cancer tissues compared with normal tissues. Higher CCDC69 expression was associated with a better breast cancer prognosis. Enrichment analysis showed that the co-expression genes of CCDC69 were mainly related to immune-related pathways. The expression of CCDC69 was found to be positively correlated with multiple tumor-suppression immune infiltration cells, especially T cells and dendritic cells. Meanwhile, high CCDC69 expression can predict better immunotherapy responses when compared with low CCDC69 expression. After the interferon-gamma treatment, the CCDC69 expression was elevated in vitro. CCDC69 expression was a reliable predictor for the response status of two therapeutic strategies in breast cancer.
Conclusions: Our research revealed the clinical significance of CCDC69 in breast cancer and validated the critical roles of CCDC69 in the tumor immune infiltration and immunotherapy responses.
Key Words: Biomarker; Breast cancer; CCDC69; Immune infiltration; Immunotherapy
MeSH Headings: Humans; Female; Breast Neoplasms/genetics; Breast Neoplasms/therapy; Immunotherapy; Breast; Clinical Relevance; Cytoskeleton; Prognosis; Microtubule-Associated Proteins
Grants: 21-173-9-07 (2021 Science and Technology project of Shenyang); 2023JH2/101300048 (2023 Applied Basic research project of Liaoning province)
4	Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches. Genes (Basel) 2023;14:1820. [PMID: 37761960 PMCID: PMC10530902 DOI: 10.3390/genes14091820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 09/14/2023] [Accepted: 09/15/2023] [Indexed: 09/29/2023] Open
Abstract: Cancer metastasis accounts for approximately 90% of cancer deaths, and elucidating markers in metastasis is the first step in its prevention. To characterize metastasis marker genes (MGs) of breast cancer, XGBoost models that classify metastasis status were trained with gene expression profiles from TCGA. Then, a metastasis score (MS) was assigned to each gene by calculating the inner product between the feature importance and the AUC performance of the models. As a result, 54, 202, and 357 genes with the highest MS were characterized as MGs by empirical p-value cutoffs of 0.001, 0.005, and 0.01, respectively. The three sets of MGs were compared with those from existing metastasis marker databases, which provided significant results in most comparisons (p-value < 0.05). They were also significantly enriched in biological processes associated with breast cancer metastasis. The three MGs, SPPL2C, KRT23, and RGS7, showed highly significant results (p-value < 0.01) in the survival analysis. The MGs that could not be identified by statistical analysis (e.g., GOLM1, ELAVL1, UBP1, and AZGP1), as well as the MGs with the highest MS (e.g., ZNF676, FAM163B, LDOC2, IRF1, and STK40), were verified via the literature. Additionally, we checked how close the MGs were to each other in the protein-protein interaction networks. We expect that the characterized markers will help understand and prevent breast cancer metastasis.
Key Words: XGBoost; breast cancer; feature importance; gene expression; machine learning; metastasis marker
MeSH Headings: Humans; Female; Breast Neoplasms/pathology; Transcriptome; Protein Interaction Maps; Machine Learning; Neoplasms, Second Primary; Membrane Proteins/genetics; RGS Proteins/genetics
5	AIVariant: a deep learning-based somatic variant detector for highly contaminated tumor samples. Exp Mol Med 2023;55:1734-1742. [PMID: 37524869 PMCID: PMC10474289 DOI: 10.1038/s12276-023-01049-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 04/10/2023] [Accepted: 04/24/2023] [Indexed: 08/02/2023] Open
Abstract: The detection of somatic DNA variants in tumor samples with low tumor purity or sequencing depth remains a daunting challenge despite numerous attempts to address this problem. In this study, we constructed a substantially extended set of actual positive variants originating from a wide range of tumor purities and sequencing depths, as well as actual negative variants derived from sequencer-specific sequencing errors. A deep learning model named AIVariant, trained on this extended dataset, outperforms previously reported methods when tested under various tumor purities and sequencing depths, especially low tumor purity and sequencing depth.
Key Words: machine learning; cancer; data processing
MeSH Headings: Humans; Gene Frequency; Computational Biology/methods; Deep Learning; Algorithms; Neoplasms/genetics; Neoplasms/diagnosis; Mutation
Grants: National Research Foundation of Korea (NRF); Korea Health Industry Development Institute (KHIDI)
6	Hyper-methylation of ABCG1 as an epigenetics biomarker in non-small cell lung cancer. Funct Integr Genomics 2023;23:256. [PMID: 37523012 DOI: 10.1007/s10142-023-01185-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/01/2023] [Revised: 07/22/2023] [Accepted: 07/24/2023] [Indexed: 08/01/2023]
Abstract: Non-small cell lung cancer (NSCLC) is the most prevalent histological type of lung cancer and the leading cause of death globally. Patients with NSCLC have a poor prognosis owing to various factors, and a late diagnosis is one of them. The DNA methylation of CpG island sequences found in the promoter regions of tumor suppressor genes has recently received attention as a potential biomarker of human cancer. In this study, we report DNA methylation changes of the adenosine triphosphate (ATP)-binding cassette transporter G1 (ABCG1), which belongs to the ATP cassette transporter family, in NSCLC patients. Our results demonstrate that ABCG1 is hyper-methylated in NSCLC samples, and these changes are negatively correlated with gene and protein expression. Furthermore, the expression of the ABCG1 gene is significantly associated with the survival time of lung adenocarcinoma (LUAD) patients; however, it did not show a correlation with overall survival (OS) of lung squamous cell carcinoma (LUSC) patients. Notably, we found that ABCG1 methylation status at locus cg20214535 is strongly associated with survival time, and we consistently observed hyper-methylation in LUAD samples. This novel finding suggests ABCG1 is a potential candidate for targeted therapy in lung cancer via this specific probe. In addition, we illustrate the protein-protein interaction (PPI) of ABCG1 with other proteins and the strong communication of ABCG1 with immune cells.
Key Words: ABCG1; Bioinformatics; DNA methylation; Epigenetics; Genomics analysis; Non-small cell lung cancer
MeSH Headings: Humans; Carcinoma, Non-Small-Cell Lung/genetics; Carcinoma, Non-Small-Cell Lung/pathology; Lung Neoplasms/pathology; Adenocarcinoma of Lung/genetics; Adenocarcinoma of Lung/pathology; DNA Methylation; Epigenesis, Genetic; Biomarkers, Tumor/genetics; Biomarkers, Tumor/metabolism; ATP Binding Cassette Transporter, Subfamily G, Member 1/genetics; ATP Binding Cassette Transporter, Subfamily G, Member 1/metabolism
7	Quantifying Intratumoral Heterogeneity and Immunoarchitecture Generated In-Silico by a Spatial Quantitative Systems Pharmacology Model. Cancers (Basel) 2023;15:2750. [PMID: 37345087 DOI: 10.3390/cancers15102750] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/30/2023] [Revised: 05/05/2023] [Accepted: 05/11/2023] [Indexed: 06/23/2023] Open
Abstract: Spatial heterogeneity is a hallmark of cancer. Tumor heterogeneity can vary with time and location. The tumor microenvironment (TME) encompasses various cell types and their interactions that impart response to therapies. Therefore, a quantitative evaluation of tumor heterogeneity is crucial for the development of effective treatments. Different approaches, such as multiregional sequencing, spatial transcriptomics, analysis of autopsy samples, and longitudinal analysis of biopsy samples, can be used to analyze the intratumoral heterogeneity (ITH) and temporal evolution and to reveal the mechanisms of therapeutic response. However, because of the limitations of these data and the uncertainty associated with the time points of sample collection, reaching a complete understanding of the role of intratumoral heterogeneity is challenging. Here, we used a hybrid model that integrates a whole-patient compartmental quantitative-systems-pharmacology (QSP) model with a spatial agent-based model (ABM) describing the TME; we applied four spatial metrics to quantify model-simulated intratumoral heterogeneity and classified the TME immunoarchitecture for representative cases of effective and ineffective anti-PD-1 therapy. The four metrics, adopted from computational digital pathology, included mixing score, average neighbor frequency, Shannon's entropy and area under the curve (AUC) of the G-cross function. A fifth non-spatial metric was used to supplement the analysis, which was the ratio of the number of cancer cells to immune cells. These metrics were utilized to classify the TME as "cold", "compartmentalized" and "mixed", which were related to treatment efficacy. The trends in these metrics for effective and ineffective treatments are in qualitative agreement with the clinical literature, indicating that compartmentalized immunoarchitecture is likely to result in more efficacious treatment outcomes.
Key Words: agent-based Model (ABM); computational digital pathology; immune checkpoint inhibitor; immunoarchitecture; intratumoral heterogeneity; quantitative systems pharmacology (QSP)
Grants: R01CA138264 (National Institute of Health); U01CA212007 (National Institute of Health); OAC1920103 (National Science Foundation)
8	CCDC69 is a prognostic marker of breast cancer and correlates with tumor immune cell infiltration. Front Surg 2022;9:879921. [PMID: 35910470 PMCID: PMC9334777 DOI: 10.3389/fsurg.2022.879921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Accepted: 06/28/2022] [Indexed: 12/24/2022] Open
Abstract:
Purpose: Breast cancer (BC) is the most common malignancy and the leading cause of cancer-related death among women worldwide. Early detection, treatment, and metastasis monitoring are very important for the prognosis of BC patients. Therefore, effective biomarkers need to be explored to help monitor the prognosis of BC patients and guide treatment decisions.
Methods: In this study, the relationship between CCDC69 expression levels and tumor clinical characteristics was analyzed using RNA-seq information in BC samples from the TCGA database. Kaplan-Meier survival analysis was performed to analyze the prognostic value of CCDC69 in BC patients. In addition, gene enrichment analysis in BC samples was used to confirm the main function of CCDC69 in BC. The correlation between the expression of CCDC69 and the number of tumor-infiltrating lymphocytes was confirmed by interaction analysis of TIMER and GEPIA.
Results: The results showed that CCDC69 expression was significantly lower in cancer samples than in normal tissues, and was significantly lower in highly invasive BC than in carcinoma in situ. Meanwhile, low levels of CCDC69 were associated with a poorer prognosis. CCDC69 expression was positively correlated with the amount of different tumor-infiltrating lymphocytes. Mechanistically, it could be presumed that the low expression of CCDC69 in BC might be caused by hypermethylation of the promoter region.
Conclusions: In summary, CCDC69 could be used as a potential biomarker to predict the prognosis of BC and the sensitivity to immunotherapy such as PD-1/PD-L1 checkpoint inhibitors.
Grants: Wuhan Science and Technology Bureau
9	Assessment of MicroRNAs Associated with Tumor Purity by Random Forest Regression. Biology 2022;11:787. [PMID: 35625515 PMCID: PMC9138977 DOI: 10.3390/biology11050787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 05/17/2022] [Accepted: 05/18/2022] [Indexed: 11/16/2022]
Simple Summary: Cancer is a disease with high mortality and recurrence rates. To understand cancer biology, it is important to accurately determine the proportion of tumor and non-tumor cells in tumor tissues. In this study, the proportion of tumor cells in tumor tissues was predicted using miRNA expression data that had not been sufficiently studied before. Using a random forest regression model, the tumor purity was predicted accurately, and subsequent investigations into the association between the informative microRNAs and tumor purity could be conducted.
Abstract: Tumor purity refers to the proportion of tumor cells in tumor tissue samples. This value plays an important role in understanding the mechanisms of the tumor microenvironment. Although various attempts have been made to predict tumor purity, attempts to predict tumor purity using miRNAs are still lacking. We predicted tumor purity using miRNA expression data for 16 TCGA tumor types using random forest regression. In addition, we identified miRNAs with high feature-importance scores and examined the extent of the change in predictive performance using informative miRNAs. The predictive performance obtained using only 10 miRNAs with high feature importance was close to the result obtained using all miRNAs. Furthermore, we also found genes targeted by miRNAs and confirmed that these genes were mainly related to immune and cancer pathways. Therefore, we found that the miRNA expression data could predict tumor purity well, and the results suggested the possibility that 10 miRNAs with high feature importance could be used as potential markers to predict tumor purity and to help improve our understanding of the tumor microenvironment.
10	Obtaining spatially resolved tumor purity maps using deep multiple instance learning in a pan-cancer study. Patterns (N Y) 2022;3:100399. [PMID: 35199060 PMCID: PMC8848022 DOI: 10.1016/j.patter.2021.100399] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2021] [Revised: 09/07/2021] [Accepted: 11/03/2021] [Indexed: 02/07/2023]
Abstract: Tumor purity is the percentage of cancer cells within a tissue section. Pathologists estimate tumor purity to select samples for genomic analysis by manually reading hematoxylin-eosin (H&E)-stained slides, which is tedious, time consuming, and prone to inter-observer variability. Besides, pathologists' estimates do not correlate well with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. We developed a deep multiple instance learning model predicting tumor purity from H&E-stained digital histopathology slides. Our model successfully predicted tumor purity in eight The Cancer Genome Atlas (TCGA) cohorts and a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values. Thus, our model can be utilized to select samples for genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability. Furthermore, our model provided tumor purity maps showing the spatial variation within sections. They can help better understand the tumor microenvironment.
• MIL model successfully predicts a sample's tumor purity from histopathology slides
• MIL model learns to spatially resolve tumor purity from sample-level labels
• Tumor purity varies spatially within a sample
• Pathologists’ region selection is vital for correct percentage tumor nuclei estimation
Given some big data and coarse-level labels, extracting fine-level information is a demanding yet rewarding challenge in data science. This study develops a machine learning model utilizing big data and exploiting coarse-level labels to reveal fine-level details within the data. Although it can be applied to different data science tasks with enormous data and coarse labels, we applied it to a computational histopathology task with gigapixel histopathology slides and sample-level labels. Specifically, the model revealed spatial resolution of tumor purity within histopathology slides using only sample-level genomic tumor purity values during training. This can also be extended to other omics features, providing precious information about cancer biology and promising personalized, precision medicine. Such studies are of great clinical importance in discovering imaging biomarkers and better understanding the tumor microenvironment.
Key Words: computational pathology; deep learning; digital histopathology; digital pathology; genomic sequencing; multiple instance learning; spatial omics; tumor microenvironment; tumor purity; whole-slide images
11	The Analysis of Gene Expression Data Incorporating Tumor Purity Information. Front Genet 2021;12:642759. [PMID: 34497631 PMCID: PMC8419469 DOI: 10.3389/fgene.2021.642759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2020] [Accepted: 07/30/2021] [Indexed: 12/03/2022] Open
Abstract: The tumor microenvironment is composed of tumor cells, stroma cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are an average over cells in the microenvironment. However, research questions often seek answers about tumor cells rather than the surrounding non-tumor tissue. Previous studies have suggested that the tumor purity (TP), the proportion of tumor cells in a solid tumor sample, has a confounding effect on differential expression (DE) analysis of high vs. low survival groups. We investigate three ways of incorporating the TP information in the two statistical methods used for analyzing gene expression data, namely, differential network (DN) analysis and DE analysis. Analysis 1 ignores the TP information completely, Analysis 2 uses a truncated sample by removing the low TP samples, and Analysis 3 uses TP as a covariate in the underlying statistical models. We use three gene expression data sets related to three different cancers from the Cancer Genome Atlas (TCGA) for our investigation. The networks from Analysis 2 have a greater amount of differential connectivity in the two networks than those from Analysis 1 in all three cancer datasets. Similarly, Analysis 1 identified more differentially expressed genes than Analysis 2. Results of DN and DE analyses using Analysis 3 were mostly consistent with those of Analysis 1 across three cancers. However, Analysis 3 identified additional cancer-related genes in both DN and DE analyses. Our findings suggest that using TP as a covariate in a linear model is appropriate for DE analysis, but a more robust model is needed for DN analysis. However, because true DN or DE patterns are not known for the empirical datasets, simulated datasets can be used to study the statistical properties of these methods in future studies.
Key Words: RNA-seq data; confounding effects; differential gene expression analysis; differential network analysis; gene expression data; tumor purity
Grants: T32 AA025877 (NIAAA NIH HHS)
12	Knockoff boosted tree for model-free variable selection. Bioinformatics 2021;37:976-983. [PMID: 32966559 DOI: 10.1093/bioinformatics/btaa770] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/04/2020] [Revised: 08/17/2020] [Accepted: 09/09/2020] [Indexed: 11/13/2022] Open
Abstract:
Motivation: The recently proposed knockoff filter is a general framework for controlling the false discovery rate (FDR) when performing variable selection. This powerful new approach generates a 'knockoff' of each variable tested for exact FDR control. Imitation variables that mimic the correlation structure found within the original variables serve as negative controls for statistical inference. Current applications of knockoff methods use linear regression models and conduct variable selection only for variables existing in model functions. Here, we extend the use of knockoffs for machine learning with boosted trees, which are successful and widely used in problems where no prior knowledge of model function is required. However, currently available importance scores in tree models are insufficient for variable selection with FDR control.
Results: We propose a novel strategy for conducting variable selection without prior model topology knowledge using the knockoff method with boosted tree models. We extend the current knockoff method to model-free variable selection through the use of tree-based models. Additionally, we propose and evaluate two new sampling methods for generating knockoffs, namely the sparse covariance and principal component knockoff methods. We test and compare these methods with the original knockoff method regarding their ability to control type I errors and power. In simulation tests, we compare the properties and performance of importance test statistics of tree models. The results include different combinations of knockoffs and importance test statistics. We consider scenarios that include main-effect, interaction, exponential and second-order models while assuming the true model structures are unknown. We apply our algorithm for tumor purity estimation and tumor classification using Cancer Genome Atlas (TCGA) gene expression data. Our results show improved discrimination between difficult-to-discriminate cancer types.
Availability and Implementation: The proposed algorithm is included in the KOBT package, which is available at https://cran.r-project.org/web/packages/KOBT/index.html.
Supplementary Information: Supplementary data are available at Bioinformatics online.
13	Prediction of tumor purity from gene expression data using machine learning. Brief Bioinform 2021;22:6265216. [PMID: 33954576 DOI: 10.1093/bib/bbab163] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/29/2020] [Revised: 04/06/2021] [Accepted: 04/07/2021] [Indexed: 01/11/2023] Open
Abstract:
Motivation: Bulk tumor samples used for high-throughput molecular profiling are often an admixture of cancer cells and non-cancerous cells, which include immune and stromal cells. The mixed composition can confound the analysis and affect the biological interpretation of the results, and thus, accurate prediction of tumor purity is critical. Although several methods have been proposed to predict tumor purity using high-throughput molecular data, there has been no comprehensive study on machine learning-based methods for the estimation of tumor purity.
Results: We applied various machine learning models to estimate tumor purity. Overall, the models predicted the tumor purity accurately and showed a high correlation with well-established gold standard methods. In addition, we identified a small group of genes and demonstrated that they could predict tumor purity well. Finally, we confirmed that these genes were mainly involved in the immune system.
Availability: The machine learning models constructed for this study are available at https://github.com/BonilKoo/ML_purity.
Key Words: cancer genomics; machine learning; regression; tumor purity
14	Study on CCDC69 interfering with the prognosis of patients with breast cancer through PPAR signal pathway. Eur J Histochem 2021;65. [PMID: 33634680 PMCID: PMC7922363 DOI: 10.4081/ejh.2021.3207] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/10/2020] [Accepted: 01/27/2021] [Indexed: 12/31/2022] Open
Abstract: Coiled-coil domain-containing protein 69 (CCDC69) is a novel gene, and little is known about it in breast cancer. In the present study, we aimed to explore the relationship between CCDC69 and breast cancer, demonstrate the clinicopathological significance and prognostic role of CCDC69 in breast cancer, and analyze the possible mechanism by which CCDC69 affects the prognosis of breast cancer. First, from the GEO database, TIMER, GEPIA, and OncoLnc, we selected CCDC69 as a potential gene closely involved in breast cancer progression. Next, by real-time PCR detection, the expression of CCDC69 in breast cancer tissue was notably lower than that in normal breast tissues (p=0.0002). In addition, our immunohistochemistry indicated that the positive expression rate of CCDC69 in triple-negative breast cancer (TNBC) was lower than that in non-TNBC (p=0.0362), and it was negatively correlated with the expression of Ki67 (p=0.001). Further enrichment analysis of CCDC69 and similar genes performed on FunRich3.1.3 revealed that these genes were significantly associated with fat differentiation, and most of them were related to the peroxisome proliferator-activated receptor (PPAR) signal pathway. Collectively, our findings suggest that CCDC69 is downregulated in breast cancer tissue, especially in TNBC, which has a higher malignant grade and poorer clinical prognosis.