1
Hwangbo S, Lee S, Hosain MM, Goo T, Lee S, Kim I, Park T. Kernel-based hierarchical structural component models for pathway analysis on survival phenotype. Genes Genomics 2024; 46:1415-1421. [PMID: 39327384 DOI: 10.1007/s13258-024-01569-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Accepted: 09/07/2024] [Indexed: 09/28/2024]
Abstract
BACKGROUND: High-throughput sequencing, particularly RNA-sequencing (RNA-seq), has advanced differential gene expression analysis, revealing pathways involved in various biological conditions. Traditional pathway-based methods generally consider pathways independently, overlooking the correlations among them and ignoring the many biomarkers that overlap between pathways. In addition, most pathway-based approaches assume that biomarkers have linear effects on the phenotype of interest.
OBJECTIVE: This study aims to develop the HisCoM-KernelS model to identify survival phenotype-related pathways by accommodating complex, nonlinear relationships between genes and survival outcomes, while accounting for inter-pathway correlations.
METHODS: We applied the HisCoM-KernelS model to the TCGA pancreatic ductal adenocarcinoma (PDAC) RNA-seq dataset, comprising 4,498 protein-coding genes mapped to 186 KEGG pathways from 148 PDAC samples. Kernel machine regression was used to model pathway effects on survival outcomes, incorporating hierarchical gene-pathway structures. Model parameters were estimated using the alternating least squares algorithm, and the significance of pathways was assessed through a permutation test.
RESULTS: HisCoM-KernelS identified several pathways significantly associated with pancreatic cancer survival, including those corroborated by previous studies. HisCoM-KernelS, especially with the Gaussian kernel, showed a better balance of detection rate and number of significant pathways compared to four other existing pathway-based methods: HisCoM-PAGE, Global Test, GSEA, and CoxKM.
CONCLUSION: HisCoM-KernelS successfully extends pathway-based analysis to survival outcomes, capturing complex nonlinear gene effects and inter-pathway correlations. Its application to the TCGA PDAC dataset emphasizes its utility in identifying biologically relevant pathways, offering a robust tool for survival phenotype research in high-throughput sequencing data.
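The permutation-based significance assessment mentioned in this abstract can be illustrated with a minimal sketch. It is not the HisCoM-KernelS implementation: the test statistic (absolute Pearson correlation between a hypothetical per-sample pathway score and an outcome), the function names, and the toy data are all assumptions made for illustration, and censoring of survival times is ignored.

```python
import math
import random


def abs_pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant input
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_pvalue(score, outcome, n_perm=999, seed=0):
    """Permutation p-value: shuffle the outcome, recompute the statistic,
    and count how often the permuted value reaches the observed one."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs_pearson(score, outcome)
    shuffled = list(outcome)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_pearson(score, shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical toy data: an outcome that depends linearly on the pathway score.
pathway_score = [float(i) for i in range(20)]
outcome = [2.0 * s + 1.0 for s in pathway_score]
p_value = permutation_pvalue(pathway_score, outcome)
print(p_value)
```

With `n_perm` permutations the smallest attainable p-value is `1 / (n_perm + 1)`, so the number of permutations bounds the resolution of the test.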
Affiliation(s)
- Suhyun Hwangbo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Sungyoung Lee
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Md Mozaffar Hosain
  - Department of Statistics, Seoul National University, Seoul, 151-747, Korea
- Taewan Goo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Sejong, 05006, Korea
- Inyoung Kim
  - Department of Statistics, Virginia Tech, Blacksburg, VA, 24060, USA
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
  - Department of Statistics, Seoul National University, Seoul, 151-747, Korea
2
Wang L, Shen J, Wang Y, Bi J. Identification of fatty acid metabolism-based molecular subtypes and prognostic signature to predict immune landscape and guide clinical drug treatment in renal clear cell carcinoma. Int Immunopharmacol 2023; 116:109735. [PMID: 36716517 DOI: 10.1016/j.intimp.2023.109735] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/18/2022] [Revised: 01/04/2023] [Accepted: 01/11/2023] [Indexed: 01/29/2023]
Abstract
Three subtypes of samples were generated based on genes involved in fatty acid metabolism in The Cancer Genome Atlas (TCGA)-RCC patients using a non-negative matrix factorization (NMF) algorithm. Thirty-two co-expressed modules were identified using WGCNA. We constructed a four-gene signature in our training set using least absolute shrinkage and selection operator (LASSO) regression analysis and verified it in our testing and overall sets. A relevant study analysis in clinical trials was conducted, which showed the model had good stability and potential application value for predicting outcomes. We analyzed the immune microenvironment using the MCPcounter, CIBERSORT, quanTIseq, TIMER and ESTIMATE algorithms. The results indicated that risk was positively related to T cells, B lineage, and fibroblasts and negatively correlated with monocytic lineage, myeloid dendritic cells, neutrophils, and endothelial cells, and that CPT1B was positively related to T cells, CD8+ T cells, cytotoxic lymphocytes and NK cells, and negatively correlated with myeloid dendritic cells, fibroblasts, and endothelial cells. Tumor mutation burden was positively related to risk score and the expression of CPT1B, as analyzed with the R packages corrplot and circlize. Through the R package pRRophetic, drug sensitivity tests showed that the low-risk score group would benefit more from sunitinib and less from pazopanib, sorafenib, temsirolimus, gemcitabine and doxorubicin than the high-risk score group. We performed the relevant basic assay validation for CPT1B, and the proliferation ability of RCC cells was inhibited after the knockdown of protein expression of CPT1B. In conclusion, we established a four-gene model that can predict outcomes of RCC with potential applications in diagnosis and treatment.
Affiliation(s)
- Linhui Wang
  - Department of Urology, The First Hospital of China Medical University, Shenyang, Liaoning, China
- Junlin Shen
  - Department of Urology, The First Hospital of China Medical University, Shenyang, Liaoning, China
- Yutao Wang
  - Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jianbin Bi
  - Department of Urology, The First Hospital of China Medical University, Shenyang, Liaoning, China
3
Hwangbo S, Lee S, Lee S, Hwang H, Kim I, Park T. Kernel-based hierarchical structural component models for pathway analysis. Bioinformatics 2022; 38:3078-3086. [PMID: 35460238 DOI: 10.1093/bioinformatics/btac276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 04/08/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Pathway analyses have led to more insight into the underlying biological functions related to the phenotype of interest in various types of omics data. Pathway-based statistical approaches have been actively developed, but most of them do not consider correlations among pathways. Because it is well known that there are quite a few biomarkers that overlap between pathways, these approaches may provide misleading results. In addition, most pathway-based approaches tend to assume that biomarkers within a pathway have linear associations with the phenotype of interest, even though the relationships are more complex.
RESULTS: To model complex effects including nonlinear effects, we propose a new approach, Hierarchical structural CoMponent analysis using Kernel (HisCoM-Kernel). The proposed method models nonlinear associations between biomarkers and phenotype by extending the kernel machine regression and analyzes entire pathways simultaneously by using the biomarker-pathway hierarchical structure. HisCoM-Kernel is a flexible model that can be applied to various omics data. It was successfully applied to three omics datasets generated by different technologies. Our simulation studies showed that HisCoM-Kernel provided higher statistical power than other existing pathway-based methods in all datasets. The application of HisCoM-Kernel to three types of omics dataset showed its superior performance compared to existing methods in identifying more biologically meaningful pathways, including those reported in previous studies.
AVAILABILITY AND IMPLEMENTATION: Freely available at http://statgen.snu.ac.kr/software/HisCom-Kernel/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
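Kernel machine regression, which this abstract extends, describes similarity between samples through a kernel matrix. As a minimal sketch (not the HisCoM-Kernel code), the following computes a Gaussian kernel in the commonly used form K(x, z) = exp(-||x - z||^2 / rho). The function name, the scale parameter `rho`, and the toy expression values are assumptions made for illustration.

```python
import math


def gaussian_kernel_matrix(samples, rho):
    """Return the n x n Gaussian kernel matrix for a list of feature vectors.

    samples: list of equal-length lists (one vector of biomarker values per sample)
    rho: positive scale parameter (an assumed name; larger values give a smoother kernel)
    """
    if rho <= 0:
        raise ValueError("rho must be positive")
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Squared Euclidean distance between samples i and j.
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            value = math.exp(-sq_dist / rho)
            kernel[i][j] = value
            kernel[j][i] = value  # the kernel matrix is symmetric
    return kernel


# Hypothetical toy data: three samples measured on two biomarkers of one pathway.
expression = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]
K = gaussian_kernel_matrix(expression, rho=25.0)
for row in K:
    print([round(v, 4) for v in row])
```

Each diagonal entry equals 1 because every sample is at distance zero from itself. Off-diagonal entries shrink toward 0 as samples move apart, which is how such a kernel can capture nonlinear similarity in biomarker profiles.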
Affiliation(s)
- Suhyun Hwangbo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Sungyoung Lee
  - Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Sejong, 05006, Korea
- Heungsun Hwang
  - Department of Psychology, McGill University, Montreal, QC, H3A 1B1, Canada
- Inyoung Kim
  - Department of Statistics, Virginia Tech, Blacksburg, Virginia, 24060, U.S.A.
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
  - Department of Statistics, Seoul National University, Seoul, 151-747, Korea
4
Warren JM, Salinas-Giegé T, Hummel G, Coots NL, Svendsen JM, Brown KC, Drouard L, Sloan DB. Combining tRNA sequencing methods to characterize plant tRNA expression and post-transcriptional modification. RNA Biol 2021; 18:64-78. [PMID: 32715941 PMCID: PMC7834048 DOI: 10.1080/15476286.2020.1792089] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/28/2020] [Revised: 06/18/2020] [Accepted: 06/30/2020] [Indexed: 12/27/2022] Open
Abstract
Differences in tRNA expression have been implicated in a remarkable number of biological processes. There is growing evidence that tRNA genes can play dramatically different roles depending on both expression and post-transcriptional modification, yet sequencing tRNAs to measure abundance and detect modifications remains challenging. Their secondary structure and extensive post-transcriptional modifications interfere with RNA-seq library preparation methods and have limited the utility of high-throughput sequencing technologies. Here, we combine two modifications to standard RNA-seq methods by treating with the demethylating enzyme AlkB and ligating with tRNA-specific adapters in order to sequence tRNAs from four species of flowering plants, a group that has been shown to have some of the most extensive rates of post-transcriptional tRNA modifications. This protocol has the advantage of detecting full-length tRNAs and sequence variants that can be used to infer many post-transcriptional modifications. We used the resulting data to produce a modification index of almost all unique reference tRNAs in Arabidopsis thaliana, which exhibited many anciently conserved similarities with humans but also positions that appear to be 'hot spots' for modifications in angiosperm tRNAs. We also found evidence based on northern blot analysis and droplet digital PCR that, even after demethylation treatment, tRNA-seq can produce highly biased estimates of absolute expression levels most likely due to biased reverse transcription. Nevertheless, the generation of full-length tRNA sequences with modification data is still promising for assessing differences in relative tRNA expression across treatments, tissues or subcellular fractions and help elucidate the functional roles of tRNA modifications.
Affiliation(s)
- Jessica M. Warren
  - Department of Biology, Colorado State University, Fort Collins, CO, USA
- Thalia Salinas-Giegé
  - Institut De Biologie Moléculaire Des plantes-CNRS, Université De Strasbourg, Strasbourg, France
- Guillaume Hummel
  - Institut De Biologie Moléculaire Des plantes-CNRS, Université De Strasbourg, Strasbourg, France
- Nicole L. Coots
  - Department of Biology, Colorado State University, Fort Collins, CO, USA
- Kristen C. Brown
  - Department of Biology, Colorado State University, Fort Collins, CO, USA
- Laurence Drouard
  - Department of Biology, Colorado State University, Fort Collins, CO, USA
  - Institut De Biologie Moléculaire Des plantes-CNRS, Université De Strasbourg, Strasbourg, France
- Daniel B. Sloan
  - Department of Biology, Colorado State University, Fort Collins, CO, USA
5
Kim B, Cho EJ, Yoon JH, Kim SS, Cheong JY, Cho SW, Park T. Pathway-Based Integrative Analysis of Metabolome and Microbiome Data from Hepatocellular Carcinoma and Liver Cirrhosis Patients. Cancers (Basel) 2020; 12:E2705. [PMID: 32967314 PMCID: PMC7563418 DOI: 10.3390/cancers12092705] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/29/2020] [Revised: 09/14/2020] [Accepted: 09/16/2020] [Indexed: 12/12/2022] Open
Abstract
Aberrations of the human microbiome are associated with diverse liver diseases, including hepatocellular carcinoma (HCC). Even if we can associate specific microbes with particular diseases, it is difficult to know mechanistically how the microbe contributes to the pathophysiology. Here, we sought to reveal the functional potential of the HCC-associated microbiome with the human metabolome which is known to play a role in connecting host phenotype to microbiome function. To utilize both microbiome and metabolomic data sets, we propose an innovative, pathway-based analysis, Hierarchical structural Component Model for pathway analysis of Microbiome and Metabolome (HisCoM-MnM), for integrating microbiome and metabolomic data. In particular, we used pathway information to integrate these two omics data sets, thus providing insight into biological interactions between different biological layers, with regard to the host's phenotype. The application of HisCoM-MnM to data sets from 103 and 97 patients with HCC and liver cirrhosis (LC), respectively, showed that this approach could identify HCC-related pathways related to cancer metabolic reprogramming, in addition to the significant metabolome and metagenome that make up those pathways.
Affiliation(s)
- Boram Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Eun Ju Cho
  - Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul 03080, Korea
- Jung-Hwan Yoon
  - Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul 03080, Korea
- Soon Sun Kim
  - Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Jae Youn Cheong
  - Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Sung Won Cho
  - Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
6
Du Y, Liu L, He Y, Dou T, Jia J, Ge C. Endocrine and genetic factors affecting egg laying performance in chickens: a review. Br Poult Sci 2020; 61:538-549. [PMID: 32306752 DOI: 10.1080/00071668.2020.1758299] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Indexed: 01/13/2023]
Abstract
1. Egg-laying performance reflects the overall reproductive performance of breeding hens. The genetic traits for egg-laying performance have low or medium heritability, which, depending on the period involved, usually ranges from 0.16 to 0.64. Egg-laying in chickens is regulated by a combination of environmental, endocrine and genetic factors.
2. The main endocrine factors that regulate egg-laying are gonadotropin-releasing hormone (GnRH), prolactin (PRL), follicle-stimulating hormone (FSH) and luteinising hormone (LH).
3. In the last three decades, many studies have explored this aspect at a molecular genetic level. Recent studies identified 31 reproductive hormone-based candidate genes that were significantly associated with egg-laying performance. With the development of genome-sequencing technology, 64 new candidate genes and 108 single nucleotide polymorphisms (SNPs) related to egg-laying performance have been found using genome-wide association studies (GWAS), providing novel insights into the molecular genetic mechanisms governing egg production. At the same time, microRNAs that regulate genes responsible for egg-laying in chickens were reviewed.
4. Research on endocrinological and genetic factors affecting egg-laying performance will greatly improve the reproductive performance of chickens and promote the protection, development, and utilisation of poultry. This review summarises studies on the endocrine and genetic factors of egg-laying performance in chickens from 1972 to 2019.
Affiliation(s)
- Y Du
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, The People's Republic of China
- L Liu
  - School of Forensic Medicine, Kunming Medical University, Kunming, Yunnan, The People's Republic of China
- Y He
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, The People's Republic of China
- T Dou
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, The People's Republic of China
- J Jia
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, The People's Republic of China
- C Ge
  - College of Animal Science and Technology, Yunnan Agricultural University, Kunming, Yunnan, The People's Republic of China
7
Mok L, Park T. HisCoM-PAGE: software for hierarchical structural component models for pathway analysis of gene expression data. Genomics Inform 2019; 17:e45. [PMID: 31896245 PMCID: PMC6944051 DOI: 10.5808/gi.2019.17.4.e45] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2019] [Accepted: 11/22/2019] [Indexed: 12/04/2022] Open
Abstract
To identify pathways associated with survival phenotypes using gene expression data, we recently proposed the hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE) method. The HisCoM-PAGE software can consider hierarchical structural relationships between genes and pathways and analyze multiple pathways simultaneously. It can be applied to various types of gene expression data, such as microarray data or RNA sequencing data. We expect that the HisCoM-PAGE software will make our method more easily accessible to researchers who want to perform pathway analysis for survival times.
Affiliation(s)
- Lydia Mok
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
  - Corresponding author