1
|
Nazeen S, Wang X, Morrow A, Strom R, Ethier E, Ritter D, Henderson A, Afroz J, Stitziel NO, Gupta RM, Luk K, Studer L, Khurana V, Sunyaev SR. NERINE reveals rare variant associations in gene networks across multiple phenotypes and implicates an SNCA-PRL-LRRK2 subnetwork in Parkinson's disease. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.01.07.631688. [PMID: 39829934 PMCID: PMC11741352 DOI: 10.1101/2025.01.07.631688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2025]
Abstract
Gene networks encapsulate biological knowledge, often linked to polygenic diseases. While model system experiments generate many plausible gene networks, validating their role in human phenotypes requires evidence from human genetics. Rare variants provide the most straightforward path for such validation. While single-gene analyses often lack power due to rare variant sparsity, expanding the unit of association to networks offers a powerful alternative, provided it integrates network connections. Here, we introduce NERINE, a hierarchical model-based association test that integrates gene interactions while remaining robust to network inaccuracies. Applied to biobanks, NERINE uncovers compelling network associations for breast cancer, cardiovascular diseases, and type II diabetes, undetected by single-gene tests. For Parkinson's disease (PD), NERINE newly substantiates several GWAS candidate loci with rare variant signal and synergizes human genetics with experimental screens targeting cardinal PD pathologies: dopaminergic neuron survival and alpha-synuclein pathobiology. CRISPRi-screening in human neurons and NERINE converge on PRL, revealing an intraneuronal α-synuclein/prolactin stress response that may impact resilience to PD pathologies.
Affiliation(s)
- Sumaiya Nazeen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Xinyuan Wang
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Autumn Morrow
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Ronya Strom
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Elizabeth Ethier
- Division of Movement Disorders, Department of Neurology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Dylan Ritter
- The Center for Stem Cell Biology, Sloan-Kettering Institute for Cancer Research, New York, NY, USA
| | | | - Jalwa Afroz
- The Center for Stem Cell Biology, Sloan-Kettering Institute for Cancer Research, New York, NY, USA
| | - Nathan O Stitziel
- Cardiovascular Division, John T. Milliken Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Rajat M Gupta
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Division of Cardiovascular Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Kelvin Luk
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, PA, USA
| | - Lorenz Studer
- The Center for Stem Cell Biology, Sloan-Kettering Institute for Cancer Research, New York, NY, USA
| | - Vikram Khurana
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Stem Cell Institute, Cambridge, MA, USA
| | - Shamil R Sunyaev
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| |
|
2
|
Huh I, Park T. Enhanced adaptive permutation test with negative binomial distribution in genome-wide omics datasets. Genes Genomics 2025; 47:59-70. [PMID: 39503929 DOI: 10.1007/s13258-024-01584-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 10/10/2024] [Indexed: 01/16/2025]
Abstract
BACKGROUND The permutation test has been widely used to provide the p-values of statistical tests when the standard test statistics do not follow parametric null distributions. However, the permutation test may require huge numbers of iterations, especially when the detection of very small p-values is required for multiple testing adjustments in the analysis of datasets with a large number of features. OBJECTIVE To overcome this computational burden, we suggest a novel enhanced adaptive permutation test that estimates p-values using the negative binomial (NB) distribution. By the method, the number of permutations are differently determined for individual features according to their potential significance. METHODS In detail, the permutation procedure stops, when test statistics from the permuted dataset exceed the observed statistics from the original dataset by a predefined number of times. We showed that this procedure reduced the number of permutations especially when there were many insignificant features. For significant features, we enhanced the reduction with Stouffer's method after splitting datasets. RESULTS From the simulation study, we found that the enhanced adaptive permutation test dramatically reduced the number of permutations while keeping the precision of the permutation p-value within a small range, when compared to the ordinary permutation test. In real data analysis, we applied the enhanced adaptive permutation test to a genome-wide single nucleotide polymorphism (SNP) dataset of 327,872 features. CONCLUSION We found the analysis with the enhanced adaptive permutation took a feasible time for genome-wide omics datasets, and successfully identified features of highly significant p-values with reasonable confidence intervals.
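The stopping rule described in this abstract can be illustrated with a minimal sketch. It assumes a generic sequential permutation scheme: permute until the permuted statistic has reached the observed one `r` times, then estimate the p-value as `r / n`, where `n` is the number of permutations used. The function names, the `r` and `max_perms` parameters, and the add-one fallback at the cap are illustrative assumptions; the authors' exact estimator, confidence intervals, and Stouffer-based enhancement are not reproduced here.

```python
import random


def adaptive_permutation_pvalue(x, labels, stat, r=10, max_perms=10_000, seed=0):
    """Permute labels until `r` permuted statistics reach the observed one.

    Returns (p_value_estimate, permutations_used). Hypothetical helper,
    not the authors' implementation.
    """
    rng = random.Random(seed)
    observed = stat(x, labels)
    permuted = list(labels)
    exceedances = 0
    for n in range(1, max_perms + 1):
        # Reshuffling an already shuffled list is still a uniform permutation.
        rng.shuffle(permuted)
        if stat(x, permuted) >= observed:
            exceedances += 1
            if exceedances == r:
                # Early stop: the number of permutations needed to see r
                # exceedances is negative-binomially distributed, and r / n
                # estimates the permutation p-value.
                return r / n, n
    # Cap reached without r exceedances: conventional add-one estimate
    # (an assumption of this sketch).
    return (exceedances + 1) / (max_perms + 1), max_perms


def abs_mean_difference(x, labels):
    """Absolute difference in group means for binary labels (0/1)."""
    g1 = [v for v, lab in zip(x, labels) if lab == 1]
    g0 = [v for v, lab in zip(x, labels) if lab == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))


labels = [0] * 10 + [1] * 10
flat = [0, 1] * 10                                # no group difference at all
shifted = list(range(10)) + list(range(100, 110))  # very large group difference

p_flat, n_flat = adaptive_permutation_pvalue(
    flat, labels, abs_mean_difference, r=5, max_perms=2000)
p_shift, n_shift = adaptive_permutation_pvalue(
    shifted, labels, abs_mean_difference, r=5, max_perms=2000)
print(p_flat, n_flat)
print(p_shift, n_shift)
```

In this toy setup the uninformative feature stops after only `r` permutations, while the strongly associated feature runs to the cap, which mirrors the abstract's point that the savings come mainly from insignificant features.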
Affiliation(s)
- Iksoo Huh
- College of Nursing and Research Institute of Nursing Science, Seoul National University, Seoul, 03080, Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul, 08826, Korea.
| |
|
3
|
Hwangbo S, Lee S, Hosain MM, Goo T, Lee S, Kim I, Park T. Kernel-based hierarchical structural component models for pathway analysis on survival phenotype. Genes Genomics 2024; 46:1415-1421. [PMID: 39327384 DOI: 10.1007/s13258-024-01569-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Accepted: 09/07/2024] [Indexed: 09/28/2024]
Abstract
BACKGROUND High-throughput sequencing, particularly RNA-sequencing (RNA-seq), has advanced differential gene expression analysis, revealing pathways involved in various biological conditions. Traditional pathway-based methods generally consider pathways independently, overlooking the correlations among them and ignoring quite a few overlapping biomarkers between pathways. In addition, most pathway-based approaches assume that biomarkers have linear effects on the phenotype of interest. OBJECTIVE This study aims to develop the HisCoM-KernelS model to identify survival phenotype-related pathways by accommodating complex, nonlinear relationships between genes and survival outcomes, while accounting for inter-pathway correlations. METHODS We applied HisCoM-KernelS model to the TCGA pancreatic ductal adenocarcinoma (PDAC) RNA-seq dataset, comprising 4,498 protein-coding genes mapped to 186 KEGG pathways from 148 PDAC samples. Kernel machine regression was used to model pathway effects on survival outcomes, incorporating hierarchical gene-pathway structures. Model parameters were estimated using the alternating least squares algorithm, and the significance of pathways was assessed through a permutation test. RESULTS HisCoM-KernelS identified several pathways significantly associated with pancreatic cancer survival, including those corroborated by previous studies. HisCoM-KernelS, especially with the Gaussian kernel, showed a better balance of detection rate and number of significant pathways compared to four other existing pathway-based methods: HisCoM-PAGE, Global Test, GSEA, and CoxKM. CONCLUSION HisCoM-KernelS successfully extends pathway-based analysis to survival outcomes, capturing complex nonlinear gene effects and inter-pathway correlations. Its application to the TCGA PDAC dataset emphasizes its utility in identifying biologically relevant pathways, offering a robust tool for survival phenotype research in high-throughput sequencing data.
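For readers unfamiliar with the Gaussian kernel mentioned above, the following is a minimal sketch of how a Gaussian kernel matrix over samples is typically built before being passed to a kernel machine regression. The parameterization `exp(-||x_i - x_j||^2 / rho)`, the function name, and the toy expression values are assumptions for illustration; the abstract does not state how HisCoM-KernelS defines or tunes its kernel.

```python
import math


def gaussian_kernel_matrix(samples, rho):
    """Return K with K[i][j] = exp(-||x_i - x_j||^2 / rho).

    `samples` is a list of equal-length numeric vectors (for example, the
    expression values of one pathway's genes per sample); `rho` is a
    positive bandwidth. Both names are hypothetical.
    """
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            # The kernel is symmetric, so fill both triangles at once.
            K[i][j] = K[j][i] = math.exp(-sq_dist / rho)
    return K


# Three toy samples measured on two genes.
K = gaussian_kernel_matrix([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]], rho=1.0)
for row in K:
    print(row)
```

Samples with similar profiles get kernel values near 1 and distant samples get values near 0, which is what lets a kernel-based model capture nonlinear gene effects without specifying their functional form.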
Affiliation(s)
- Suhyun Hwangbo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
| | - Sungyoung Lee
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
| | - Md Mozaffar Hosain
- Department of Statistics, Seoul National University, Seoul, 151-747, Korea
| | - Taewan Goo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea
| | - Seungyeoun Lee
- Department of Mathematics and Statistics, Sejong University, Sejong, 05006, Korea
| | - Inyoung Kim
- Department of Statistics, Virginia Tech, Blacksburg, VA, 24060, USA
| | - Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea.
- Department of Statistics, Seoul National University, Seoul, 151-747, Korea.
| |
|
4
|
Roy A, Sharma S, Paul I, Ray S. Molecular hybridization assisted multi-technique approach for designing USP21 inhibitors to halt catalytic triad-mediated nucleophilic attack and suppress pancreatic ductal adenocarcinoma progression: A molecular dynamics study. Comput Biol Med 2024; 182:109096. [PMID: 39270458 DOI: 10.1016/j.compbiomed.2024.109096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Revised: 07/20/2024] [Accepted: 08/30/2024] [Indexed: 09/15/2024]
Abstract
AIMS Pancreatic cancer, the 12th-most common cancer, globally, is highly challenging to treat due to its complex epigenetic, metabolic, and genomic characteristics. In pancreatic ductal adenocarcinoma, USP21 acts as an oncogene by stabilizing the long isoform of Transcription Factor 7, thereby activating the Wnt signaling pathway. This study aims to inhibit activation of this pathway through computer-aided drug discovery. Accordingly, four libraries of compounds were designed to target the USP21's catalytic domain (Cys221, His518, Asp534), responsible for its deubiquitinating activity. MAIN METHODS Utilizing an array of computer-aided drug design methodologies, such as molecular docking, virtual screening, principal component analysis, molecular dynamics simulation, and dynamic cross-correlation matrix, the structural and functional characteristics of the USP21-inhibitor complex were examined. Following the evaluation of the binding affinities, 20 potential ligands were selected, and the best ligand was subjected to additional molecular dynamics simulation study. KEY FINDINGS The results indicated that the ligand-bound USP21 exhibited reduced structural fluctuations compared to the unbound form, as evident from RMSD, RMSF, Rg, and SASA graphs. ADMET analysis of the top ligand showed promising pharmacokinetic and pharmacodynamic profiles, good bioavailability, and low toxicity. The stable conformations of the proposed drug when bound to their target cavities indicate a robust binding affinity of -9.3 kcal/mol. The drug exhibits an elevated pKi value of 6.82, a noteworthy pIC50 value of 5.972, and a pKd value of 6.023 proving its high affinity and inhibitory potential towards the target. SIGNIFICANCE In-vitro testing of the top compound (MOLHYB-0436) could lead to its use as a potential treatment for pancreatic cancer.
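The logarithmic affinity values reported above can be put on a concentration scale with a short sketch. It assumes the usual definition pX = -log10(X in mol/L) and, for the docking score, the standard relation ΔG = RT ln K at an assumed temperature of 298.15 K. The function names are hypothetical, and the abstract does not state how the authors derived pKi, pIC50, or pKd.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def p_to_molar(p_value):
    """Convert a pKi / pIC50 / pKd value to a concentration in mol/L."""
    return 10.0 ** (-p_value)


def delta_g_to_pk(delta_g_kcal, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to -log10(K).

    Uses delta_G = R * T * ln(K); the temperature is an assumption.
    """
    k = math.exp(delta_g_kcal / (R_KCAL * temperature_k))
    return -math.log10(k)


ki_nm = p_to_molar(6.82) * 1e9     # Ki in nanomolar
ic50_um = p_to_molar(5.972) * 1e6  # IC50 in micromolar
kd_um = p_to_molar(6.023) * 1e6    # Kd in micromolar
pk_from_dg = delta_g_to_pk(-9.3)

print(f"Ki   ~ {ki_nm:.0f} nM")
print(f"IC50 ~ {ic50_um:.2f} uM")
print(f"Kd   ~ {kd_um:.2f} uM")
print(f"pK from -9.3 kcal/mol ~ {pk_from_dg:.2f}")
```

Under these assumptions the reported values correspond to sub-micromolar to low-micromolar concentrations, and a free energy of -9.3 kcal/mol maps to a pK close to the reported pKi of 6.82.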
Affiliation(s)
- Alankar Roy
- Amity Institute of Biotechnology, Amity University, Kolkata, India
| | - Sayan Sharma
- Amity Institute of Biotechnology, Amity University, Kolkata, India
| | - Ishani Paul
- Amity Institute of Biotechnology, Amity University, Kolkata, India
| | - Sujay Ray
- Amity Institute of Biotechnology, Amity University, Kolkata, India.
| |
|
5
|
Bendapudi PK, Nazeen S, Ryu J, Söylemez O, Robbins A, Rouaisnel B, O’Neil JK, Pokhriyal R, Yang M, Colling M, Pasko B, Bouzinier M, Tomczak L, Collier L, Barrios D, Ram S, Toth-Petroczy A, Krier J, Fieg E, Dzik WH, Hudspeth JC, Pozdnyakova O, Nardi V, Knight J, Maas R, Sunyaev S, Losman JA. Low-frequency inherited complement receptor variants are associated with purpura fulminans. Blood 2024; 143:1032-1044. [PMID: 38096369 PMCID: PMC10950473 DOI: 10.1182/blood.2023021231] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/17/2023] [Accepted: 11/15/2023] [Indexed: 03/16/2024] Open
Abstract
Extreme disease phenotypes can provide key insights into the pathophysiology of common conditions, but studying such cases is challenging due to their rarity and the limited statistical power of existing methods. Herein, we used a novel approach to pathway-based mutational burden testing, the rare variant trend test (RVTT), to investigate genetic risk factors for an extreme form of sepsis-induced coagulopathy, infectious purpura fulminans (PF). In addition to prospective patient sample collection, we electronically screened over 10.4 million medical records from 4 large hospital systems and identified historical cases of PF for which archived specimens were available to perform germline whole-exome sequencing. We found a significantly increased burden of low-frequency, putatively function-altering variants in the complement system in patients with PF compared with unselected patients with sepsis (P = .01). A multivariable logistic regression analysis found that the number of complement system variants per patient was independently associated with PF after controlling for age, sex, and disease acuity (P = .01). Functional characterization of PF-associated variants in the immunomodulatory complement receptors CR3 and CR4 revealed that they result in partial or complete loss of anti-inflammatory CR3 function and/or gain of proinflammatory CR4 function. Taken together, these findings suggest that inherited defects in CR3 and CR4 predispose to the maladaptive hyperinflammation that characterizes severe sepsis with coagulopathy.
Affiliation(s)
- Pavan K. Bendapudi
- Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA
- Division of Hematology and Blood Transfusion Service, Massachusetts General Hospital, Boston, MA
- Harvard Medical School, Boston, MA
| | - Sumaiya Nazeen
- Harvard Medical School, Boston, MA
- Division of Genomic Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Justine Ryu
- Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA
- Harvard Medical School, Boston, MA
| | - Onuralp Söylemez
- Harvard Medical School, Boston, MA
- Division of Genomic Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Alissa Robbins
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
| | - Betty Rouaisnel
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
| | - Jillian K. O’Neil
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
| | - Ruchika Pokhriyal
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
| | - Moua Yang
- Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA
- Harvard Medical School, Boston, MA
| | - Meaghan Colling
- Division of Hematology and Blood Transfusion Service, Massachusetts General Hospital, Boston, MA
- Harvard Medical School, Boston, MA
| | - Bryce Pasko
- Department of Pathology, University of Colorado School of Medicine, Aurora, CO
| | - Michael Bouzinier
- Harvard Medical School, Boston, MA
- Division of Genomic Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Lindsay Tomczak
- Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA
| | - Lindsay Collier
- Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA
| | - David Barrios
- Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA
- Harvard Medical School, Boston, MA
| | - Sanjay Ram
- Division of Infectious Diseases and Immunology, University of Massachusetts Medical School, Worcester, MA
| | - Agnes Toth-Petroczy
- Harvard Medical School, Boston, MA
- Division of Genomic Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Joel Krier
- Harvard Medical School, Boston, MA
- Division of Genomic Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Elizabeth Fieg
- Division of Genomic Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Walter H. Dzik
- Division of Hematology and Blood Transfusion Service, Massachusetts General Hospital, Boston, MA
- Harvard Medical School, Boston, MA
| | - James C. Hudspeth
- Department of Medicine, Boston Medical Center, Boston, MA
- Boston University School of Medicine, Boston, MA
| | - Olga Pozdnyakova
- Harvard Medical School, Boston, MA
- Department of Pathology, Brigham and Women’s Hospital, Boston, MA
| | - Valentina Nardi
- Harvard Medical School, Boston, MA
- Department of Pathology, Massachusetts General Hospital, Boston, MA
| | - James Knight
- Yale Center for Genome Analysis, Yale University, New Haven, CT
| | - Richard Maas
- Harvard Medical School, Boston, MA
- Division of Genomic Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Shamil Sunyaev
- Harvard Medical School, Boston, MA
- Division of Genomic Medicine, Brigham and Women’s Hospital, Boston, MA
| | - Julie-Aurore Losman
- Harvard Medical School, Boston, MA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Division of Hematology, Brigham and Women’s Hospital, Boston, MA
| |
|
6
|
Apio C, Chung W, Moon MK, Kwon O, Park T. Gene-diet interaction analysis using novel weighted food scores discovers the adipocytokine signaling pathway associated with the development of type 2 diabetes. Front Endocrinol (Lausanne) 2023; 14:1165744. [PMID: 37680885 PMCID: PMC10482093 DOI: 10.3389/fendo.2023.1165744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 07/31/2023] [Indexed: 09/09/2023] Open
Abstract
Introduction The influence of dietary patterns measured using Recommended Food Score (RFS) with foods with high amounts of antioxidant nutrients for Type 2 diabetes (T2D) was analyzed. Our analysis aims to find associations between dietary patterns and T2D and conduct a gene-diet interaction analysis related to T2D. Methods Data analyzed in the current study were obtained from the Korean Genome and Epidemiology Study Cohort. The dietary patterns of 46 food items were assessed using a validated food frequency questionnaire. To maximize the predictive power of the RFS, we propose two weighted food scores, namely HisCoM-RFS calculated using the novel Hierarchical Structural Component model (HisCoM) and PLSDA-RFS calculated using Partial Least Squares-Discriminant Analysis (PLS-DA) method. Results Both RFS (OR: 1.11; 95% CI: 1.03- 1.20; P = 0.009) and PLSDA-RFS (OR: 1.10; 95% CI: 1.02-1.19, P = 0.011) were positively associated with T2D. Mapping of SNPs (P < 0.05) from the interaction analysis between SNPs and the food scores to genes and pathways yielded some 12 genes (CACNA2D3, RELN, DOCK2, SLIT3, CTNNA2, etc.) and pathways associated with T2D. The strongest association was observed with the adipocytokine signalling pathway, highlighting 32 genes (STAT3, MAPK10, MAPK8, IRS1, AKT1-3, ADIPOR2, etc.) most likely associated with T2D. Finally, the group of the subjects in low, intermediate and high using both the food scores and a polygenic risk score found an association between diet quality groups with issues at high genetic risk of T2D. Conclusion A dietary pattern of poor amounts of antioxidant nutrients is associated with the risk of T2D, and diet affects pathway mechanisms involved in developing T2D.
Affiliation(s)
- Catherine Apio
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Wonil Chung
- Department of Statistics and Actuarial Science, Soongsil University, Seoul, Republic of Korea
| | - Min Kyong Moon
- Department of Internal Medicine, College of Medicine, Seoul National University, Seoul, Republic of Korea
| | - Oran Kwon
- Department of Nutritional Science and Food Management, Ewha Womans University, Seoul, Republic of Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul, Republic of Korea
| |
|
7
|
Abstract
Extended Redundancy Analysis (ERA) has recently been developed and widely applied to investigate component regression models. In this paper, we propose Copula-based Redundancy Analysis (CRA) to improve the performance of regression-based ERA. Our simulation results indicate that CRA is significantly superior to the regression-based ERA. We also discuss how to modify CRA to accommodate models with discrete, censored, truncated outcome variables, or a combination thereof, where ERA cannot be employed. For applications, we provide two empirical analyses: one on academic achievement and one on drug use and health.
Affiliation(s)
- Ji Yeh Choi
- Department of Psychology, York University, Toronto, ON, Canada
| | - Juwon Seo
- Department of Economics, National University of Singapore, Singapore, Singapore
| |
|
8
|
Park C, Kim B, Park T. DeepHisCoM: deep learning pathway analysis using hierarchical structural component models. Brief Bioinform 2022; 23:6590446. [DOI: 10.1093/bib/bbac171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Revised: 04/04/2022] [Accepted: 04/18/2022] [Indexed: 11/13/2022] Open
Abstract
Many statistical methods for pathway analysis have been used to identify pathways associated with the disease along with biological factors such as genes and proteins. However, most pathway analysis methods neglect the complex nonlinear relationship between biological factors and pathways. In this study, we propose a Deep-learning pathway analysis using Hierarchical structured CoMponent models (DeepHisCoM) that utilize deep learning to consider a nonlinear complex contribution of biological factors to pathways by constructing a multilayered model which accounts for hierarchical biological structure. Through simulation studies, DeepHisCoM was shown to have a higher power in the nonlinear pathway effect and comparable power for the linear pathway effect when compared to the conventional pathway methods. Application to hepatocellular carcinoma (HCC) omics datasets, including metabolomic, transcriptomic and metagenomic datasets, demonstrated that DeepHisCoM successfully identified three well-known pathways that are highly associated with HCC, such as lysine degradation, valine, leucine and isoleucine biosynthesis and phenylalanine, tyrosine and tryptophan. Application to the coronavirus disease-2019 (COVID-19) single-nucleotide polymorphism (SNP) dataset also showed that DeepHisCoM identified four pathways that are highly associated with the severity of COVID-19, such as mitogen-activated protein kinase (MAPK) signaling pathway, gonadotropin-releasing hormone (GnRH) signaling pathway, hypertrophic cardiomyopathy and dilated cardiomyopathy. Codes are available at https://github.com/chanwoo-park-official/DeepHisCoM.
Affiliation(s)
- Chanwoo Park
- Department of Statistics, Seoul National University, Seoul 08826, Korea
| | - Boram Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| | - Taesung Park
- Department of Statistics, Seoul National University, Seoul 08826, Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
| |
|
9
|
Kim SA, Kang N, Park T. Hierarchical Structured Component Analysis for Microbiome Data Using Taxonomy Assignments. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1302-1312. [PMID: 33211665 DOI: 10.1109/tcbb.2020.3039326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
The recent advent of high-throughput sequencing technology has enabled us to study the associations between human microbiome and diseases. The DNA sequences of microbiome samples are clustered as operational taxonomic units (OTUs) according to their similarity. The OTU table containing counts of OTUs present in each sample is used to measure correlations between OTUs and disease status and find key microbes for prediction of the disease status. Various statistical methods have been proposed for such microbiome data analysis. However, none of these methods reflects the hierarchy of taxonomy information. In this paper, we propose a hierarchical structural component model for microbiome data (HisCoM-microb) using taxonomy information as well as OTU table data. The proposed HisCoM-microb consists of two layers: one for OTUs and the other for taxa at the higher taxonomy level. Then we calculate simultaneously coefficient estimates of OTUs and taxa of the two layers inserted in the hierarchical model. Through this analysis, we can infer the association between taxa or OTUs and disease status, considering the impact of taxonomic structure on disease status. Both simulation study and real microbiome data analysis show that HisCoM-microb can successfully reveal the relations between each taxon and disease status and identify the key OTUs of the disease at the same time.
|
10
|
Cho G, Sarstedt M, Hwang H. A comparative evaluation of factor- and component-based structural equation modelling approaches under (in)correct construct representations. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2022; 75:220-251. [PMID: 34661902 DOI: 10.1111/bmsp.12255] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/05/2020] [Revised: 12/09/2021] [Indexed: 06/13/2023]
Abstract
Structural equation modelling (SEM) has evolved into two domains, factor-based and component-based, dependent on whether constructs are statistically represented as common factors or components. The two SEM domains are conceptually distinct, each assuming their own population models with either of the statistical construct proxies, and statistical SEM approaches should be used for estimating models whose construct representations correspond to what they assume. However, SEM approaches have often been evaluated and compared only under population factor models, providing misleading conclusions about their relative performance. This is partly because population component models and their relationships have not been clearly formulated. Also, it is of fundamental importance to examine how robust SEM approaches can be to potential misrepresentation of constructs because researchers may often lack clear theories to determine whether a factor or component is more representative of a given construct. Addressing these issues, this study begins by clarifying several population component models and their relationships and then provides a comprehensive evaluation of four SEM approaches - the maximum likelihood approach and factor score regression for factor-based SEM as well as generalized structured component analysis (GSCA) and partial least squares path modelling (PLSPM) for component-based SEM - under various experimental conditions. We confirm that the factor-based SEM approaches should be preferred for estimating factor models, whereas the component-based SEM approaches should be chosen for component models. Importantly, the component-based approaches are generally more robust to construct misrepresentation than the factor-based ones. Of the component-based approaches, GSCA should be chosen over PLSPM, regardless of whether or not constructs are misrepresented.
Affiliation(s)
| | - Marko Sarstedt
- Ludwig-Maximilians-University Munich, Germany
- Babeş-Bolyai University, Cluj-Napoca, Romania
| | | |
|
11
|
Genome-Wide Genomic and Functional Association Study for Workability and Calving Traits in Holstein Cattle. Animals (Basel) 2022; 12:ani12091127. [PMID: 35565554 PMCID: PMC9102336 DOI: 10.3390/ani12091127] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2022] [Revised: 04/25/2022] [Accepted: 04/26/2022] [Indexed: 02/04/2023] Open
Abstract
The goal of our study was to identify the SNPs, metabolic pathways (KEGG), and gene ontology (GO) terms significantly associated with calving and workability traits in dairy cattle. We analysed direct (DCE) and maternal (MCE) calving ease, direct (DSB) and maternal (MSB) stillbirth, milking speed (MSP), and temperament (TEM) based on a Holstein-Friesian dairy cattle population consisting of 35,203 individuals. The number of animals, depending on the trait, ranged from 22,301 bulls for TEM to 30,603 for DCE. We estimated the SNP effects (based on 46,216 polymorphisms from Illumina BovineSNP50 BeadChip Version 2) using a multi-SNP mixed model. The SNP positions were mapped to genes and the GO terms/KEGG pathways of the corresponding genes were assigned. The estimation of the GO term/KEGG pathway effects was based on a mixed model using the SNP effects as dependent variables. The number of significant SNPs comprised 59 for DCE, 25 for DSB and MSP, 17 for MCE and MSB, and 7 for TEM. Significant KEGG pathways were found for MSB (2), TEM (2), and MSP (1) and 11 GO terms were significant for MSP, 10 for DCE, 8 for DSB and TEM, 5 for MCE, and 3 for MSB. From the perspective of a better understanding of the genomic background of the phenotypes, traits with low heritabilities suggest that the focus should be moved from single genes to the metabolic pathways or gene ontologies significant for the phenotype.
|
12
|
Hwangbo S, Lee S, Lee S, Hwang H, Kim I, Park T. Kernel-based hierarchical structural component models for pathway analysis. Bioinformatics 2022; 38:3078-3086. [PMID: 35460238 DOI: 10.1093/bioinformatics/btac276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 04/08/2022] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Pathway analyses have led to more insight into the underlying biological functions related to the phenotype of interest in various types of omics data. Pathway-based statistical approaches have been actively developed, but most of them do not consider correlations among pathways. Because it is well known that there are quite a few biomarkers that overlap between pathways, these approaches may provide misleading results. In addition, most pathway-based approaches tend to assume that biomarkers within a pathway have linear associations with the phenotype of interest, even though the relationships are more complex. RESULTS To model complex effects including nonlinear effects, we propose a new approach, Hierarchical structural CoMponent analysis using Kernel (HisCoM-Kernel). The proposed method models nonlinear associations between biomarkers and phenotype by extending the kernel machine regression and analyzes entire pathways simultaneously by using the biomarker-pathway hierarchical structure. HisCoM-Kernel is a flexible model that can be applied to various omics data. It was successfully applied to three omics datasets generated by different technologies. Our simulation studies showed that HisCoM-Kernel provided higher statistical power than other existing pathway-based methods in all datasets. The application of HisCoM-Kernel to three types of omics dataset showed its superior performance compared to existing methods in identifying more biologically meaningful pathways, including those reported in previous studies. AVAILABILITY AND IMPLEMENTATION Freely available at http://statgen.snu.ac.kr/software/HisCom-Kernel/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Suhyun Hwangbo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea; Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Sungyoung Lee
- Department of Genomic Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Seungyeoun Lee
- Department of Mathematics and Statistics, Sejong University, Sejong, 05006, Korea
- Heungsun Hwang
- Department of Psychology, McGill University, Montreal, QC, H3A 1B1, Canada
- Inyoung Kim
- Department of Statistics, Virginia Tech, Blacksburg, Virginia, 24060, U.S.A.
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-747, Korea; Department of Statistics, Seoul National University, Seoul, 151-747, Korea
|
13
|
Jung T, Jung Y, Moon MK, Kwon O, Hwang GS, Park T. Integrative Pathway Analysis of SNP and Metabolite Data Using a Hierarchical Structural Component Model. Front Genet 2022; 13:814412. [PMID: 35401680 PMCID: PMC8987531 DOI: 10.3389/fgene.2022.814412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Accepted: 01/13/2022] [Indexed: 11/16/2022]
Abstract
Integrative multi-omics analysis has become a useful tool for understanding molecular mechanisms and for drug discovery. In particular, genetics has been coupled with metabolomics to identify associations between SNPs and metabolites. However, while the importance of integrative pathway analysis is increasing, few approaches use pathway information to analyze phenotypes with SNP and metabolite data. We propose an integrative pathway analysis of SNP and metabolite data using a hierarchical structural component model that considers the structural relationships of SNPs, metabolites, pathways, and phenotypes. The proposed method uses genome-wide association studies on metabolites to construct genetic risk scores for metabolites, referred to as genetic metabolomic scores. It is based on a hierarchical model of the genetic metabolomic scores and pathways. Furthermore, the method adopts a ridge penalty to account for the correlations between genetic metabolomic scores and between pathways. We apply our method to SNP and metabolite data from the Korean population to identify pathways associated with type 2 diabetes (T2D). Through this application, we identified well-known pathways associated with T2D, demonstrating that this method adds biological insights into disease-related pathways using genetic predispositions of metabolites.
Affiliation(s)
- Taeyeong Jung
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Youngae Jung
- Korea Integrated Metabolomics Research Group, Western Seoul Center, Korea Basic Science Institute, Seoul, South Korea
- Min Kyong Moon
- Department of Internal Medicine, Seoul National University Boramae Medical Center, Seoul, South Korea
- Oran Kwon
- Department of Nutritional Science and Food Management, Graduate Program in System Health Science and Engineering, Ewha Womans University, Seoul, South Korea
- Geum-Sook Hwang
- Korea Integrated Metabolomics Research Group, Western Seoul Center, Korea Basic Science Institute, Seoul, South Korea
- *Correspondence: Geum-Sook Hwang; Taesung Park
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Department of Statistics, Seoul National University, Seoul, South Korea
- *Correspondence: Geum-Sook Hwang; Taesung Park
|
14
|
Whole-exome sequencing with targeted analysis and epilepsy after acute symptomatic neonatal seizures. Pediatr Res 2022; 91:896-902. [PMID: 33846556 PMCID: PMC9064802 DOI: 10.1038/s41390-021-01509-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/25/2020] [Revised: 02/02/2021] [Accepted: 03/18/2021] [Indexed: 02/02/2023]
Abstract
BACKGROUND The contribution of pathogenic gene variants to the development of epilepsy after acute symptomatic neonatal seizures is not known. METHODS Case-control study of 20 trios in children with a history of acute symptomatic neonatal seizures: 10 with and 10 without post-neonatal epilepsy. We performed whole-exome sequencing (WES) and identified pathogenic de novo, transmitted, and non-transmitted variants from established and candidate epilepsy association genes and correlated prevalence of these variants with epilepsy outcomes. We performed a sensitivity analysis with genes associated with coronary artery disease (CAD). We analyzed variants throughout the exome to evaluate for differential enrichment of functional properties using exploratory KEGG searches. RESULTS Querying 200 established and candidate epilepsy genes, pathogenic variants were identified in 5 children with post-neonatal epilepsy yet in only 1 child without subsequent epilepsy. There was no difference in the number of trios with non-transmitted pathogenic variants in epilepsy or CAD genes. An exploratory KEGG analysis demonstrated a relative enrichment in cell death pathways in children without subsequent epilepsy. CONCLUSIONS In this pilot study, children with epilepsy after acute symptomatic neonatal seizures had a higher prevalence of coding variants with a targeted epilepsy gene sequencing analysis compared to those patients without subsequent epilepsy. IMPACT We performed whole-exome sequencing (WES) in 20 trios, including 10 children with epilepsy and 10 without epilepsy, both after acute symptomatic neonatal seizures. Children with post-neonatal epilepsy had a higher burden of pathogenic variants in epilepsy-associated genes compared to those without post-neonatal epilepsy.
Future studies evaluating this association may lead to a better understanding of the risk of epilepsy after acute symptomatic neonatal seizures and elucidate molecular pathways that are dysregulated after brain injury and implicated in epileptogenesis.
|
15
|
Kim S, Hwang H. Model-based recursive partitioning of extended redundancy analysis with an application to nicotine dependence among US adults. THE BRITISH JOURNAL OF MATHEMATICAL AND STATISTICAL PSYCHOLOGY 2021; 74:567-590. [PMID: 33782960 DOI: 10.1111/bmsp.12240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2019] [Revised: 02/02/2021] [Indexed: 06/12/2023]
Abstract
Extended redundancy analysis (ERA) is used to reduce multiple sets of predictors to a smaller number of components and examine the effects of these components on a response variable. In various social and behavioural studies, auxiliary covariates (e.g., gender, ethnicity) can often lead to heterogeneous subgroups of observations, each of which involves distinctive relationships between predictor and response variables. ERA is currently unable to consider such covariate-dependent heterogeneity to examine whether the model parameters vary across subgroups differentiated by covariates. To address this issue, we combine ERA with model-based recursive partitioning in a single framework. This combined method, MOB-ERA, aims to partition observations into heterogeneous subgroups recursively based on a set of covariates while fitting a specified ERA model to data. Upon the completion of the partitioning procedure, one can easily examine the difference in the estimated ERA parameters across covariate-dependent subgroups. Moreover, it produces a tree diagram that aids in visualizing a hierarchy of partitioning covariates, as well as interpreting their interactions. In the analysis of public data concerning nicotine dependence among US adults, the method uncovered heterogeneous subgroups characterized by several sociodemographic covariates, each of which yielded different directional relationships between three predictor sets and nicotine dependence.
Affiliation(s)
- Sunmee Kim
- University of Manitoba, Winnipeg, Manitoba, Canada
|
16
|
Parrish PCR, Liu D, Knutsen RH, Billington CJ, Mecham RP, Fu YP, Kozel BA. Whole exome sequencing in patients with Williams-Beuren syndrome followed by disease modeling in mice points to four novel pathways that may modify stenosis risk. Hum Mol Genet 2021; 29:2035-2050. [PMID: 32412588 DOI: 10.1093/hmg/ddaa093] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/06/2020] [Revised: 04/07/2020] [Accepted: 05/12/2020] [Indexed: 12/11/2022]
Abstract
Supravalvular aortic stenosis (SVAS) is a narrowing of the aorta caused by elastin (ELN) haploinsufficiency. SVAS severity varies among patients with Williams-Beuren syndrome (WBS), a rare disorder that removes one copy of ELN and 25-27 other genes. Twenty percent of children with WBS require one or more invasive and often risky procedures to correct the defect while 30% have no appreciable stenosis, despite sharing the same basic genetic lesion. There is no known medical therapy. Consequently, identifying genes that modify SVAS offers the potential for novel modifier-based therapeutics. To improve statistical power in our rare-disease cohort (N = 104 exomes), we utilized extreme-phenotype cohorting, functional variant filtration and pathway-based analysis. Gene set enrichment analysis of exome-wide association data identified increased adaptive immune system variant burden among genes associated with SVAS severity. Additional enrichment, using only potentially pathogenic variants known to differ in frequency between the extreme phenotype subsets, identified significant association of SVAS severity with not only immune pathway genes, but also genes involved with the extracellular matrix, G protein-coupled receptor signaling and lipid metabolism using both SKAT-O and RQTest. Complementary studies in Eln+/-; Rag1-/- mice, which lack a functional adaptive immune system, showed improvement in cardiovascular features of ELN insufficiency. Similarly, studies in mixed background Eln+/- mice confirmed that variations in genes that increase elastic fiber deposition also had positive impact on aortic caliber. By using tools to improve statistical power in combination with orthogonal analyses in mice, we detected four main pathways that contribute to SVAS risk.
Affiliation(s)
- Phoebe C R Parrish
- Translational Vascular Medicine Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Delong Liu
- Translational Vascular Medicine Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Russell H Knutsen
- Translational Vascular Medicine Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA; Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Charles J Billington
- Translational Vascular Medicine Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA; National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Robert P Mecham
- Department of Cell Biology and Physiology, Washington University School of Medicine, St. Louis, MO 63110, USA
- Yi-Ping Fu
- Office of Biostatistics Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Beth A Kozel
- Translational Vascular Medicine Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
17
|
Kim Y, Lee S, Jang JY, Lee S, Park T. Identifying miRNA-mRNA Integration Set Associated With Survival Time. Front Genet 2021; 12:634922. [PMID: 34267778 PMCID: PMC8276759 DOI: 10.3389/fgene.2021.634922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2020] [Accepted: 04/06/2021] [Indexed: 11/26/2022]
Abstract
In the “personalized medicine” era, one of the most difficult problems is the identification of combined markers from different omics platforms. Many methods have been developed to identify candidate markers for each type of omics data, but few methods facilitate the identification of multiple markers on multi-omics platforms. microRNAs (miRNAs) are well known to affect phenotypes only indirectly, by regulating mRNA expression and/or protein translation. To put this knowledge into practice, we suggest a miRNA-mRNA integration model for survival time analysis, called mimi-surv, which accounts for this biological relationship to identify such integrated markers more efficiently. Through simulation studies, we found that the statistical power of mimi-surv was better than that of other models. Application to real datasets from Seoul National University Hospital and The Cancer Genome Atlas demonstrated that mimi-surv successfully identified miRNA-mRNA integration sets associated with progression-free survival of pancreatic ductal adenocarcinoma (PDAC) patients. Only mimi-surv found miR-96, a previously unidentified PDAC-related miRNA, in these two real datasets. Furthermore, mimi-surv was shown to identify more PDAC-related miRNAs than other methods because it used the known structure for miRNA-mRNA regularization. An implementation of mimi-surv is available at http://statgen.snu.ac.kr/software/mimi-surv.
Affiliation(s)
- Yongkang Kim
- Department of Statistics, Seoul National University, Seoul, South Korea
- Sungyoung Lee
- Center for Precision Medicine, Seoul National University Hospital, Seoul, South Korea; Department of Genomic Medicine, Seoul National University Hospital, Seoul, South Korea
- Jin-Young Jang
- Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, South Korea
- Seungyeoun Lee
- Department of Mathematics and Statistics, Sejong University, Seoul, South Korea
- Taesung Park
- Department of Statistics, Seoul National University, Seoul, South Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
|
18
|
Souza MG, Vallejo EE, Estrada K. Detecting Clustered Independent Rare Variant Associations Using Genetic Algorithms. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:932-939. [PMID: 31403438 DOI: 10.1109/tcbb.2019.2930505] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
The availability of an increasing collection of sequencing data provides the opportunity to study genetic variation with an unprecedented level of detail. There is much interest in uncovering the role of rare variants and their contribution to disease. However, detecting associations of rare variants with small minor allele frequencies (MAF) and modest effects remains a challenge for rare variant association methods. Due to this low signal-to-noise ratio, most methods are underpowered to detect associations even when conducting rare variant association tests at the gene level. We present a new method for detecting rare variant associations. The algorithm consists of two steps. In the first step, a genetic algorithm searches for a promising genomic region containing a collection of genes with causal rare variants. In the second step, a genetic algorithm aims at removing false positives from the located genomic region. We tested the proposed method with a collection of datasets obtained from real exome data. The proposed method possesses sufficient power for detecting associations of rare variants with complex phenotypes. This method can be used for studying the contribution of rare variants with complex disease, particularly in cases where single-variant or gene-based tests are underpowered.
|
19
|
Hwang H, Cho G, Jin MJ, Ryoo JH, Choi Y, Lee SH. A knowledge-based multivariate statistical method for examining gene-brain-behavioral/cognitive relationships: Imaging genetics generalized structured component analysis. PLoS One 2021; 16:e0247592. [PMID: 33690643 PMCID: PMC7946325 DOI: 10.1371/journal.pone.0247592] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/13/2020] [Accepted: 02/10/2021] [Indexed: 12/30/2022]
Abstract
With advances in neuroimaging and genetics, imaging genetics is a naturally emerging field that combines genetic and neuroimaging data with behavioral or cognitive outcomes to examine genetic influence on altered brain functions associated with behavioral or cognitive variation. We propose a statistical approach, termed imaging genetics generalized structured component analysis (IG-GSCA), which allows researchers to investigate such gene-brain-behavior/cognitive associations, taking into account well-documented biological characteristics (e.g., genetic pathways, gene-environment interactions) and methodological complexities (e.g., multicollinearity) in imaging genetic studies. We begin by describing the conceptual and technical underpinnings of IG-GSCA. We then apply the approach to investigate how nine depression-related genes and their interactions with an environmental variable (experience of potentially traumatic events) influence the thickness variations of 53 brain regions, which in turn affect depression severity in a sample of Korean participants. Our analysis shows that a dopamine receptor gene and an interaction between a serotonin transporter gene and the environment variable have statistically significant effects on a few brain regions' variations that have statistically significant negative impacts on depression severity. These relationships are largely supported by previous studies. We also conduct a simulation study to verify whether IG-GSCA can recover parameters as expected in a similar situation.
Affiliation(s)
- Heungsun Hwang
- Department of Psychology, McGill University, Montreal, Quebec, Canada
- Gyeongcheol Cho
- Department of Psychology, McGill University, Montreal, Quebec, Canada
- Min Jin Jin
- Institute of Liberal Education, Kongju National University, Gongju, Korea
- Ji Hoon Ryoo
- Department of Education, Yonsei University, Seoul, Korea
- Younyoung Choi
- Department of Counseling Psychology, Hanyang Cyber University, Seoul, Korea
- Seung Hwan Lee
- Department of Psychiatry, Inje University Ilsan-Paik Hospital and Inje University, Goyang, Korea
|
20
|
Kim B, Cho EJ, Yoon JH, Kim SS, Cheong JY, Cho SW, Park T. Pathway-Based Integrative Analysis of Metabolome and Microbiome Data from Hepatocellular Carcinoma and Liver Cirrhosis Patients. Cancers (Basel) 2020; 12:E2705. [PMID: 32967314 PMCID: PMC7563418 DOI: 10.3390/cancers12092705] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/29/2020] [Revised: 09/14/2020] [Accepted: 09/16/2020] [Indexed: 12/12/2022]
Abstract
Aberrations of the human microbiome are associated with diverse liver diseases, including hepatocellular carcinoma (HCC). Even if we can associate specific microbes with particular diseases, it is difficult to know mechanistically how the microbe contributes to the pathophysiology. Here, we sought to reveal the functional potential of the HCC-associated microbiome with the human metabolome which is known to play a role in connecting host phenotype to microbiome function. To utilize both microbiome and metabolomic data sets, we propose an innovative, pathway-based analysis, Hierarchical structural Component Model for pathway analysis of Microbiome and Metabolome (HisCoM-MnM), for integrating microbiome and metabolomic data. In particular, we used pathway information to integrate these two omics data sets, thus providing insight into biological interactions between different biological layers, with regard to the host's phenotype. The application of HisCoM-MnM to data sets from 103 and 97 patients with HCC and liver cirrhosis (LC), respectively, showed that this approach could identify HCC-related pathways related to cancer metabolic reprogramming, in addition to the significant metabolome and metagenome that make up those pathways.
Affiliation(s)
- Boram Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Eun Ju Cho
- Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul 03080, Korea
- Jung-Hwan Yoon
- Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul 03080, Korea
- Soon Sun Kim
- Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Jae Youn Cheong
- Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Sung Won Cho
- Department of Gastroenterology, Ajou University School of Medicine, Suwon 16499, Korea
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Department of Statistics, Seoul National University, Seoul 08826, Korea
|
21
|
Choi S, Lee S, Huh I, Hwang H, Park T. HisCoM-G×E: Hierarchical Structural Component Analysis of Gene-Based Gene-Environment Interactions. Int J Mol Sci 2020; 21:E6724. [PMID: 32937825 PMCID: PMC7555026 DOI: 10.3390/ijms21186724] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/27/2020] [Revised: 08/31/2020] [Accepted: 09/04/2020] [Indexed: 11/30/2022]
Abstract
Gene-environment interaction (G×E) studies are one of the most important solutions for understanding the "missing heritability" problem in genome-wide association studies (GWAS). Although many statistical methods have been proposed for detecting and identifying G×E, most employ single nucleotide polymorphism (SNP)-level analysis. In this study, we propose a new statistical method, Hierarchical structural CoMponent analysis of gene-based Gene-Environment interactions (HisCoM-G×E). HisCoM-G×E is based on the hierarchical structural relationship among all SNPs within a gene, and can accommodate all possible SNP-level effects into a single latent variable, by imposing a ridge penalty, and thus more efficiently takes into account the latent interaction term of G×E. The performance of the proposed method was evaluated in simulation studies, and we applied the proposed method to investigate gene-alcohol intake interactions affecting systolic blood pressure (SBP), using samples from the Korea Associated REsource (KARE) consortium data.
Affiliation(s)
- Sungkyoung Choi
- Department of Applied Mathematics, Hanyang University (ERICA), Ansan 15588, Korea
- Sungyoung Lee
- Center for Precision Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Iksoo Huh
- Department of Nursing, College of Nursing and Research Institute of Nursing Science, Seoul National University, Seoul 03080, Korea
- Heungsun Hwang
- Department of Psychology, McGill University, Montreal, QC H3A 1G1, Canada
- Taesung Park
- Department of Statistics, Seoul National University, Seoul 08826, Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
|
22
|
Leem S, Huh I, Park T. Enhanced Permutation Tests via Multiple Pruning. Front Genet 2020; 11:509. [PMID: 32670346 PMCID: PMC7330123 DOI: 10.3389/fgene.2020.00509] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/24/2019] [Accepted: 04/27/2020] [Indexed: 11/25/2022]
Abstract
Big multi-omics data in bioinformatics often consist of a huge number of features and relatively small numbers of samples. In addition, features from multi-omics data have their own specific characteristics depending on whether they are from genomics, proteomics, metabolomics, etc. Due to these distinct characteristics, standard statistical analyses using parametric-based assumptions may sometimes fail to provide exact asymptotic results. To resolve this issue, permutation tests can be used to analyze multi-omics data exactly because they are distribution-free and flexible to use. In permutation tests, p-values are evaluated by estimating the locations of test statistics in an empirical null distribution generated by random shuffling. However, the permutation approach can be infeasible when the number of features increases, because more stringent control of type I error is needed for multiple hypothesis testing, and consequently, much larger numbers of permutations are required to reach significance. To address this problem, we propose a well-organized strategy, “ENhanced Permutation tests via multiple Pruning (ENPP).” ENPP prunes the features in every permutation round if they are determined to be non-significant. In other words, if the feature statistics from the permuted datasets exceed the feature statistics from the original dataset more often than a predetermined threshold, the feature is determined to be non-significant. If so, ENPP removes the feature and iterates the process without the feature in the next permutation round. Our simulation study showed that the ENPP method could remove about 50% of the features at the first permutation round, and, by the 100th permutation round, 98% of the features had been removed and only 7.4% of the computation time required by the original unpruned permutation approach had been used.
In addition, we applied this approach to a real data set (Korea Association REsource: KARE) of 327,872 SNPs to find association with a non-normally distributed phenotype (fasting plasma glucose), interpreted the results, and discussed the feasibility and advantages of the approach.
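The pruning idea described in this abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the authors' ENPP implementation: the test statistic (absolute Pearson correlation), the function names `abs_corr` and `pruned_permutation_test`, and the parameters `n_perm` and `max_exceed` are all hypothetical choices made for the example.

```python
import numpy as np

def abs_corr(X, y):
    """Absolute Pearson correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / denom

def pruned_permutation_test(X, y, n_perm=200, max_exceed=10, seed=0):
    """Permutation test that drops a feature once it is clearly non-significant.

    A feature is pruned as soon as its permuted statistic has matched or
    exceeded its observed statistic more than `max_exceed` times
    (an assumed pruning rule for this sketch).
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    observed = abs_corr(X, y)
    exceed = np.zeros(n_features, dtype=int)   # exceedance count per feature
    active = np.arange(n_features)             # features still being tested
    for _ in range(n_perm):
        if active.size == 0:
            break
        y_perm = rng.permutation(y)            # shuffle the phenotype
        perm_stat = abs_corr(X[:, active], y_perm)
        exceed[active] += perm_stat >= observed[active]
        active = active[exceed[active] <= max_exceed]  # prune hopeless features
    survived = np.zeros(n_features, dtype=bool)
    survived[active] = True
    # Only surviving features went through every round, so only they
    # receive an empirical p-value; pruned features are left as NaN.
    p_values = np.full(n_features, np.nan)
    p_values[survived] = (exceed[survived] + 1) / (n_perm + 1)
    return p_values, survived
```

Called on a samples-by-features matrix `X` and a phenotype vector `y`, the function returns empirical p-values for the features that survive all rounds and a boolean mask of survivors. Because pruned columns are skipped in later rounds, the cost per round shrinks as features are removed, which is the source of the time savings the abstract reports.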
Affiliation(s)
- Sangseob Leem
- Department of Statistics, Seoul National University, Seoul, South Korea
- Iksoo Huh
- College of Nursing and Research Institute of Nursing Science, Seoul National University, Seoul, South Korea
- Taesung Park
- Department of Statistics, Seoul National University, Seoul, South Korea
|
23
|
Zhao Z, Zucknick M. Structured penalized regression for drug sensitivity prediction. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12400] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/11/2022]
|
24
|
Jiang N, Lee S, Park T. HisCoM-PCA: software for hierarchical structural component analysis for pathway analysis based using principal component analysis. Genomics Inform 2020; 18:e11. [PMID: 32224844 PMCID: PMC7120349 DOI: 10.5808/gi.2020.18.1.e11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2020] [Accepted: 03/11/2020] [Indexed: 11/20/2022]
Affiliation(s)
- Nan Jiang
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Sungyoung Lee
- Center for Precision Medicine, Seoul National University Hospital, Seoul 08826, Korea
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Department of Statistics, Seoul National University, Seoul 08826, Korea
- Corresponding author
|
25
|
Jiang N, Lee S, Park T. Hierarchical structural component model for pathway analysis of common variants. BMC Med Genomics 2020; 13:26. [PMID: 32093692 PMCID: PMC7038534 DOI: 10.1186/s12920-019-0650-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/22/2019] [Accepted: 12/19/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND Genome-wide association studies (GWAS) have been widely used to identify phenotype-related genetic variants using many statistical methods, such as logistic and linear regression. However, GWAS-identified SNPs, as identified with stringent statistical significance, explain just a small portion of the overall estimated genetic heritability. To address this 'missing heritability' issue, gene- and pathway-based analysis, and biological mechanisms, have been used for many GWAS studies. However, many of these methods often neglect the correlation between genes and between pathways. METHODS We constructed a hierarchical component model that considers correlations both between genes and between pathways. Based on this model, we propose a novel pathway analysis method for GWAS datasets, Hierarchical structural Component Model for Pathway analysis of Common vAriants (HisCoM-PCA). HisCoM-PCA first summarizes the common variants of each gene at the gene level, and then analyzes all pathways simultaneously by ridge-type penalization of both the gene and pathway effects on the phenotype. Statistical significance of the gene and pathway coefficients can be examined by permutation tests. RESULTS Using the simulation data set of Genetic Analysis Workshop 17 (GAW17), for both binary and continuous phenotypes, we showed that HisCoM-PCA controlled type I error well and had a higher empirical power compared to several other methods. In addition, we applied our method to a SNP chip dataset of KARE for four human physiologic traits: (1) type 2 diabetes; (2) hypertension; (3) systolic blood pressure; and (4) diastolic blood pressure. Those results showed that HisCoM-PCA could successfully identify signal pathways with superior statistical and biological significance.
CONCLUSIONS Our approach has the advantage of providing an intuitive biological interpretation for associations between common variants and phenotypes, via pathway information, potentially addressing the missing heritability conundrum.
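The two-stage idea in the abstract above (summarize variants per gene, fit penalized effects, then test by permutation) can be illustrated with a small sketch. This is not the authors' HisCoM-PCA implementation: the gene summary used here (mean of z-scored SNPs), the single-level closed-form ridge fit, and the function names `gene_scores`, `ridge_coefficients`, and `permutation_pvalues` are all assumptions made for illustration, and the pathway level of the hierarchy is omitted.

```python
import numpy as np

def gene_scores(genotypes, gene_of_snp):
    """Collapse SNP columns into one score per gene.

    Assumption: the score is the mean of z-scored SNPs in the gene;
    the published method estimates its own component weights.
    """
    z = (genotypes - genotypes.mean(axis=0)) / (genotypes.std(axis=0) + 1e-12)
    genes = sorted(set(gene_of_snp))
    columns = []
    for gene in genes:
        idx = [i for i, g in enumerate(gene_of_snp) if g == gene]
        columns.append(z[:, idx].mean(axis=1))
    return genes, np.column_stack(columns)

def ridge_coefficients(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^-1 X'y on centered data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def permutation_pvalues(X, y, lam, n_perm=200, seed=0):
    """Permutation p-value per coefficient: shuffle y, refit, compare |beta|."""
    rng = np.random.default_rng(seed)
    observed = np.abs(ridge_coefficients(X, y, lam))
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        permuted = np.abs(ridge_coefficients(X, rng.permutation(y), lam))
        exceed += permuted >= observed
    # Add-one correction keeps p-values strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Toy data: 300 samples, 6 SNPs in two hypothetical genes "A" and "B";
# only gene A affects the continuous phenotype.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(300, 6)).astype(float)
genes, S = gene_scores(G, ["A", "A", "A", "B", "B", "B"])
y = 2.0 * S[:, 0] + rng.normal(size=300)
pvals = permutation_pvalues(S, y, lam=1.0)
```

In this toy setup the p-value for gene A should be small and the one for gene B unremarkable; the penalty `lam` is fixed by hand here, whereas a real analysis would choose it by cross-validation or a similar criterion.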
Affiliation(s)
- Nan Jiang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, Seoul, 03080, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Korea
  - Department of Statistics, Seoul National University, Seoul, 08826, Korea
|
26
|
Choi JY, Kyung M, Hwang H, Park JH. Bayesian Extended Redundancy Analysis: A Bayesian Approach to Component-based Regression with Dimension Reduction. MULTIVARIATE BEHAVIORAL RESEARCH 2020; 55:30-48. [PMID: 31021267 DOI: 10.1080/00273171.2019.1598837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Extended redundancy analysis (ERA) combines linear regression with dimension reduction to explore the directional relationships between multiple sets of predictors and outcome variables in a parsimonious manner. It aims to extract a component from each set of predictors in such a way that it accounts for the maximum variance of outcome variables. In this article, we extend ERA into the Bayesian framework, called Bayesian ERA (BERA). The advantages of BERA are threefold. First, BERA enables statistical inference based on samples drawn from the joint posterior distribution of parameters obtained from a Markov chain Monte Carlo algorithm. As such, it does not necessitate any resampling method, which, by contrast, is required for ordinary (frequentist) ERA to test the statistical significance of parameter estimates. Second, it formally incorporates relevant information obtained from previous research into analyses by specifying informative power prior distributions. Third, BERA handles missing data by implementing multiple imputation using a Markov chain Monte Carlo algorithm, avoiding the potential bias of parameter estimates due to missing data. We assess the performance of BERA through simulation studies and apply BERA to real data regarding academic achievement.
Affiliation(s)
- Ji Yeh Choi
  - Department of Psychology, National University of Singapore, Singapore, Singapore
- Minjung Kyung
  - Department of Statistics, Duksung Women's University, Seoul, Korea
- Heungsun Hwang
  - Department of Psychology, McGill University, Montreal, Quebec, Canada
- Ju-Hyun Park
  - Department of Statistics, Dongguk University, Seoul, Korea
|
27
|
Mok L, Park T. HisCoM-PAGE: software for hierarchical structural component models for pathway analysis of gene expression data. Genomics Inform 2019; 17:e45. [PMID: 31896245 PMCID: PMC6944051 DOI: 10.5808/gi.2019.17.4.e45] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2019] [Accepted: 11/22/2019] [Indexed: 12/04/2022] Open
Abstract
To identify pathways associated with survival phenotypes using gene expression data, we recently proposed the hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE) method. The HisCoM-PAGE software can consider hierarchical structural relationships between genes and pathways and analyze multiple pathways simultaneously. It can be applied to various types of gene expression data, such as microarray data or RNA sequencing data. We expect that the HisCoM-PAGE software will make our method more easily accessible to researchers who want to perform pathway analysis for survival times.
Affiliation(s)
- Lydia Mok
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
|
28
|
Mok L, Kim Y, Lee S, Choi S, Lee S, Jang JY, Park T. HisCoM-PAGE: Hierarchical Structural Component Models for Pathway Analysis of Gene Expression Data. Genes (Basel) 2019; 10:E931. [PMID: 31739607 PMCID: PMC6896173 DOI: 10.3390/genes10110931] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/23/2019] [Revised: 11/06/2019] [Accepted: 11/07/2019] [Indexed: 01/10/2023] Open
Abstract
Although there have been several analyses for identifying cancer-associated pathways based on gene expression data, most of these are single-pathway analyses and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with other competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed HisCoM-PAGE to have the highest power to detect causal pathways in most simulation scenarios.
Affiliation(s)
- Lydia Mok
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Yongkang Kim
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Sungkyoung Choi
  - Department of Applied Mathematics, Hanyang University (ERICA), Ansan 15588, Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Seoul 05006, Korea
- Jin-Young Jang
  - Department of Surgery, Seoul National University College of Medicine, Seoul 03080, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
|
29
|
Lee S, Kim S, Kim Y, Oh B, Hwang H, Park T. Pathway analysis of rare variants for the clustered phenotypes by using hierarchical structured components analysis. BMC Med Genomics 2019; 12:100. [PMID: 31296220 PMCID: PMC6624181 DOI: 10.1186/s12920-019-0517-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Recent large-scale genetic studies often involve clustered phenotypes such as repeated measurements. Compared to a series of univariate analyses of single phenotypes, an analysis of clustered phenotypes can substantially increase statistical power to detect more genetic associations. Moreover, for the analysis of rare variants, incorporation of biological information can boost weak effects of the rare variants. RESULTS Through simulation studies, we showed that the proposed method outperforms other methods currently available for pathway-level analysis of clustered phenotypes. Moreover, a real data analysis using a large-scale whole exome sequencing dataset of 995 samples with metabolic syndrome-related phenotypes successfully identified the glyoxylate and dicarboxylate metabolism pathway, which could not be identified by the univariate analyses of single phenotypes or by other existing methods. CONCLUSION In this paper, we introduced a novel pathway-level association test by combining hierarchical structured components analysis and penalized generalized estimating equations. The proposed method analyzes all pathways in a single unified model while considering their correlations. C/C++ implementation of PHARAOH-GEE is publicly available at http://statgen.snu.ac.kr/software/pharaoh-gee/ .
Affiliation(s)
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, Seoul, Korea
- Sunmee Kim
  - Department of Psychology, McGill University, Montreal, Canada
- Yongkang Kim
  - Department of Statistics, Seoul National University, Seoul, Korea
- Bermseok Oh
  - Department of Biochemistry and Molecular Biology, School of Medicine, Kyung Hee University, Seoul, Korea
- Heungsun Hwang
  - Department of Psychology, McGill University, Montreal, Canada
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea
|
30
|
Lee S, Park T. Integration of a Large-Scale Genetic Analysis Workbench Increases the Accessibility of a High-Performance Pathway-Based Analysis Method. Genomics Inform 2018; 16:e39. [PMID: 30602100 PMCID: PMC6440666 DOI: 10.5808/gi.2018.16.4.e39] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2018] [Accepted: 12/14/2018] [Indexed: 11/20/2022] Open
Abstract
The rapid increase in genetic dataset volume has demanded extensive adoption of biological knowledge to reduce the computational complexity, and the biological pathway is one well-known source of such knowledge. In this regard, we have introduced a novel statistical method, PHARAOH, that enables pathway-based association studies of large-scale genetic datasets. However, researcher-level application of the PHARAOH method has been limited by a lack of support for commonly used file formats and the absence of quality control options that are essential to practical analysis. To overcome these limitations, we introduce our integration of the PHARAOH method into our recently developed all-in-one workbench. The new PHARAOH program not only supports various de facto standard genetic data formats but also provides many quality control measures and filters based on those measures. We expect that the updated PHARAOH will make pathway-level analysis of large-scale genetic datasets more accessible to researchers.
Affiliation(s)
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
|
31
|
Choi S, Lee S, Kim Y, Hwang H, Park T. HisCoM-GGI: Hierarchical structural component analysis of gene-gene interactions. J Bioinform Comput Biol 2018; 16:1840026. [PMID: 30567476 DOI: 10.1142/s0219720018400267] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/23/2022]
Abstract
Although genome-wide association studies (GWAS) have successfully identified thousands of single nucleotide polymorphisms (SNPs) associated with common diseases, these observations fall short of fully explaining "missing heritability". Determining gene-gene interactions (GGI) is one possible avenue for addressing the missing heritability problem. While many statistical approaches have been proposed to detect GGI, most of these focus primarily on SNP-to-SNP interactions. Although gene-based GGI analyses have many advantages, such as reducing the burden of multiple-testing correction and increasing power by aggregating multiple causal signals across SNPs in specific genes, only a few methods are available. In this study, we proposed a new statistical approach for gene-based GGI analysis, "Hierarchical structural CoMponent analysis of Gene-Gene Interactions" (HisCoM-GGI). HisCoM-GGI is based on generalized structured component analysis, and can consider hierarchical structural relationships between genes and SNPs. For a pair of genes, HisCoM-GGI first summarizes all possible pairwise SNP-SNP interactions into a latent variable, from which it then performs GGI analysis. HisCoM-GGI can evaluate both gene-level and SNP-level interactions. Through simulation studies, HisCoM-GGI demonstrated higher statistical power than existing gene-based GGI methods. In an analysis of a GWAS of a Korean population to identify GGI associated with body mass index, HisCoM-GGI identified 14 potential GGI, two of which, (NCOR2 × SPOCK1) and (LINGO2 × ZNF385D), were successfully replicated in independent datasets.
We conclude that the HisCoM-GGI method may be a valuable tool for identifying GGI relevant to missing heritability, allowing us to better understand the biological genetic mechanisms of complex traits. An implementation of HisCoM-GGI can be downloaded from the website ( http://statgen.snu.ac.kr/software/hiscom-ggi ).
Affiliation(s)
- Sungkyoung Choi
  - Department of Pharmacology, Yonsei University College of Medicine, 50-1 Yonsei-ro Seodaemun-gu, Seoul 03722, Republic of Korea
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, 71 Daehak-ro Jongno-gu, Seoul 03082, Republic of Korea
- Yongkang Kim
  - Department of Statistics, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul 08826, Republic of Korea
  - Department of Psychology, McGill University, 2001 Avenue McGill College, Montreal, Quebec H3A 1G1, Canada
- Heungsun Hwang
  - Department of Psychology, McGill University, 2001 Avenue McGill College, Montreal, Quebec H3A 1G1, Canada
- Taesung Park
  - Department of Statistics, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul 08826, Republic of Korea
  - Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul 08826, Republic of Korea
|
32
|
Kim S, Choi S, Yoon JH, Kim Y, Lee S, Park T. Drug response prediction model using a hierarchical structural component modeling method. BMC Bioinformatics 2018; 19:288. [PMID: 30367591 PMCID: PMC6101092 DOI: 10.1186/s12859-018-2270-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Component-based structural equation modeling methods are now widely used in science, business, education, and other fields. These methods use unobservable ("latent") variables and structural equations to model relationships between observable variables. Here, we applied this structural equation modeling method to biologically structured data. To identify candidate drug-response biomarkers, we first used proteomic peptide-level data, as measured by multiple reaction monitoring mass spectrometry (MRM-MS), for liver cancer patients. MRM-MS is a highly sensitive and selective method for targeted proteomic quantitation of peptide abundances in complex biological samples. RESULTS We developed a component-based drug response prediction model, which has the advantage that it first collapses peptide-level data into protein-level information, facilitating subsequent biological interpretation. Our model also uses an alternating least squares algorithm to efficiently estimate the coefficients of both peptides and proteins. This approach also considers correlations between variables without being constrained by a multiple testing problem. Using estimated peptide and protein coefficients, we selected significant protein biomarkers by permutation testing, resulting in our model for predicting liver cancer response to the tyrosine kinase inhibitor sorafenib. CONCLUSIONS Using data from a cohort of liver cancer patients, we then "fine-tuned" our model to successfully predict drug responses, as demonstrated by a high area under the curve (AUC) score. Such drug response prediction models may eventually find clinical translation in identifying individual patients likely to respond to specific therapies.
Affiliation(s)
- Sungtae Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826 South Korea
- Sungkyoung Choi
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826 South Korea
- Jung-Hwan Yoon
  - Department of Internal Medicine and Liver Research Institute, Seoul National University College of Medicine, Seoul, 03080 South Korea
- Youngsoo Kim
  - Department of Biomedical Engineering, Seoul National University College of Medicine, Seoul, 03080 South Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Seoul, 05006 South Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826 South Korea
  - Department of Statistics, Seoul National University, Seoul, 08826 South Korea
|
33
|
Lee S, Kim Y, Choi S, Hwang H, Park T. Pathway-based approach using hierarchical components of rare variants to analyze multiple phenotypes. BMC Bioinformatics 2018; 19:79. [PMID: 29745849 PMCID: PMC5998880 DOI: 10.1186/s12859-018-2066-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND As one possible solution to the "missing heritability" problem, many methods have been proposed that apply pathway-based analyses, using rare variants that are detected by next generation sequencing technology. However, while a number of methods for pathway-based rare-variant analysis of multiple phenotypes have been proposed, no method considers a unified model that incorporates multiple pathways. RESULTS Simulation studies successfully demonstrated advantages of multivariate analysis, compared to univariate analysis, and comparison studies showed the proposed approach to outperform existing methods. Moreover, real data analysis of six type 2 diabetes-related traits, using large-scale whole exome sequencing data, identified significant pathways that were not found by univariate analysis. Furthermore, strong relationships between the identified pathways and their associated metabolic disorder risk factors were found via literature search, and one of the identified pathways was successfully replicated by an analysis with an independent dataset. CONCLUSIONS Herein, we present a powerful, pathway-based approach to investigate associations between multiple pathways and multiple phenotypes. By reflecting the natural hierarchy of biological behavior, and considering correlation between pathways and phenotypes, the proposed method is capable of analyzing multiple phenotypes and multiple pathways simultaneously.
Affiliation(s)
- Sungyoung Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Yongkang Kim
  - Department of Statistics, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul, 08826, Korea
- Sungkyoung Choi
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Heungsun Hwang
  - Department of Psychology, McGill University, Montreal, Canada
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
  - Department of Statistics, Seoul National University, 1 Gwanak-ro Gwanak-gu, Seoul, 08826, Korea
|
34
|
Kim Y, Lee S, Choi S, Jang JY, Park T. Hierarchical structural component modeling of microRNA-mRNA integration analysis. BMC Bioinformatics 2018; 19:75. [PMID: 29745843 PMCID: PMC5998903 DOI: 10.1186/s12859-018-2070-0] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 12/25/2022] Open
Abstract
BACKGROUND Identification of multi-markers is one of the most challenging issues in the personalized medicine era. Nowadays, many different types of omics data are generated from the same subject. Although many methods endeavor to identify candidate markers for each type of omics data, few or none can facilitate such identification. RESULTS It is well known that microRNAs affect phenotypes only indirectly, through regulating mRNA expression and/or protein translation. Toward addressing this issue, we suggest a hierarchical structured component analysis of microRNA-mRNA integration ("HisCoM-mimi") model that accounts for this biological relationship, to efficiently study and identify such integrated markers. In simulation studies, HisCoM-mimi showed better performance than three other methods. Also, in real data analysis, HisCoM-mimi successfully identified more informative miRNA-mRNA integration sets for pancreatic ductal adenocarcinoma (PDAC) diagnosis, compared to the other methods. CONCLUSION As exemplified by an application to pancreatic cancer data, our proposed model effectively identified integrated miRNA/target mRNA pairs as markers for early diagnosis, providing a much broader biological interpretation.
Affiliation(s)
- Yongkang Kim
  - Department of Statistics, Seoul National University, Seoul, Korea
- Sungyoung Lee
  - Interdisciplinary program in Bioinformatics, Seoul National University, Seoul, Korea
- Sungkyoung Choi
  - Interdisciplinary program in Bioinformatics, Seoul National University, Seoul, Korea
- Jin-Young Jang
  - Department of Surgery and Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, Korea
  - Interdisciplinary program in Bioinformatics, Seoul National University, Seoul, Korea
|