1. Liu W, Pratte KA, Castaldi PJ, Hersh C, Bowler RP, Banaei-Kashani F, Kechris KJ. A generalized higher-order correlation analysis framework for multi-omics network inference. PLoS Comput Biol 2025; 21:e1011842. [PMID: 40228208 PMCID: PMC11996223 DOI: 10.1371/journal.pcbi.1011842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 01/31/2025] [Indexed: 04/16/2025]
Abstract
Multiple -omics (genomics, proteomics, etc.) profiles are commonly generated to gain insight into a disease or physiological system. Constructing multi-omics networks with respect to the trait(s) of interest provides an opportunity to understand relationships between molecular features but integration is challenging due to multiple data sets with high dimensionality. One approach is to use canonical correlation to integrate one or two omics types and a single trait of interest. However, these types of methods may be limited due to (1) not accounting for higher-order correlations existing among features, (2) computational inefficiency when extending to more than two omics data when using a penalty term-based sparsity method, and (3) lack of flexibility for focusing on specific correlations (e.g., omics-to-phenotype correlation versus omics-to-omics correlations). In this work, we have developed a novel multi-omics network analysis pipeline called Sparse Generalized Tensor Canonical Correlation Analysis Network Inference (SGTCCA-Net) that can effectively overcome these limitations. We also introduce an implementation to improve the summarization of networks for downstream analyses. Simulation and real-data experiments demonstrate the effectiveness of our novel method for inferring omics networks and features of interest.
Affiliation(s)
- Weixuan Liu
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Katherine A. Pratte
  - Department of Biostatistics, National Jewish Health, Denver, Colorado, United States of America
- Peter J. Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Craig Hersh
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
- Russell P. Bowler
  - Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, Colorado, United States of America
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, Colorado, United States of America
- Katerina J. Kechris
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
2. Konigsberg IR, Vu T, Liu W, Litkowski EM, Pratte KA, Vargas LB, Gilmore N, Abdel-Hafiz M, Manichaikul A, Cho MH, Hersh CP, DeMeo DL, Banaei-Kashani F, Bowler RP, Lange LA, Kechris KJ. Proteomic networks and related genetic variants associated with smoking and chronic obstructive pulmonary disease. BMC Genomics 2024; 25:825. [PMID: 39223457 PMCID: PMC11370252 DOI: 10.1186/s12864-024-10619-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 07/15/2024] [Indexed: 09/04/2024]
Abstract
BACKGROUND: Studies have identified individual blood biomarkers associated with chronic obstructive pulmonary disease (COPD) and related phenotypes. However, complex diseases such as COPD typically involve changes in multiple molecules with interconnections that may not be captured when considering single molecular features.
METHODS: Leveraging proteomic data from 3,173 COPDGene Non-Hispanic White (NHW) and African American (AA) participants, we applied sparse multiple canonical correlation network analysis (SmCCNet) to 4,776 proteins assayed on the SomaScan v4.0 platform to derive sparse networks of proteins associated with current vs. former smoking status, airflow obstruction, and emphysema quantitated from high-resolution computed tomography scans. We then used NetSHy, a dimension reduction technique leveraging network topology, to produce summary scores of each proteomic network, referred to as NetSHy scores. We next performed a genome-wide association study (GWAS) to identify variants associated with the NetSHy scores, or network quantitative trait loci (nQTLs). Finally, we evaluated the replicability of the networks in an independent cohort, SPIROMICS.
RESULTS: We identified networks of 13 to 104 proteins for each phenotype and exposure in NHW and AA, and the derived NetSHy scores were significantly associated with the variables of interest. Networks included known (sRAGE, ALPP, MIP1) and novel molecules (CA10, CPB1, HIS3, PXDN) and interactions involved in COPD pathogenesis. We observed 7 nQTL loci associated with NetSHy scores, 4 of which remained after conditional analysis. Networks for smoking status and emphysema, but not airflow obstruction, demonstrated a high degree of replicability across race groups and cohorts.
CONCLUSIONS: In this work, we apply state-of-the-art molecular network generation and summarization approaches to proteomic data from COPDGene participants to uncover protein networks associated with COPD phenotypes. We further identify genetic associations with networks. This work discovers protein networks containing known and novel proteins and protein interactions associated with clinically relevant COPD phenotypes across race groups and cohorts.
Affiliation(s)
- Iain R Konigsberg
  - Department of Biomedical Informatics, School of Medicine, University of Colorado - Anschutz Medical Campus, Aurora, CO, USA
- Thao Vu
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
- Weixuan Liu
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
- Elizabeth M Litkowski
  - Department of Biomedical Informatics, School of Medicine, University of Colorado - Anschutz Medical Campus, Aurora, CO, USA
  - Department of Medicine, University of Michigan, Ann Arbor, MI, USA
- Luciana B Vargas
  - Department of Biomedical Informatics, School of Medicine, University of Colorado - Anschutz Medical Campus, Aurora, CO, USA
- Niles Gilmore
  - Department of Biomedical Informatics, School of Medicine, University of Colorado - Anschutz Medical Campus, Aurora, CO, USA
- Mohamed Abdel-Hafiz
  - Department of Computer Science and Engineering, University of Colorado - Denver, Denver, CO, USA
- Ani Manichaikul
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Michael H Cho
  - Channing Division of Network Medicine, Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Craig P Hersh
  - Channing Division of Network Medicine, Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Dawn L DeMeo
  - Channing Division of Network Medicine, Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, University of Colorado - Denver, Denver, CO, USA
- Leslie A Lange
  - Department of Biomedical Informatics, School of Medicine, University of Colorado - Anschutz Medical Campus, Aurora, CO, USA
- Katerina J Kechris
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, USA
3. Liu W, Vu T, Konigsberg IR, Pratte KA, Zhuang Y, Kechris KJ. SmCCNet 2.0: a comprehensive tool for multi-omics network inference with shiny visualization. BMC Bioinformatics 2024; 25:276. [PMID: 39179997 PMCID: PMC11344457 DOI: 10.1186/s12859-024-05900-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 08/14/2024] [Indexed: 08/26/2024]
Abstract
Sparse multiple canonical correlation network analysis (SmCCNet) is a machine learning technique for integrating omics data along with a variable of interest (e.g., phenotype of complex disease), and reconstructing multi-omics networks that are specific to this variable. We present the second-generation SmCCNet (SmCCNet 2.0) that adeptly integrates single or multiple omics data types along with a quantitative or binary phenotype of interest. In addition, this new package offers a streamlined setup process that can be configured manually or automatically, ensuring a flexible and user-friendly experience.
AVAILABILITY: This package is available on both CRAN: https://cran.r-project.org/web/packages/SmCCNet/index.html and GitHub: https://github.com/KechrisLab/SmCCNet under the MIT license. The network visualization tool is available at https://smccnet.shinyapps.io/smccnetnetwork/.
Affiliation(s)
- Weixuan Liu
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Thao Vu
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Iain R Konigsberg
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Katherine A Pratte
  - Department of Biostatistics, National Jewish Health, Denver, 80206, CO, USA
- Yonghua Zhuang
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Katerina J Kechris
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
4. Liu W, Vu T, Konigsberg I, Pratte K, Zhuang Y, Kechris K. SmCCNet 2.0: A Comprehensive Tool for Multi-omics Network Inference with Shiny Visualization. bioRxiv: The Preprint Server for Biology 2024:2023.11.20.567893. [PMID: 38045372 PMCID: PMC10690212 DOI: 10.1101/2023.11.20.567893] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/05/2023]
Abstract
Summary: Sparse multiple canonical correlation network analysis (SmCCNet) is a machine learning technique for integrating omics data along with a variable of interest (e.g., phenotype of complex disease), and reconstructing multi-omics networks that are specific to this variable. We present the second-generation SmCCNet (SmCCNet 2.0) that adeptly integrates single or multiple omics data types along with a quantitative or binary phenotype of interest. In addition, this new package offers a streamlined setup process that can be configured manually or automatically, ensuring a flexible and user-friendly experience.
Availability: This package is available on both CRAN: https://cran.r-project.org/web/packages/SmCCNet/index.html and GitHub: https://github.com/KechrisLab/SmCCNet under the MIT license. The network visualization tool is available at https://smccnet.shinyapps.io/smccnetnetwork/.
Affiliation(s)
- Weixuan Liu
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Thao Vu
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Iain Konigsberg
  - Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Katherine Pratte
  - Department of Biostatistics, National Jewish Health, Denver, 80206, CO, USA
- Yonghua Zhuang
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
- Katerina Kechris
  - Department of Biostatistics and Informatics, School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, 80045, CO, USA
5. Konigsberg IR, Vu T, Liu W, Litkowski EM, Pratte KA, Vargas LB, Gilmore N, Abdel-Hafiz M, Manichaikul AW, Cho MH, Hersh CP, DeMeo DL, Banaei-Kashani F, Bowler RP, Lange LA, Kechris KJ. Proteomic Networks and Related Genetic Variants Associated with Smoking and Chronic Obstructive Pulmonary Disease. medRxiv: The Preprint Server for Health Sciences 2024:2024.02.26.24303069. [PMID: 38464285 PMCID: PMC10925350 DOI: 10.1101/2024.02.26.24303069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Background: Studies have identified individual blood biomarkers associated with chronic obstructive pulmonary disease (COPD) and related phenotypes. However, complex diseases such as COPD typically involve changes in multiple molecules with interconnections that may not be captured when considering single molecular features.
Methods: Leveraging proteomic data from 3,173 COPDGene Non-Hispanic White (NHW) and African American (AA) participants, we applied sparse multiple canonical correlation network analysis (SmCCNet) to 4,776 proteins assayed on the SomaScan v4.0 platform to derive sparse networks of proteins associated with current vs. former smoking status, airflow obstruction, and emphysema quantitated from high-resolution computed tomography scans. We then used NetSHy, a dimension reduction technique leveraging network topology, to produce summary scores of each proteomic network, referred to as NetSHy scores. We next performed a genome-wide association study (GWAS) to identify variants associated with the NetSHy scores, or network quantitative trait loci (nQTLs). Finally, we evaluated the replicability of the networks in an independent cohort, SPIROMICS.
Results: We identified networks of 13 to 104 proteins for each phenotype and exposure in NHW and AA, and the derived NetSHy scores were significantly associated with the variables of interest. Networks included known (sRAGE, ALPP, MIP1) and novel molecules (CA10, CPB1, HIS3, PXDN) and interactions involved in COPD pathogenesis. We observed 7 nQTL loci associated with NetSHy scores, 4 of which remained after conditional analysis. Networks for smoking status and emphysema, but not airflow obstruction, demonstrated a high degree of replicability across race groups and cohorts.
Conclusions: In this work, we apply state-of-the-art molecular network generation and summarization approaches to proteomic data from COPDGene participants to uncover protein networks associated with COPD phenotypes. We further identify genetic associations with networks. This work discovers protein networks containing known and novel proteins and protein interactions associated with clinically relevant COPD phenotypes across race groups and cohorts.
Affiliation(s)
- Iain R Konigsberg
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Thao Vu
  - Department of Biostatistics and Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Weixuan Liu
  - Department of Biostatistics and Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Elizabeth M Litkowski
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
  - Department of Medicine, University of Michigan, Ann Arbor, MI
- Luciana B Vargas
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Niles Gilmore
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Mohamed Abdel-Hafiz
  - Department of Computer Science and Engineering, University of Colorado - Denver, Denver, CO
- Ani W Manichaikul
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA
- Michael H Cho
  - Channing Division of Network Medicine and Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Craig P Hersh
  - Channing Division of Network Medicine and Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Dawn L DeMeo
  - Channing Division of Network Medicine and Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Leslie A Lange
  - Department of Biomedical Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
- Katerina J Kechris
  - Department of Biostatistics and Informatics, University of Colorado - Anschutz Medical Campus, Aurora, CO
6. Liu W, Pratte KA, Castaldi PJ, Hersh C, Bowler RP, Banaei-Kashani F, Kechris KJ. A Generalized Higher-order Correlation Analysis Framework for Multi-Omics Network Inference. bioRxiv: The Preprint Server for Biology 2024:2024.01.22.576667. [PMID: 38328226 PMCID: PMC10849540 DOI: 10.1101/2024.01.22.576667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
Multiple -omics (genomics, proteomics, etc.) profiles are commonly generated to gain insight into a disease or physiological system. Constructing multi-omics networks with respect to the trait(s) of interest provides an opportunity to understand relationships between molecular features but integration is challenging due to multiple data sets with high dimensionality. One approach is to use canonical correlation to integrate one or two omics types and a single trait of interest. However, these types of methods may be limited due to (1) not accounting for higher-order correlations existing among features, (2) computational inefficiency when extending to more than two omics data when using a penalty term-based sparsity method, and (3) lack of flexibility for focusing on specific correlations (e.g., omics-to-phenotype correlation versus omics-to-omics correlations). In this work, we have developed a novel multi-omics network analysis pipeline called Sparse Generalized Tensor Canonical Correlation Analysis Network Inference (SGTCCA-Net) that can effectively overcome these limitations. We also introduce an implementation to improve the summarization of networks for downstream analyses. Simulation and real-data experiments demonstrate the effectiveness of our novel method for inferring omics networks and features of interest.
Affiliation(s)
- Weixuan Liu
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Peter J. Castaldi
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, United States
- Craig Hersh
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, United States
- Russell P. Bowler
  - Division of Pulmonary Medicine, Department of Medicine, National Jewish Health, Denver, CO, USA
- Farnoush Banaei-Kashani
  - Department of Computer Science and Engineering, College of Engineering, Design and Computing, University of Colorado Denver, Denver, CO, USA
- Katerina J. Kechris
  - Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, USA