1
Gygi JP, Konstorum A, Pawar S, Aron E, Kleinstein SH, Guan L. A supervised Bayesian factor model for the identification of multi-omics signatures. Bioinformatics 2024; 40:btae202. [PMID: 38603606 PMCID: PMC11078774 DOI: 10.1093/bioinformatics/btae202] [Received: 09/18/2023] [Revised: 02/29/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024]
Abstract
MOTIVATION Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful. RESULTS We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of coronavirus disease 2019 severity and breast cancer tumor subtypes. 
AVAILABILITY AND IMPLEMENTATION SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.
Affiliation(s)
- Jeremy P Gygi
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Anna Konstorum
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, United States
- Shrikant Pawar
  - Department of Genetics, Yale Center for Genomic Analysis (YCGA), Yale School of Medicine, New Haven, CT 06520, United States
- Edel Aron
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
- Steven H Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT 06520, United States
  - Department of Pathology, Yale School of Medicine, New Haven, CT 06520, United States
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT 06520, United States
- Leying Guan
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06520, United States
2
Gygi JP, Maguire C, Patel RK, Shinde P, Konstorum A, Shannon CP, Xu L, Hoch A, Jayavelu ND, Haddad EK, Reed EF, Kraft M, McComsey GA, Metcalf JP, Ozonoff A, Esserman D, Cairns CB, Rouphael N, Bosinger SE, Kim-Schulze S, Krammer F, Rosen LB, van Bakel H, Wilson M, Eckalbar WL, Maecker HT, Langelier CR, Steen H, Altman MC, Montgomery RR, Levy O, Melamed E, Pulendran B, Diray-Arce J, Smolen KK, Fragiadakis GK, Becker PM, Sekaly RP, Ehrlich LI, Fourati S, Peters B, Kleinstein SH, Guan L. Integrated longitudinal multiomics study identifies immune programs associated with acute COVID-19 severity and mortality. J Clin Invest 2024; 134:e176640. [PMID: 38690733 PMCID: PMC11060740 DOI: 10.1172/jci176640] [Received: 10/26/2023] [Accepted: 03/12/2024] [Indexed: 05/03/2024]
Abstract
BACKGROUND Patients hospitalized for COVID-19 exhibit diverse clinical outcomes, with outcomes for some individuals diverging over time even though their initial disease severity appears similar to that of other patients. A systematic evaluation of molecular and cellular profiles over the full disease course can link immune programs and their coordination with progression heterogeneity.
METHODS We performed deep immunophenotyping and conducted longitudinal multiomics modeling, integrating 10 assays for 1,152 Immunophenotyping Assessment in a COVID-19 Cohort (IMPACC) study participants and identifying several immune cascades that were significant drivers of differential clinical outcomes.
RESULTS Increasing disease severity was driven by a temporal pattern that began with early upregulation of immunosuppressive metabolites, followed by elevated levels of inflammatory cytokines, signatures of coagulation, formation of neutrophil extracellular traps, and T cell functional dysregulation. A second immune cascade, predictive of 28-day mortality among critically ill patients, was characterized by reduced total plasma Igs and B cells and dysregulated IFN responsiveness. We demonstrated that disruption of the balance between IFN-stimulated genes and IFN inhibitors is a crucial biomarker of COVID-19 mortality, potentially contributing to failure of viral clearance in patients with fatal illness.
CONCLUSION Our longitudinal multiomics profiling study revealed temporal coordination across diverse omics that potentially explains disease progression, providing insights that can inform the targeted development of therapies for patients hospitalized with COVID-19, especially those who are critically ill.
TRIAL REGISTRATION ClinicalTrials.gov NCT04378777.
FUNDING NIH (5R01AI135803-03, 5U19AI118608-04, 5U19AI128910-04, 4U19AI090023-11, 4U19AI118610-06, R01AI145835-01A1S1, 5U19AI062629-17, 5U19AI057229-17, 5U19AI125357-05, 5U19AI128913-03, 3U19AI077439-13, 5U54AI142766-03, 5R01AI104870-07, 3U19AI089992-09, 3U19AI128913-03, and 5T32DA018926-18); NIAID, NIH (3U19AI1289130, U19AI128913-04S1, and R01AI122220); and National Science Foundation (DMS2310836).
Affiliation(s)
- Cole Maguire
  - The University of Texas at Austin, Austin, Texas, USA
- Pramod Shinde
  - La Jolla Institute for Immunology, La Jolla, California, USA
- Casey P. Shannon
  - Centre for Heart Lung Innovation, University of British Columbia, Vancouver, Canada
  - Prevention of Organ Failure (PROOF) Centre of Excellence, Providence Research, Vancouver, British Columbia, Canada
- Leqi Xu
  - Yale School of Public Health, New Haven, Connecticut, USA
- Annmarie Hoch
  - Clinical and Data Coordinating Center (CDCC) and Precision Vaccines Program, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Elias K. Haddad
  - Drexel University, Tower Health Hospital, Philadelphia, Pennsylvania, USA
- IMPACC Network
  - The Immunophenotyping Assessment in a COVID-19 Cohort (IMPACC) Network is detailed in Supplemental Acknowledgments
- Elaine F. Reed
  - David Geffen School of Medicine at UCLA, Los Angeles, California, USA
- Monica Kraft
  - Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Grace A. McComsey
  - Case Western Reserve University and University Hospitals of Cleveland, Cleveland, Ohio, USA
- Jordan P. Metcalf
  - Oklahoma University Health Sciences Center, Oklahoma City, Oklahoma, USA
- Al Ozonoff
  - Clinical and Data Coordinating Center (CDCC) and Precision Vaccines Program, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Department of Pediatrics, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Charles B. Cairns
  - Drexel University, Tower Health Hospital, Philadelphia, Pennsylvania, USA
- Florian Krammer
  - Icahn School of Medicine at Mount Sinai, New York, New York, USA
  - Ignaz Semmelweis Institute, Interuniversity Institute for Infection Research, Medical University of Vienna, Vienna, Austria
- Lindsey B. Rosen
  - National Institute of Allergy and Infectious Diseases (NIAID), NIH, Bethesda, Maryland, USA
- Harm van Bakel
  - Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Hanno Steen
  - Precision Vaccines Program, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
  - Department of Pathology, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Ofer Levy
  - Precision Vaccines Program, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Department of Pediatrics, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Bali Pulendran
  - Stanford University School of Medicine, Palo Alto, California, USA
- Joann Diray-Arce
  - Clinical and Data Coordinating Center (CDCC) and Precision Vaccines Program, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
  - Department of Pediatrics, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Kinga K. Smolen
  - Precision Vaccines Program, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
  - Department of Pediatrics, Boston Children’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Patrice M. Becker
  - National Institute of Allergy and Infectious Diseases (NIAID), NIH, Bethesda, Maryland, USA
- Rafick P. Sekaly
  - Case Western Reserve University and University Hospitals of Cleveland, Cleveland, Ohio, USA
- Slim Fourati
  - Case Western Reserve University and University Hospitals of Cleveland, Cleveland, Ohio, USA
- Bjoern Peters
  - La Jolla Institute for Immunology, La Jolla, California, USA
  - Department of Medicine, UCSD, La Jolla, California, USA
- Leying Guan
  - Yale School of Public Health, New Haven, Connecticut, USA
3
Shinde P, Soldevila F, Reyna J, Aoki M, Rasmussen M, Willemsen L, Kojima M, Ha B, Greenbaum JA, Overton JA, Guzman-Orozco H, Nili S, Orfield S, Gygi JP, da Silva Antunes R, Sette A, Grant B, Olsen LR, Konstorum A, Guan L, Ay F, Kleinstein SH, Peters B. A multi-omics systems vaccinology resource to develop and test computational models of immunity. Cell Rep Methods 2024; 4:100731. [PMID: 38490204 PMCID: PMC10985234 DOI: 10.1016/j.crmeth.2024.100731] [Received: 09/13/2023] [Revised: 01/04/2024] [Accepted: 02/20/2024] [Indexed: 03/17/2024]
Abstract
Systems vaccinology studies have identified factors affecting individual vaccine responses, but comparing these findings is challenging due to varying study designs. To address this lack of reproducibility, we established a community resource for comparing Bordetella pertussis booster responses and hosting annual contests for predicting patients' vaccination outcomes. We report here on our experiences with the "dry-run" prediction contest. We found that, among 20+ models adopted from the literature, the most successful at predicting vaccination outcome was based on age alone. This confirms our concerns about the reproducibility of conclusions between different vaccinology studies. Further, we found that, for newly trained models, handling of baseline information on the target variables was crucial. Overall, multiple co-inertia analysis gave the best results of the tested modeling approaches. Our goal is to engage the community in these prediction challenges by making data and models available and opening a public contest in August 2024.
Affiliation(s)
- Pramod Shinde
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Ferran Soldevila
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Joaquin Reyna
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, San Diego, CA, USA
- Minori Aoki
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Mikkel Rasmussen
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Lisa Willemsen
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Mari Kojima
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Brendan Ha
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Jason A Greenbaum
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- James A Overton
  - Knocean Inc., 107 Quebec Avenue, Toronto, Ontario M6P 2T3, Canada
- Hector Guzman-Orozco
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Somayeh Nili
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Shelby Orfield
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Jeremy P Gygi
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Ricardo da Silva Antunes
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Alessandro Sette
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Medicine, University of California, San Diego, San Diego, CA, USA
- Barry Grant
  - Department of Molecular Biology, School of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
- Lars Rønn Olsen
  - Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Anna Konstorum
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
- Leying Guan
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Ferhat Ay
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Medicine, University of California, San Diego, San Diego, CA, USA
- Steven H Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
- Bjoern Peters
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Medicine, University of California, San Diego, San Diego, CA, USA
4
Gygi JP, Maguire C, Patel RK, Shinde P, Konstorum A, Shannon CP, Xu L, Hoch A, Jayavelu ND, IMPACC Network, Haddad EK, Reed EF, Kraft M, McComsey GA, Metcalf J, Ozonoff A, Esserman D, Cairns CB, Rouphael N, Bosinger SE, Kim-Schulze S, Krammer F, Rosen LB, van Bakel H, Wilson M, Eckalbar W, Maecker H, Langelier CR, Steen H, Altman MC, Montgomery RR, Levy O, Melamed E, Pulendran B, Diray-Arce J, Smolen KK, Fragiadakis GK, Becker PM, Augustine AD, Sekaly RP, Ehrlich LIR, Fourati S, Peters B, Kleinstein SH, Guan L. Integrated longitudinal multi-omics study identifies immune programs associated with COVID-19 severity and mortality in 1152 hospitalized participants. bioRxiv 2023:2023.11.03.565292. [PMID: 37986828 PMCID: PMC10659275 DOI: 10.1101/2023.11.03.565292] [Indexed: 11/22/2023]
Abstract
Hospitalized COVID-19 patients exhibit diverse clinical outcomes, with some individuals diverging over time even though their initial disease severity appears similar. A systematic evaluation of molecular and cellular profiles over the full disease course can link immune programs and their coordination with progression heterogeneity. In this study, we carried out deep immunophenotyping and conducted longitudinal multi-omics modeling integrating ten distinct assays on a total of 1,152 IMPACC participants and identified several immune cascades that were significant drivers of differential clinical outcomes. Increasing disease severity was driven by a temporal pattern that began with early upregulation of immunosuppressive metabolites, followed by elevated levels of inflammatory cytokines, signatures of coagulation, NETosis, and T-cell functional dysregulation. A second immune cascade, predictive of 28-day mortality among critically ill patients, was characterized by reduced total plasma immunoglobulins and B cells, as well as dysregulated IFN responsiveness. We demonstrated that disruption of the balance between IFN-stimulated genes and IFN inhibitors is a crucial biomarker of COVID-19 mortality, potentially contributing to the failure of viral clearance in patients with fatal illness. Our longitudinal multi-omics profiling study revealed novel temporal coordination across diverse omics that potentially explains disease progression, providing insights that inform the targeted development of therapies for hospitalized COVID-19 patients, especially those who are critically ill.
5
Gygi JP, Konstorum A, Pawar S, Aron E, Kleinstein SH, Guan L. A supervised Bayesian factor model for the identification of multi-omics signatures. bioRxiv 2023:2023.01.25.525545. [PMID: 36747790 PMCID: PMC9900835 DOI: 10.1101/2023.01.25.525545] [Indexed: 01/28/2023]
Abstract
MOTIVATION Predictive biological signatures provide utility as biomarkers for disease diagnosis and prognosis, as well as prediction of responses to vaccination or therapy. These signatures are identified from high-throughput profiling assays through a combination of dimensionality reduction and machine learning techniques. The genes, proteins, metabolites, and other biological analytes that compose signatures also generate hypotheses on the underlying mechanisms driving biological responses, thus improving biological understanding. Dimensionality reduction is a critical step in signature discovery to address the large number of analytes in omics datasets, especially for multi-omics profiling studies with tens of thousands of measurements. Latent factor models, which can account for the structural heterogeneity across diverse assays, effectively integrate multi-omics data and reduce dimensionality to a small number of factors that capture correlations and associations among measurements. These factors provide biologically interpretable features for predictive modeling. However, multi-omics integration and predictive modeling are generally performed independently in sequential steps, leading to suboptimal factor construction. Combining these steps can yield better multi-omics signatures that are more predictive while still being biologically meaningful. RESULTS We developed a supervised variational Bayesian factor model that extracts multi-omics signatures from high-throughput profiling datasets that can span multiple data types. Signature-based multiPle-omics intEgration via lAtent factoRs (SPEAR) adaptively determines factor rank, emphasis on factor structure, data relevance and feature sparsity. The method improves the reconstruction of underlying factors in synthetic examples and prediction accuracy of COVID-19 severity and breast cancer tumor subtypes.
AVAILABILITY SPEAR is a publicly available R-package hosted at https://bitbucket.org/kleinstein/SPEAR.
6
Shinde P, Soldevila F, Reyna J, Aoki M, Rasmussen M, Willemsen L, Kojima M, Ha B, Greenbaum JA, Overton JA, Guzman-Orozco H, Nili S, Orfield S, Gygi JP, da Silva Antunes R, Sette A, Grant B, Olsen LR, Konstorum A, Guan L, Ay F, Kleinstein SH, Peters B. A systems vaccinology resource to develop and test computational models of immunity. bioRxiv 2023:2023.08.28.555193. [PMID: 37693565 PMCID: PMC10491180 DOI: 10.1101/2023.08.28.555193] [Indexed: 09/12/2023]
Abstract
Computational models that predict an individual's response to a vaccine offer the potential for mechanistic insights and personalized vaccination strategies. These models are increasingly derived from systems vaccinology studies that generate immune profiles from human cohorts pre- and post-vaccination. Most of these studies involve relatively small cohorts and profile the response to a single vaccine. The ability to assess the resulting models would be improved by comparing their performance on independent datasets, as has been done with great success in other areas of biology such as protein structure prediction. To transfer this approach to systems vaccinology studies, we established a prototype platform that focuses on the evaluation of Computational Models of Immunity to Pertussis Booster vaccinations (CMI-PB). A community resource, CMI-PB generates experimental data for the explicit purpose of model evaluation, which is performed through a series of annual data releases and associated contests. Here we report on our experience with the first such 'dry run' of a contest whose goal was to predict individual immune responses based on pre-vaccination multi-omic profiles. Over 30 models adopted from the literature were tested, but only one, based on age alone, was predictive. The performance of new models built using CMI-PB training data was much better, but varied significantly with the choice of pre-vaccination features and the model-building strategy. This suggests that previously published models developed for other vaccines do not generalize well to pertussis booster vaccination. Overall, these results reinforced the need for the comparative analysis across models and datasets that CMI-PB aims to achieve. We are seeking wider community engagement for our first public prediction contest, which will open in early 2024.
Affiliation(s)
- Pramod Shinde
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Ferran Soldevila
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Joaquin Reyna
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, CA, USA
- Minori Aoki
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Mikkel Rasmussen
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Lisa Willemsen
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Mari Kojima
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Brendan Ha
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Jason A Greenbaum
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- James A Overton
  - Knocean Inc., 107 Quebec Ave., Toronto, Ontario M6P 2T3, Canada
- Hector Guzman-Orozco
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Somayeh Nili
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Shelby Orfield
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Jeremy P. Gygi
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Ricardo da Silva Antunes
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
- Alessandro Sette
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Medicine, University of California, San Diego, San Diego, CA, USA
- Barry Grant
  - Department of Molecular Biology, School of Biological Sciences, University of California San Diego, La Jolla, California, USA
- Lars Rønn Olsen
  - Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Anna Konstorum
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
- Leying Guan
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Ferhat Ay
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Medicine, University of California, San Diego, San Diego, CA, USA
- Steven H. Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale University School of Medicine, New Haven, CT, USA
- Bjoern Peters
  - Center for Infectious Disease and Vaccine Research, La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Medicine, University of California, San Diego, San Diego, CA, USA
7
Gygi JP, Kleinstein SH, Guan L. Predictive overfitting in immunological applications: Pitfalls and solutions. Hum Vaccin Immunother 2023; 19:2251830. [PMID: 37697867 PMCID: PMC10498807 DOI: 10.1080/21645515.2023.2251830] [Received: 05/01/2023] [Revised: 07/27/2023] [Accepted: 08/21/2023] [Indexed: 09/13/2023]
Abstract
Overfitting describes the phenomenon in which a model that is highly predictive on the training data generalizes poorly to future observations. It is a common concern when applying machine learning techniques to contemporary medical applications, such as predicting vaccination response and disease status in infectious disease or cancer studies. This review examines the causes of overfitting and offers strategies to counteract it, focusing on model complexity reduction, reliable model evaluation, and harnessing data diversity. Through discussion of the underlying mathematical models and illustrative examples using both synthetic data and published real datasets, our objective is to equip analysts and bioinformaticians with the knowledge and tools necessary to detect and mitigate overfitting in their research.
Affiliation(s)
- Jeremy P. Gygi
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
- Steven H. Kleinstein
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Pathology, Yale School of Medicine, New Haven, CT, USA
  - Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA
- Leying Guan
  - Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, USA
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
8
Diray-Arce J, Fourati S, Doni Jayavelu N, Patel R, Maguire C, Chang AC, Dandekar R, Qi J, Lee BH, van Zalm P, Schroeder A, Chen E, Konstorum A, Brito A, Gygi JP, Kho A, Chen J, Pawar S, Gonzalez-Reiche AS, Hoch A, Milliren CE, Overton JA, Westendorf K, Cairns CB, Rouphael N, Bosinger SE, Kim-Schulze S, Krammer F, Rosen L, Grubaugh ND, van Bakel H, Wilson M, Rajan J, Steen H, Eckalbar W, Cotsapas C, Langelier CR, Levy O, Altman MC, Maecker H, Montgomery RR, Haddad EK, Sekaly RP, Esserman D, Ozonoff A, Becker PM, Augustine AD, Guan L, Peters B, Kleinstein SH. Multi-omic longitudinal study reveals immune correlates of clinical course among hospitalized COVID-19 patients. Cell Rep Med 2023; 4:101079. [PMID: 37327781 PMCID: PMC10203880 DOI: 10.1016/j.xcrm.2023.101079] [Received: 08/30/2022] [Revised: 01/31/2023] [Accepted: 05/16/2023] [Indexed: 06/18/2023]
Abstract
The IMPACC cohort, composed of >1,000 hospitalized COVID-19 participants, contains five illness trajectory groups (TGs) during acute infection (first 28 days), ranging from milder (TG1-3) to more severe disease course (TG4) and death (TG5). Here, we report deep immunophenotyping, profiling of >15,000 longitudinal blood and nasal samples from 540 participants of the IMPACC cohort, using 14 distinct assays. These unbiased analyses identify cellular and molecular signatures present within 72 h of hospital admission that distinguish moderate from severe and fatal COVID-19 disease. Importantly, cellular and molecular states also distinguish participants with more severe disease who recover or stabilize within 28 days from those who progress to fatal outcomes (TG4 vs. TG5). Furthermore, our longitudinal design reveals that these biologic states display distinct temporal patterns associated with clinical outcomes. Characterizing host immune responses in relation to heterogeneity in disease course may inform clinical prognosis and opportunities for intervention.
Affiliation(s)
- Joann Diray-Arce
- Clinical and Data Coordinating Center, Boston Children's Hospital, Boston, MA 02115, USA; Precision Vaccines Program, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Slim Fourati
- Emory School of Medicine, Atlanta, GA 30322, USA
- Ravi Patel
- University of California San Francisco, San Francisco, CA 94115, USA
- Cole Maguire
- The University of Texas at Austin, Austin, TX 78712, USA
- Ana C Chang
- Clinical and Data Coordinating Center, Boston Children's Hospital, Boston, MA 02115, USA
- Ravi Dandekar
- University of California San Francisco, San Francisco, CA 94115, USA
- Jingjing Qi
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Brian H Lee
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Patrick van Zalm
- Precision Vaccines Program, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Andrew Schroeder
- University of California San Francisco, San Francisco, CA 94115, USA
- Ernie Chen
- Yale School of Medicine, New Haven, CT 06510, USA
- Alvin Kho
- Clinical and Data Coordinating Center, Boston Children's Hospital, Boston, MA 02115, USA
- Jing Chen
- Clinical and Data Coordinating Center, Boston Children's Hospital, Boston, MA 02115, USA; Precision Vaccines Program, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Annmarie Hoch
- Clinical and Data Coordinating Center, Boston Children's Hospital, Boston, MA 02115, USA; Precision Vaccines Program, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Carly E Milliren
- Clinical and Data Coordinating Center, Boston Children's Hospital, Boston, MA 02115, USA
- Charles B Cairns
- Drexel University, Tower Health Hospital, Philadelphia, PA 19104, USA
- Florian Krammer
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Lindsey Rosen
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20814, USA
- Harm van Bakel
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Michael Wilson
- University of California San Francisco, San Francisco, CA 94115, USA
- Jayant Rajan
- University of California San Francisco, San Francisco, CA 94115, USA
- Hanno Steen
- Precision Vaccines Program, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Walter Eckalbar
- University of California San Francisco, San Francisco, CA 94115, USA
- Chris Cotsapas
- Yale School of Medicine, New Haven, CT 06510, USA; Broad Institute of MIT & Harvard, Cambridge, MA 02142, USA
- Ofer Levy
- Precision Vaccines Program, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT & Harvard, Cambridge, MA 02142, USA
- Matthew C Altman
- Benaroya Research Institute, University of Washington, Seattle, WA 98101, USA
- Holden Maecker
- Stanford University School of Medicine, Palo Alto, CA 94305, USA
- Elias K Haddad
- Drexel University, Tower Health Hospital, Philadelphia, PA 19104, USA
- Al Ozonoff
- Clinical and Data Coordinating Center, Boston Children's Hospital, Boston, MA 02115, USA; Precision Vaccines Program, Boston Children's Hospital, Harvard Medical School, Boston, MA 02115, USA; Broad Institute of MIT & Harvard, Cambridge, MA 02142, USA
- Patrice M Becker
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20814, USA
- Alison D Augustine
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20814, USA
- Leying Guan
- Yale School of Public Health, New Haven, CT 06510, USA
- Bjoern Peters
- La Jolla Institute for Immunology, La Jolla, CA 92037, USA
9
Navarrete-Perea J, Liu X, Rad R, Gygi JP, Gygi SP, Paulo JA. Assessing interference in isobaric tag-based sample multiplexing using an 18-plex interference standard. Proteomics 2021; 22:e2100317. [PMID: 34918453 DOI: 10.1002/pmic.202100317] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/01/2021] [Revised: 12/09/2021] [Accepted: 12/09/2021] [Indexed: 11/06/2022]
Abstract
Reporter ion interference remains a limitation of isobaric tag-based sample multiplexing. Advances in instrumentation and data acquisition modes, such as the recently developed real-time database search (RTS), can reduce interference. However, interference persists, as does the need to benchmark upstream sample preparation and data acquisition strategies. Here, we present an updated Triple yeast KnockOut (TKO) standard as well as corresponding upgrades to the TKO Viewing Tool (TVT2.5, http://tko.hms.harvard.edu/). Specifically, we expand the TKO standard to incorporate the TMTpro18-plex reagents (TKO18). We also construct a variant that has been digested only with LysC (TKO18L). We compare proteome coverage and interference levels of TKO18 and TKO18L data acquired under different data acquisition modes and analyzed using TVT2.5. Our data illustrate that RTS reduces interference while improving proteome coverage and suggest that digesting with LysC alone only modestly reduces interference, albeit at the expense of proteome depth. Collectively, the two new TKO standards coupled with the updated TVT represent a convenient and versatile platform for assessing and developing methods to reduce interference in isobaric tag-based experiments.
Affiliation(s)
- Jose Navarrete-Perea
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Xinyue Liu
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Ramin Rad
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Jeremy P Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Steven P Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Joao A Paulo
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, 02115, USA
10
Gygi JP, Rad R, Navarrete-Perea J, Younesi S, Gygi SP, Paulo JA. A Triple Knockout Isobaric-Labeling Quality Control Platform with an Integrated Online Database Search. J Am Soc Mass Spectrom 2020; 31:1344-1349. [PMID: 32202424 PMCID: PMC7332369 DOI: 10.1021/jasms.0c00029] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 05/24/2023]
Abstract
Sample multiplexing using isobaric tagging is a powerful strategy for proteome-wide protein quantification. One major caveat of isobaric tagging is ratio compression that results from the isolation, fragmentation, and quantification of coeluting, near-isobaric peptides, a phenomenon typically referred to as "ion interference". A robust platform to ensure quality control, optimize parameters, and enable comparisons across samples is essential as new instrumentation and analytical methods evolve. Here, we introduce TKO-iQC, an integrated platform consisting of the Triple Knockout (TKO) yeast digest standard and an automated web-based database search and protein profile visualization application. We highlight two new TKO standards based on the TMTpro reagent (TKOpro9 and TKOpro16) as well as an updated TKO Viewing Tool, TVT2.0. TKO-iQC greatly facilitates the comparison of instrument performance with a straightforward and streamlined workflow.
Affiliation(s)
- Jeremy P. Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Ramin Rad
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Jose Navarrete-Perea
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Simon Younesi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Steven P. Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Joao A. Paulo
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States (corresponding author)
11
Gygi JP, Yu Q, Navarrete-Perea J, Rad R, Gygi SP, Paulo JA. Web-Based Search Tool for Visualizing Instrument Performance Using the Triple Knockout (TKO) Proteome Standard. J Proteome Res 2018; 18:687-693. [PMID: 30451507 DOI: 10.1021/acs.jproteome.8b00737] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 01/08/2023]
Abstract
Multiplexing strategies are at the forefront of mass-spectrometry-based proteomics, with SPS-MS3 methods becoming increasingly commonplace. A known caveat of isobaric multiplexing is interference resulting from coisolated and cofragmented ions that do not originate from the selected precursor of interest. The triple knockout (TKO) standard was designed to benchmark data collection strategies to minimize interference. However, a limitation on its widespread use has been the lack of an automated analysis platform. We present a TKO Visualization Tool (TVT). The TVT viewer allows automated, web-based database searching of the TKO standard, returning traditional figures of merit, such as peptide and protein counts and scan-specific ion accumulation times, as well as the TKO-specific metric, the IFI (interference-free index). Moreover, the TVT viewer allows plotting of two TKO standards to assess protocol optimizations, compare instruments, or measure degradation of instrument performance over time. We showcase the TVT viewer by probing the selection of (1) stationary phase resin, (2) MS2 isolation window width, and (3) number of synchronous precursor selection (SPS) ions for SPS-MS3 analysis. Using the TVT viewer will allow the proteomics community to search and compare TKO results to optimize user-specific data collection workflows.
Affiliation(s)
- Jeremy P Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Qing Yu
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Jose Navarrete-Perea
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Ramin Rad
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Steven P Gygi
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Joao A Paulo
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts 02115, United States