1
de Toledo MA, de Lima JVS, Salomão R, Leite GGF. Characterizing a low-density neutrophil gene signature in acute and chronic infections and its impact on disease severity. J Leukoc Biol 2025; 117:qiaf027. [PMID: 40037342 DOI: 10.1093/jleuko/qiaf027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2024] [Revised: 01/15/2025] [Accepted: 02/27/2025] [Indexed: 03/06/2025] Open
Abstract
Low-density neutrophils (LDNs) or polymorphonuclear myeloid-derived suppressor cells are involved in the pathogenesis of cancer, autoimmune, and infectious diseases. They are crucial in the host response to invading pathogens, especially during acute illness, and are associated with poor prognosis in many infectious diseases. However, their gene expression profile and contribution to disease outcomes are not well described. We conducted a meta-analysis of gene expression datasets from peripheral blood mononuclear cells (PBMCs), focusing on patients with viral and bacterial infections. We identified a consensus set of 2,798 differentially expressed genes. Among these, 49 genes were commonly found in both the neutrophil degranulation pathway and the granule lumen-specific community. To validate this signature, we evaluated its expression in RNA-seq datasets, finding consistent upregulation of 24 genes in severe infections, 17 of which overlapped with genes overexpressed in CD16int cells. We also investigated the abundance of LDN-related proteins in a PBMC proteomics dataset from a cohort of sepsis and septic shock patients. Out of the 17 genes analyzed, 13 corresponding proteins were identified, 10 of which demonstrated significantly higher abundance in sepsis and septic shock patients compared with healthy controls. In conclusion, our study identified a pattern of 17 upregulated LDN genes, common to PBMC transcriptome and RNA-seq, and upregulated in CD16int, associated with acute infections and severe clinical outcomes, marking the first time these genes have been collectively presented as a potential signature of LDNs in relation to disease severity. Further research with prospective cohorts is needed to validate this LDN signature and explore its clinical implications.
Affiliation(s)
- Matheus Aparecido de Toledo
- Division of Infectious Diseases, Department of Medicine, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo 04023-900, Brazil
- João Victor Souza de Lima
- Division of Infectious Diseases, Department of Medicine, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo 04023-900, Brazil
- Reinaldo Salomão
- Division of Infectious Diseases, Department of Medicine, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo 04023-900, Brazil
- Giuseppe G F Leite
- Division of Infectious Diseases, Department of Medicine, Escola Paulista de Medicina, Universidade Federal de São Paulo, São Paulo 04023-900, Brazil
2
Saddic L, Kaneda G, Momenzadeh A, Zilberberg L, Song Y, Mastali M, Kreimer S, Hutton A, Haghani A, Meyer J, Parker S. Single Cell Proteomics Reveals Novel Cell Phenotypes in Marfan Mouse Aneurysm. bioRxiv 2025:2025.02.15.638465. [PMID: 40027651 PMCID: PMC11870452 DOI: 10.1101/2025.02.15.638465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/05/2025]
Abstract
Background: Single-cell omics technology is a powerful tool in biomedical research. However, single cell proteomics has lagged due to an inability to amplify peptides in a similar fashion to nucleotide strings. Single cell proteomics is important because proteins are the main functional unit in cells, and they often poorly correlate with mRNA quantities. In this paper we describe the first single cell proteomic analysis of complex tissue, comparing aneurysmal and normal mouse aorta from males and females. We also compare and integrate our single cell proteomic profiles with a matching single cell transcriptomics dataset.
Methods: We compared single cell proteomes between male and female, wild-type and Fbn1 C1041G/+ Marfan mice (N=3 per group). Individual cells from mouse aortic root single cell suspensions were deposited in 384 well plates and subjected to ultra-sensitive nanoflow liquid chromatography-ion mobility-time of flight-mass spectrometry. The data were analyzed with Leiden clustering to identify cell types. Statistical analyses were performed to detect differential proteins within cell types, and multi-omics analysis integrated single cell proteomics with published single cell RNA-seq.
Results: We identified all major aortic cell types including 7 distinct smooth muscle cell subtypes. The proportion of these cells varied based on sex and the Fbn1 C1041G/+ genotype. Differentially expressed proteins between male and female samples, and between wild-type and Marfan samples, uncovered enhanced endothelial to mesenchymal transition patterns in endothelial cells from male Marfan mice. Comparisons between single cell RNA and single cell proteomic profiles showed similarities in major subtypes but not smooth muscle cell subtypes. Multi-omics analysis of these two single cell platforms demonstrated a potential novel role for smooth muscle cell derived angiotensin signaling in the Marfan phenotype.
Conclusions: Single cell proteomics identified new subpopulations of vascular smooth muscle cells and novel cell type specific protein signatures related to sex differences and aneurysm formation.
Abbreviations: Next generation sequencing (NGS), Mass spectrometer (MS), Single cell proteomics by Mass Spectrometry (ScOPE-MS), Marfan's syndrome (MFS), Fibrillin 1 (FBN1), Transforming growth factor β (TGFβ), Smooth muscle cell (SMC), Single cell proteomic (scProteomic), Differentially expressed proteins (DEPs), Wild-type (WT), Hanks' balanced salt solution (HBSS), Fetal bovine serum (FBS), Dulbecco's Modified Eagle Medium (DMEM), Data-independent acquisition parallel accumulation-serial fragmentation (DIA-PASEF), Magnetic assisted cell sorted (MACS), Single Cell Analysis in Python (Scanpy), Kyoto Encyclopedia of Genes and Genomes (KEGG), Principal component analysis (PCA), Uniform manifold approximation and projection (UMAP), Single cell transcriptomic (scTranscriptomic), Smoothelin (Smtn), Transgelin (Tagln), Myosin heavy chain 11 (Myh11), Platelet endothelial cell adhesion molecule 1 (Pecam1), Dipeptidase 1 (Dpep1), Uncoupling protein 1 (Ucp1), Low-density lipoprotein receptor-related protein (Lrp1), DNA ligase 3 (Lig3), Capsaicin channel transient receptor potential vanilloid 1 (Trpv1), Endothelial to mesenchymal transition (endMT), Intercellular adhesion molecule 1 (Icam1), Intercellular adhesion molecule 2 (Icam2), Endothelial cell-selective adhesion molecule (Esam), Calponin 1 (Cnn1), Vimentin (Vim), Zinc finger E-box-binding homeobox 1 (Zeb1), Snail family transcriptional repressor 1 (Snai1), Tropomyosin alpha-4 chain (Tpm4), Angiotensin converting enzyme (Ace).
3
Sweatt AJ, Griffiths CD, Groves SM, Paudel BB, Wang L, Kashatus DF, Janes KA. Proteome-wide copy-number estimation from transcriptomics. Mol Syst Biol 2024; 20:1230-1256. [PMID: 39333715 PMCID: PMC11535397 DOI: 10.1038/s44320-024-00064-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 08/22/2024] [Accepted: 09/02/2024] [Indexed: 09/29/2024] Open
Abstract
Protein copy numbers constrain systems-level properties of regulatory networks, but proportional proteomic data remain scarce compared to RNA-seq. We related mRNA to protein statistically using best-available data from quantitative proteomics and transcriptomics for 4366 genes in 369 cell lines. The approach starts with a protein's median copy number and hierarchically appends mRNA-protein and mRNA-mRNA dependencies to define an optimal gene-specific model linking mRNAs to protein. For dozens of cell lines and primary samples, these protein inferences from mRNA outmatch stringent null models, a count-based protein-abundance repository, empirical mRNA-to-protein ratios, and a proteogenomic DREAM challenge winner. The optimal mRNA-to-protein relationships capture biological processes along with hundreds of known protein-protein complexes, suggesting mechanistic relationships. We use the method to identify a viral-receptor abundance threshold for coxsackievirus B3 susceptibility from 1489 systems-biology infection models parameterized by protein inference. When applied to 796 RNA-seq profiles of breast cancer, inferred copy-number estimates collectively re-classify 26-29% of luminal tumors. By adopting a gene-centered perspective of mRNA-protein covariation across different biological contexts, we achieve accuracies comparable to the technical reproducibility of contemporary proteomics.
Affiliation(s)
- Andrew J Sweatt
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- Cameron D Griffiths
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- Sarah M Groves
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- B Bishal Paudel
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- Lixin Wang
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- David F Kashatus
- Department of Microbiology, Immunology & Cancer Biology, University of Virginia, Charlottesville, VA, 22908, USA
- Kevin A Janes
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, 22908, USA
- Department of Biochemistry & Molecular Genetics, University of Virginia, Charlottesville, VA, 22908, USA
4
Sun D, Macedonia C, Chen Z, Chandrasekaran S, Najarian K, Zhou S, Cernak T, Ellingrod VL, Jagadish HV, Marini B, Pai M, Violi A, Rech JC, Wang S, Li Y, Athey B, Omenn GS. Can Machine Learning Overcome the 95% Failure Rate and Reality that Only 30% of Approved Cancer Drugs Meaningfully Extend Patient Survival? J Med Chem 2024; 67:16035-16055. [PMID: 39253942 DOI: 10.1021/acs.jmedchem.4c01684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Despite implementing hundreds of strategies, cancer drug development suffers from a 95% failure rate over 30 years, with only 30% of approved cancer drugs extending patient survival beyond 2.5 months. Adding more criteria without eliminating nonessential ones is impractical and may fall into the "survivorship bias" trap. Machine learning (ML) models may enhance efficiency by saving time and cost. Yet, they may not improve success rate without identifying the root causes of failure. We propose a "STAR-guided ML system" (structure-tissue/cell selectivity-activity relationship) to enhance success rate and efficiency by addressing three overlooked interdependent factors: potency/specificity to the on/off-targets determining efficacy in tumors at clinical doses, on/off-target-driven tissue/cell selectivity influencing adverse effects in the normal organs at clinical doses, and optimal clinical doses balancing efficacy/safety as determined by potency/specificity and tissue/cell selectivity. STAR-guided ML models can directly predict clinical dose/efficacy/safety from five features to design/select the best drugs, enhancing success and efficiency of cancer drug development.
Affiliation(s)
- Zhigang Chen
- LabBotics.ai, Palo Alto, California 94303, United States
- Simon Zhou
- Aurinia Pharmaceuticals Inc., Rockville, Maryland 20850, United States
- Yan Li
- Translational Medicine and Clinical Pharmacology, Bristol Myers Squibb, Summit, New Jersey 07901, United States
5
Han Y, Wennersten SA, Pandi BP, Ng DCM, Lau E, Lam MPY. A Ratiometric Catalog of Protein Isoform Shifts in the Cardiac Fetal Gene Program. bioRxiv 2024:2024.04.09.588716. [PMID: 38645170 PMCID: PMC11030362 DOI: 10.1101/2024.04.09.588716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024]
Abstract
The fetal genetic program orchestrates cardiac development and the re-expression of fetal genes is thought to underlie cardiac disease and adaptation. Here, a proteomics ratio test using mass spectrometry is applied to find protein isoforms with statistically significant usage differences in the fetal vs. postnatal mouse heart. Changes in isoform usage ratios are pervasive at the protein level, with 104 significant events observed, including 88 paralog-derived isoform switching events and 16 splicing-derived isoform switching events between fetal and postnatal hearts. The ratiometric proteomic comparisons rediscovered hallmark fetal gene signatures including a postnatal switch from fetal β (MYH7) toward α (MYH6) myosin heavy chains and from slow skeletal muscle (TNNI1) toward cardiac (TNNI3) troponin I. Altered usages in metabolic proteins are prominent, including a platelet to muscle phosphofructokinase (PFKP - PFKM), enolase 1 to 3 (ENO1 - ENO3), and alternative splicing of pyruvate kinase M2 toward M1 (PKM2 - PKM1) isoforms in glycolysis. The data also revealed a parallel change in mitochondrial proteins in cardiac development, suggesting the shift toward aerobic respiration also involves a remodeling of the mitochondrial protein isoform proportion. Finally, a number of glycolytic protein isoforms revert toward their fetal forms in adult hearts under pathological cardiac hypertrophy, suggesting their functional roles in adaptive or maladaptive response, but this reversal is partial. In summary, this work presents a catalog of ratiometric protein markers of the fetal genetic program of the mouse heart, including previously unreported splice isoform markers.
Affiliation(s)
- Yu Han
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Sara A Wennersten
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Boomathi P Pandi
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Dominic C M Ng
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Edward Lau
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Maggie P Y Lam
- Department of Medicine, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Consortium for Fibrosis Research and Translation, University of Colorado School of Medicine, Aurora, CO 80045, USA
6
Joshi SK, Piehowski P, Liu T, Gosline SJC, McDermott JE, Druker BJ, Traer E, Tyner JW, Agarwal A, Tognon CE, Rodland KD. Mass Spectrometry-Based Proteogenomics: New Therapeutic Opportunities for Precision Medicine. Annu Rev Pharmacol Toxicol 2024; 64:455-479. [PMID: 37738504 PMCID: PMC10950354 DOI: 10.1146/annurev-pharmtox-022723-113921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/24/2023]
Abstract
Proteogenomics refers to the integration of comprehensive genomic, transcriptomic, and proteomic measurements from the same samples with the goal of fully understanding the regulatory processes converting genotypes to phenotypes, often with an emphasis on gaining a deeper understanding of disease processes. Although specific genetic mutations have long been known to drive the development of multiple cancers, gene mutations alone do not always predict prognosis or response to targeted therapy. The benefit of proteogenomics research is that information obtained from proteins and their corresponding pathways provides insight into therapeutic targets that can complement genomic information by providing an additional dimension regarding the underlying mechanisms and pathophysiology of tumors. This review describes the novel insights into tumor biology and drug resistance derived from proteogenomic analysis while highlighting the clinical potential of proteogenomic observations and advances in technique and analysis tools.
Affiliation(s)
- Sunil K Joshi
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
- Paul Piehowski
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Tao Liu
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Sara J C Gosline
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Jason E McDermott
- Pacific Northwest National Laboratory, Richland, Washington, USA
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, Oregon, USA
- Brian J Druker
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Elie Traer
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Jeffrey W Tyner
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, Oregon, USA
- Anupriya Agarwal
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Department of Molecular Microbiology and Immunology, Oregon Health & Science University, Portland, Oregon, USA
- Cristina E Tognon
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Division of Hematology and Medical Oncology, Department of Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Karin D Rodland
- Knight Cancer Institute, Oregon Health & Science University, Portland, Oregon, USA
- Pacific Northwest National Laboratory, Richland, Washington, USA
7
Chowdhury S, Kennedy JJ, Ivey RG, Murillo OD, Hosseini N, Song X, Petralia F, Calinawan A, Savage SR, Berry AB, Reva B, Ozbek U, Krek A, Ma W, da Veiga Leprevost F, Ji J, Yoo S, Lin C, Voytovich UJ, Huang Y, Lee SH, Bergan L, Lorentzen TD, Mesri M, Rodriguez H, Hoofnagle AN, Herbert ZT, Nesvizhskii AI, Zhang B, Whiteaker JR, Fenyo D, McKerrow W, Wang J, Schürer SC, Stathias V, Chen XS, Barcellos-Hoff MH, Starr TK, Winterhoff BJ, Nelson AC, Mok SC, Kaufmann SH, Drescher C, Cieslik M, Wang P, Birrer MJ, Paulovich AG. Proteogenomic analysis of chemo-refractory high-grade serous ovarian cancer. Cell 2023; 186:3476-3498.e35. [PMID: 37541199 PMCID: PMC10414761 DOI: 10.1016/j.cell.2023.07.004] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 09/30/2022] [Revised: 03/23/2023] [Accepted: 07/05/2023] [Indexed: 08/06/2023]
Abstract
To improve the understanding of chemo-refractory high-grade serous ovarian cancers (HGSOCs), we characterized the proteogenomic landscape of 242 (refractory and sensitive) HGSOCs, representing one discovery and two validation cohorts across two biospecimen types (formalin-fixed paraffin-embedded and frozen). We identified a 64-protein signature that predicts with high specificity a subset of HGSOCs refractory to initial platinum-based therapy and is validated in two independent patient cohorts. We detected significant association between lack of Ch17 loss of heterozygosity (LOH) and chemo-refractoriness. Based on pathway protein expression, we identified 5 clusters of HGSOC, which validated across two independent patient cohorts and patient-derived xenograft (PDX) models. These clusters may represent different mechanisms of refractoriness and implicate putative therapeutic vulnerabilities.
Affiliation(s)
- Shrabanti Chowdhury
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jacob J Kennedy
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Richard G Ivey
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Oscar D Murillo
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Noshad Hosseini
- Department of Computational Medicine and Bioinformatics, Michigan Center for Translational Pathology, University of Michigan School of Medicine, Ann Arbor, MI 48109, USA
- Xiaoyu Song
- Tisch Cancer Institute, Department of Population Health Science and Policy, Institute for Health Care Delivery Science, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Francesca Petralia
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Anna Calinawan
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sara R Savage
- Lester and Sue Smith Breast Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Boris Reva
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Umut Ozbek
- Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Azra Krek
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Weiping Ma
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jiayi Ji
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Chenwei Lin
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Uliana J Voytovich
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Yajue Huang
- Department of Laboratory Medicine & Pathology, Mayo Clinic, Rochester, MN 55905, USA
- Sun-Hee Lee
- Departments of Oncology and Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Lindsay Bergan
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Travis D Lorentzen
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Mehdi Mesri
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, MD 20850, USA
- Henry Rodriguez
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, MD 20850, USA
- Andrew N Hoofnagle
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA 98195, USA
- Zachary T Herbert
- Molecular Biology Core Facilities, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Alexey I Nesvizhskii
- Department of Pathology, Department of Computational Medicine and Bioinformatics, University of Michigan School of Medicine, Ann Arbor, MI 48109, USA
- Bing Zhang
- Lester and Sue Smith Breast Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Jeffrey R Whiteaker
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- David Fenyo
- Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Wilson McKerrow
- Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Joshua Wang
- Institute for Systems Genetics, NYU School of Medicine, New York, NY 10016, USA
- Stephan C Schürer
- Department of Molecular and Cellular Pharmacology, Sylvester Comprehensive Cancer Center, Miller School of Medicine, and Institute for Data Science & Computing, University of Miami, Miami, FL 33136, USA
- Vasileios Stathias
- Department of Molecular and Cellular Pharmacology, Sylvester Comprehensive Cancer Center, Miller School of Medicine, and Institute for Data Science & Computing, University of Miami, Miami, FL 33136, USA
- X Steven Chen
- Department of Public Health Sciences, Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- Mary Helen Barcellos-Hoff
- Helen Diller Family Comprehensive Cancer Center, Department of Radiation Oncology, University of California, San Francisco, San Francisco, CA 94115, USA
- Timothy K Starr
- Department of Obstetrics, Gynecology and Women's Health, University of Minnesota, Minneapolis, MN 55455, USA
- Boris J Winterhoff
- Department of Obstetrics, Gynecology and Women's Health, University of Minnesota, Minneapolis, MN 55455, USA
- Andrew C Nelson
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN 55455, USA
- Samuel C Mok
- Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Scott H Kaufmann
- Departments of Oncology and Molecular Pharmacology & Experimental Therapeutics, Mayo Clinic, Rochester, MN 55905, USA
- Charles Drescher
- Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
- Marcin Cieslik
- Department of Pathology, Department of Computational Medicine and Bioinformatics, Michigan Center for Translational Pathology, University of Michigan School of Medicine, Ann Arbor, MI 48109, USA
- Pei Wang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Michael J Birrer
- Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA
- Amanda G Paulovich
- Translational Science and Therapeutics Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
8
Oberdoerffer P, Miller KM. Histone H2A variants: Diversifying chromatin to ensure genome integrity. Semin Cell Dev Biol 2023; 135:59-72. [PMID: 35331626 PMCID: PMC9489817 DOI: 10.1016/j.semcdb.2022.03.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/24/2021] [Revised: 03/07/2022] [Accepted: 03/08/2022] [Indexed: 12/12/2022]
Abstract
Histone variants represent chromatin components that diversify the structure and function of the genome. The variants of H2A, primarily H2A.X, H2A.Z and macroH2A, are well-established participants in DNA damage response (DDR) pathways, which function to protect the integrity of the genome. Through their deposition, post-translational modifications and unique protein interaction networks, these variants guard DNA from endogenous threats including replication stress and genome fragility as well as from DNA lesions inflicted by exogenous sources. A growing body of work is now providing a clearer picture on the involvement and mechanistic basis of H2A variant contribution to genome integrity. Beyond their well-documented role in gene regulation, we review here how histone H2A variants promote genome stability and how alterations in these pathways contribute to human diseases including cancer.
Affiliation(s)
- Philipp Oberdoerffer
- Department of Radiation Oncology and Molecular Radiation Sciences, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Kyle M Miller
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Livestrong Cancer Institutes, Dell Medical School, The University of Texas at Austin, Austin, TX 78712, USA
9
Srivastava H, Lippincott MJ, Currie J, Canfield R, Lam MPY, Lau E. Protein prediction models support widespread post-transcriptional regulation of protein abundance by interacting partners. PLoS Comput Biol 2022; 18:e1010702. [PMID: 36356032 PMCID: PMC9681107 DOI: 10.1371/journal.pcbi.1010702] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/11/2022] [Revised: 11/22/2022] [Accepted: 11/01/2022] [Indexed: 11/12/2022] Open
Abstract
Protein and mRNA levels correlate only moderately. The availability of proteogenomics data sets with protein and transcript measurements from matching samples is providing new opportunities to assess the degree to which protein levels in a system can be predicted from mRNA information. Here we examined the contributions of input features in protein abundance prediction models. Using large proteogenomics data from 8 cancer types within the Clinical Proteomic Tumor Analysis Consortium (CPTAC) data set, we trained models to predict the abundance of over 13,000 proteins using matching transcriptome data from up to 958 tumor or normal adjacent tissue samples each, and compared predictive performances across algorithms, data set sizes, and input features. Over one-third of proteins (4,648) showed relatively poor predictability (elastic net r ≤ 0.3) from their cognate transcripts. Moreover, we found widespread occurrences where the abundance of a protein is considerably less well explained by its own cognate transcript level than that of one or more trans-locus transcripts. The incorporation of additional trans-locus transcript abundance data as input features increasingly improved the ability to predict sample protein abundance. Transcripts that contribute to non-cognate protein abundance primarily involve those encoding known or predicted interaction partners of the protein of interest, including not only large multi-protein complexes as previously shown, but also small stable complexes in the proteome with only one or few stable interacting partners. Network analysis further shows a complex proteome-wide interdependency of protein abundance on the transcript levels of multiple interacting partners. The predictive model analysis here therefore supports that protein-protein interactions, including those in small protein complexes, exert post-transcriptional influence on proteome compositions more broadly than previously recognized. Moreover, the results suggest mRNA and protein co-expression analysis may have utility for finding gene interactions and predicting expression changes in biological systems.
Affiliation(s)
- Himangi Srivastava
  - Department of Medicine/Cardiology, University of Colorado School of Medicine, Aurora, Colorado, United States of America
  - Consortium for Fibrosis Research and Translation, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Michael J. Lippincott
  - Department of Medicine/Cardiology, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Jordan Currie
  - Department of Medicine/Cardiology, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Robert Canfield
  - Department of Medicine/Cardiology, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Maggie P. Y. Lam
  - Department of Medicine/Cardiology, University of Colorado School of Medicine, Aurora, Colorado, United States of America
  - Consortium for Fibrosis Research and Translation, University of Colorado School of Medicine, Aurora, Colorado, United States of America
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Edward Lau
  - Department of Medicine/Cardiology, University of Colorado School of Medicine, Aurora, Colorado, United States of America
  - Consortium for Fibrosis Research and Translation, University of Colorado School of Medicine, Aurora, Colorado, United States of America
|
10
|
Upadhya SR, Ryan CJ. Experimental reproducibility limits the correlation between mRNA and protein abundances in tumor proteomic profiles. Cell Rep Methods 2022; 2:100288. [PMID: 36160043 PMCID: PMC9499981 DOI: 10.1016/j.crmeth.2022.100288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2022] [Revised: 07/14/2022] [Accepted: 08/16/2022] [Indexed: 11/21/2022]
Abstract
Large-scale studies of human proteomes have revealed only a moderate correlation between mRNA and protein abundances. It is unclear to what extent this moderate correlation reflects post-transcriptional regulation and to what extent it reflects measurement error. Here, by analyzing replicate profiles of tumors and cell lines, we show that there is considerable variation in the reproducibility of measurements of transcripts and proteins from individual genes. Proteins with more reproducible measurements tend to have a higher mRNA-protein correlation, suggesting that measurement reproducibility accounts for a substantial fraction of the unexplained variation between mRNA and protein abundances. The reproducibility of individual proteins is somewhat consistent across studies, and we exploit this to develop an aggregate reproducibility score that explains a substantial amount of the variation in mRNA-protein correlations across multiple studies. Finally, we show that pathways previously reported to have a higher-than-average mRNA-protein correlation may simply contain members that can be more reproducibly quantified.
Affiliation(s)
- Swathi Ramachandra Upadhya
  - School of Computer Science, University College Dublin, Dublin, Ireland
  - Systems Biology Ireland, University College Dublin, Dublin, Ireland
- Colm J. Ryan
  - School of Computer Science, University College Dublin, Dublin, Ireland
  - Systems Biology Ireland, University College Dublin, Dublin, Ireland
|
11
|
Suomi T, Elo LL. Statistical and machine learning methods to study human CD4+ T cell proteome profiles. Immunol Lett 2022; 245:8-17. [DOI: 10.1016/j.imlet.2022.03.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 03/11/2022] [Accepted: 03/15/2022] [Indexed: 11/05/2022]
|
12
|
Jassinskaja M, Hansson J. The Opportunity of Proteomics to Advance the Understanding of Intra- and Extracellular Regulation of Malignant Hematopoiesis. Front Cell Dev Biol 2022; 10:824098. [PMID: 35350382 PMCID: PMC8957922 DOI: 10.3389/fcell.2022.824098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2021] [Accepted: 02/22/2022] [Indexed: 11/13/2022] Open
Abstract
Fetal and adult hematopoiesis are regulated by largely distinct sets of cell-intrinsic gene regulatory networks as well as extracellular cues in their respective microenvironment. These ontogeny-specific programs drive hematopoietic stem and progenitor cells (HSPCs) in fetus and adult to divergent susceptibility to initiation and progression of hematological malignancies, such as leukemia. Elucidating how leukemogenic hits disturb the intra- and extracellular programs in HSPCs along ontogeny will provide a better understanding of the causes for age-associated differences in malignant hematopoiesis and facilitate the improvement of strategies for prevention and treatment of pediatric and adult acute leukemia. Here, we review current knowledge of the intrinsic and extrinsic programs regulating normal and malignant hematopoiesis, with a particular focus on the differences between infant and adult acute leukemia. We discuss the recent advances in mass spectrometry-based proteomics and its opportunity for resolving the interplay of cell-intrinsic and niche-associated factors in regulating malignant hematopoiesis.
Affiliation(s)
- Maria Jassinskaja
  - Lund Stem Cell Center, Division of Molecular Hematology, Lund University, Lund, Sweden
  - York Biomedical Research Institute, Department of Biology, University of York, York, United Kingdom
- Jenny Hansson
  - Lund Stem Cell Center, Division of Molecular Hematology, Lund University, Lund, Sweden
|
13
|
Xu W, He H, Guo Z, Li W. Evaluation of machine learning models on protein level inference from prioritized RNA features. Brief Bioinform 2022; 23:6555405. [PMID: 35352096 DOI: 10.1093/bib/bbac091] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/01/2021] [Revised: 02/16/2022] [Accepted: 02/23/2022] [Indexed: 11/12/2022] Open
Abstract
The parallel measurement of transcriptome and proteome revealed unmatched profiles. Since proteomic analysis is more expensive and challenging than transcriptomic analysis, the question of how to use messenger RNA (mRNA) expression data to predict protein level is extremely important. Here, we comprehensively evaluated 13 machine learning models on inferring protein expression levels using RNA expression profile. A total of 20 proteogenomic datasets from three mainstream proteomic platforms with >2500 samples of 13 human tissues were collected for model evaluation. Our results highlighted that the appropriate feature selection methods combined with classical machine learning models could achieve excellent predictive performance. The voting ensemble model outperformed other candidate models across datasets. Adding the mRNA proxy model to the regression model further improved the prediction performance. The dataset and gene characteristics could affect the prediction performance. Finally, we applied the model to the brain transcriptome of cerebral cortex regions to infer the protein profile for better understanding the functional characteristics of the brain regions. This benchmarking work not only provides useful hints on the inherent correlation between transcriptome and proteome, but also has practical value of the transcriptome-based prediction of protein expression levels.
Affiliation(s)
- Wenjian Xu
  - Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
- Haochen He
  - Department of Radiation Protection and Health Physics, Beijing Institute of Radiation Medicine, Beijing 100850, China
- Zhengguang Guo
  - Core Facility of Instruments, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, 5 Dong Dan San Tiao, Beijing 100005, China
- Wei Li
  - Beijing Key Laboratory for Genetics of Birth Defects, Beijing Pediatric Research Institute; MOE Key Laboratory of Major Diseases in Children; Rare Disease Center, Beijing Children's Hospital, Capital Medical University, National Center for Children's Health, Beijing 100045, China
|
14
|
Han Y, Li LZ, Kastury NL, Thomas CT, Lam MPY, Lau E. Transcriptome features of striated muscle aging and predictability of protein level changes. Mol Omics 2021; 17:796-808. [PMID: 34328155 PMCID: PMC8511094 DOI: 10.1039/d1mo00178g] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022]
Abstract
We performed total RNA sequencing and multi-omics analysis comparing skeletal muscle and cardiac muscle in young adult (4 months) vs. early aging (20 months) mice to examine the molecular mechanisms of striated muscle aging. We observed that aging cardiac and skeletal muscles both invoke transcriptomic changes in innate immune system and mitochondria pathways but diverge in extracellular matrix processes. On an individual gene level, we identified 611 age-associated signatures in skeletal and cardiac muscles, including a number of myokine and cardiokine encoding genes. Because RNA and protein levels correlate only partially, we reason that differentially expressed transcripts that accurately reflect their protein counterparts will be more valuable proxies for proteomic changes and by extension physiological states. We applied a computational data analysis workflow to estimate which transcriptomic changes are more likely relevant to protein-level regulation using large proteogenomics data sets. We estimate about 48% of the aging-associated transcripts predict protein levels well (r ≥ 0.5). In parallel, a comparison of the identified aging-regulated genes with public human transcriptomics data showed that only 35-45% of the identified genes show an age-dependent expression in corresponding human tissues. Thus, integrating both RNA-protein correlation and human conservation across data sources, we nominate 134 prioritized aging striated muscle signatures that are predicted to correlate strongly with protein levels and that show age-dependent expression in humans. The results here reveal new details into how aging reshapes gene expression in striated muscles at the transcript and protein levels.
Affiliation(s)
- Yu Han
  - Department of Medicine, Consortium for Fibrosis Research & Translation, University of Colorado School of Medicine, Aurora, CO 80045, USA.
- Lauren Z Li
  - Department of Medicine, Consortium for Fibrosis Research & Translation, University of Colorado School of Medicine, Aurora, CO 80045, USA.
- Nikhitha L Kastury
  - Department of Medicine, Consortium for Fibrosis Research & Translation, University of Colorado School of Medicine, Aurora, CO 80045, USA.
- Cody T Thomas
  - Department of Medicine, Consortium for Fibrosis Research & Translation, University of Colorado School of Medicine, Aurora, CO 80045, USA.
- Maggie P Y Lam
  - Department of Medicine, Consortium for Fibrosis Research & Translation, University of Colorado School of Medicine, Aurora, CO 80045, USA.
  - Department of Biochemistry and Molecular Genetics, Consortium for Fibrosis Research & Translation, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Edward Lau
  - Department of Medicine, Consortium for Fibrosis Research & Translation, University of Colorado School of Medicine, Aurora, CO 80045, USA.
|
15
|
Mann M, Kumar C, Zeng WF, Strauss MT. Artificial intelligence for proteomics and biomarker discovery. Cell Syst 2021; 12:759-770. [PMID: 34411543 DOI: 10.1016/j.cels.2021.06.006] [Citation(s) in RCA: 150] [Impact Index Per Article: 37.5] [Received: 03/04/2021] [Revised: 05/07/2021] [Accepted: 06/28/2021] [Indexed: 12/14/2022]
Abstract
There is an avalanche of biomedical data generation and a parallel expansion in computational capabilities to analyze and make sense of these data. Starting with genome sequencing and widely employed deep sequencing technologies, these trends have now taken hold in all omics disciplines and increasingly call for multi-omics integration as well as data interpretation by artificial intelligence technologies. Here, we focus on mass spectrometry (MS)-based proteomics and describe how machine learning and, in particular, deep learning now predicts experimental peptide measurements from amino acid sequences alone. This will dramatically improve the quality and reliability of analytical workflows because experimental results should agree with predictions in a multi-dimensional data landscape. Machine learning has also become central to biomarker discovery from proteomics data, which now starts to outperform existing best-in-class assays. Finally, we discuss model transparency and explainability and data privacy that are required to deploy MS-based biomarkers in clinical settings.
Affiliation(s)
- Matthias Mann
  - Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
- Chanchal Kumar
  - Translational Science & Experimental Medicine, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden.
- Wen-Feng Zeng
  - Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany.
|
16
|
Anene CA, Khan F, Bewicke-Copley F, Maniati E, Wang J. ACSNI: An unsupervised machine-learning tool for prediction of tissue-specific pathway components using gene expression profiles. Patterns (N Y) 2021; 2:100270. [PMID: 34179848 PMCID: PMC8212143 DOI: 10.1016/j.patter.2021.100270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2021] [Revised: 03/10/2021] [Accepted: 04/28/2021] [Indexed: 11/01/2022]
Abstract
Determining the tissue- and disease-specific circuit of biological pathways remains a fundamental goal of molecular biology. Many components of these biological pathways still remain unknown, hindering the full and accurate characterization of biological processes of interest. Here we describe ACSNI, an algorithm that combines prior knowledge of biological processes with a deep neural network to effectively decompose gene expression profiles (GEPs) into multi-variable pathway activities and identify unknown pathway components. Experiments on public GEP data show that ACSNI predicts cogent components of mTOR, ATF2, and HOTAIRM1 signaling that recapitulate regulatory information from genetic perturbation and transcription factor binding datasets. Our framework provides a fast and easy-to-use method to identify components of signaling pathways as a tool for molecular mechanism discovery and to prioritize genes for designing future targeted experiments (https://github.com/caanene1/ACSNI).
Affiliation(s)
- Chinedu Anthony Anene
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Faraz Khan
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Findlay Bewicke-Copley
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Eleni Maniati
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
- Jun Wang
  - Centre for Cancer Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, UK
|
17
|
A primer on applying AI synergistically with domain expertise to oncology. Biochim Biophys Acta Rev Cancer 2021; 1876:188548. [PMID: 33901609 DOI: 10.1016/j.bbcan.2021.188548] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 04/13/2021] [Accepted: 04/15/2021] [Indexed: 12/24/2022]
Abstract
BACKGROUND: The concurrent growth of large-scale oncology data alongside the computational methods with which to analyze and model it has created a promising environment for revolutionizing cancer diagnosis, treatment, prevention, and drug discovery. Computational methods applied to large datasets have accelerated the drug discovery process by reducing bottlenecks and widening the search space beyond what is experimentally tractable. As the research community gains understanding of the myriad genetic underpinnings of cancer via sequencing, imaging, screens, and more that are ingested, transformed, and modeled by top open-source machine learning and artificial intelligence tools readily available, the next big drug candidate might seem merely an "Enter" key away. Of course, the reality is more convoluted, but still promising.
SCOPE OF REVIEW: We present methods to approach the process of building an AI model, with strong emphasis on the aspects of model development we believe to be crucial to success but that are not commonly discussed: diligence in posing questions, identifying suitable datasets and curating them, and collaborating closely with biology and oncology experts while designing and evaluating the model. Digital pathology, Electronic Health Records, and other data types outside of high-throughput molecular data are reviewed well by others and outside of the scope of this review. This review emphasizes the importance of considering the limitations of the datasets, computational methods, and our minds when designing AI models. For example, datasets can be biased towards areas of research interest, funding, and particular patient populations. Neural networks may learn representations and correlations within the data that are grounded not in biological phenomena, but statistical anomalies erroneously extracted from the training data. Researchers may mis-interpret or over-interpret the output, or design and evaluate the training process such that the resultant model generalizes poorly. Fortunately, awareness of the strengths and limitations of applying data analytics and AI to drug discovery enables us to leverage them carefully and insightfully while maximizing their utility. These applications when performed in close collaboration with domain experts, together with continuous critical evaluation, generation of new data to minimize known blind spots as they are found, and rigorous experimental validation, increases the success rate of the study. We will discuss applications including AI-assisted target identification, drug repurposing, patient stratification, and gene prioritization.
MAJOR CONCLUSIONS: Data analytics and AI have demonstrated capabilities to revolutionize cancer research, prevention, and treatment by maximizing our understanding and use of the expanding panoply of experimental data. However, to separate promise from true utility, computational tools must be carefully designed, critically evaluated, and constantly improved. Once that is achieved, a human-computer hybrid discovery process will outperform one driven by each alone.
GENERAL SIGNIFICANCE: This review highlights the challenges and promise of synergizing predictive AI models with human expertise towards greater understanding of cancer.
|
18
|
Szalai B, Saez-Rodriguez J. Why do pathway methods work better than they should? FEBS Lett 2020; 594:4189-4200. [PMID: 33270910 DOI: 10.1002/1873-3468.14011] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/30/2020] [Revised: 11/14/2020] [Accepted: 11/19/2020] [Indexed: 12/28/2022]
Abstract
Pathway analysis methods are frequently applied to cancer gene expression data to identify dysregulated pathways. These methods often infer pathway activity based on the expression of genes belonging to a given pathway, even though the proteins ultimately determine the activity of a given pathway. Furthermore, the association between gene expression levels and protein activities is not well-characterized. Here, we posit that pathway-based methods are effective not because of the correlation between expression and activity of members of a given pathway, but because pathway gene sets overlap with the genes regulated by transcription factors (TFs). Thus, pathway-based methods do not inform about the activity of the pathway of interest but rather reflect changes in TF activities.
Affiliation(s)
- Bence Szalai
  - Department of Physiology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Julio Saez-Rodriguez
  - Institute of Computational Biomedicine, Faculty of Medicine, Heidelberg University, Germany
  - Joint Research Centre for Computational Biomedicine (JRC-COMBINE), Faculty of Medicine, RWTH Aachen University, Germany
|
19
|
Barzine MP, Freivalds K, Wright JC, Opmanis M, Rituma D, Ghavidel FZ, Jarnuczak AF, Celms E, Čerāns K, Jonassen I, Lace L, Vizcaíno JA, Choudhary JS, Brazma A, Viksna J. Using Deep Learning to Extrapolate Protein Expression Measurements. Proteomics 2020; 20:e2000009. [PMID: 32937025 PMCID: PMC7757209 DOI: 10.1002/pmic.202000009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/12/2020] [Revised: 08/27/2020] [Indexed: 01/23/2023]
Abstract
Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputations of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed using deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, including human cell lines and human and mouse tissues. This method predicts the protein expression values with average R² scores between 0.46 and 0.54, which is significantly better than predictions based on correlations using the RNA expression data alone. Moreover, it is demonstrated that the derived models can be "transferred" across experiments and species. For instance, the model derived from human tissues gave a R² = 0.51 when applied to mouse tissue data. It is concluded that protein abundances generated in label-free MS experiments can be computationally predicted using functional annotated attributes and can be used to highlight aberrant protein abundance values.
Affiliation(s)
- Mitra Parissa Barzine
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Karlis Freivalds
  - Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia
  - Faculty of Computing, University of Latvia, Riga, LV1586, Latvia
- Mārtiņš Opmanis
  - Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia
- Darta Rituma
  - Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia
  - Faculty of Computing, University of Latvia, Riga, LV1586, Latvia
- Andrew F. Jarnuczak
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Edgars Celms
  - Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia
  - Faculty of Computing, University of Latvia, Riga, LV1586, Latvia
- Kārlis Čerāns
  - Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia
  - Faculty of Computing, University of Latvia, Riga, LV1586, Latvia
- Inge Jonassen
  - Computational Biology Unit, Informatics Department, University of Bergen, Bergen, NO5020, Norway
- Lelde Lace
  - Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia
  - Faculty of Computing, University of Latvia, Riga, LV1586, Latvia
- Juan Antonio Vizcaíno
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Alvis Brazma
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Juris Viksna
  - Institute of Mathematics and Computer Science, University of Latvia, Riga, LV1459, Latvia
  - Faculty of Computing, University of Latvia, Riga, LV1586, Latvia
|