1
Khan S, Conover R, Asthagiri AR, Slavov N. Dynamics of Single-Cell Protein Covariation during Epithelial-Mesenchymal Transition. J Proteome Res 2025; 24:1519-1527. [PMID: 38663020 PMCID: PMC11502509 DOI: 10.1021/acs.jproteome.4c00277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]
Abstract
Physiological processes, such as the epithelial-mesenchymal transition (EMT), are mediated by changes in protein interactions. These changes may be better reflected in protein covariation within a cellular cluster than in the temporal dynamics of cluster-average protein abundance. To explore this possibility, we quantified proteins in single human cells undergoing EMT. Covariation analysis of the data revealed that functionally coherent protein clusters dynamically changed their protein-protein correlations without concomitant changes in the cluster-average protein abundance. These dynamics of protein-protein correlations were monotonic in time and delineated protein modules functioning in actin cytoskeleton organization, energy metabolism, and protein transport. These protein modules are defined by protein covariation within the same time point and cluster and, thus, reflect biological regulation masked by the cluster-average protein dynamics. Thus, protein correlation dynamics across single cells offers a window into protein regulation during physiological transitions.
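The distinction the abstract draws, between protein-protein correlation within a cluster and the cluster-average abundance, can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: the abundance values, the protein names A and B, and the helper functions are all hypothetical, chosen so that the cluster average is identical at two time points while the correlation across single cells changes.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mx) ** 2 for x in xs))
    norm_y = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def cluster_average(cluster):
    """Mean abundance over all proteins and all cells in the cluster."""
    return mean([v for protein in cluster.values() for v in protein])


# Hypothetical centered log-abundances of proteins A and B in five single cells.
early = {"A": [-2, -1, 0, 1, 2], "B": [1, -2, 0, 2, -1]}
late = {"A": [-2, -1, 0, 1, 2], "B": [-2, -1, 0, 1, 2]}

for label, cluster in (("early", early), ("late", late)):
    r = pearson(cluster["A"], cluster["B"])
    print(label, "cluster average:", cluster_average(cluster),
          "correlation A-B:", round(r, 3))
```

In this made-up example the cluster average is the same at both time points, while the A-B correlation goes from essentially zero to essentially one, which is the kind of signal an average-based analysis would miss.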
Affiliation(s)
- Saad Khan
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
  - Department of Biology, Northeastern University, Boston, Massachusetts 02115, United States
- Rachel Conover
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
- Anand R. Asthagiri
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
  - Department of Biology, Northeastern University, Boston, Massachusetts 02115, United States
  - Department of Chemical Engineering, Northeastern University, Boston, Massachusetts 02115, United States
- Nikolai Slavov
  - Department of Bioengineering, Northeastern University, Boston, Massachusetts 02115, United States
  - Department of Biology, Northeastern University, Boston, Massachusetts 02115, United States
  - Parallel Squared Technology Institute, Watertown, Massachusetts 02472, United States

2
Nitz A, Giraldez Chavez JH, Eliason ZG, Payne SH. Are We There Yet? Assessing the Readiness of Single-Cell Proteomics to Answer Biological Hypotheses. J Proteome Res 2025; 24:1482-1492. [PMID: 38981598 PMCID: PMC11976870 DOI: 10.1021/acs.jproteome.4c00091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 05/02/2024] [Accepted: 06/13/2024] [Indexed: 07/11/2024]
Abstract
Single-cell analysis is an active area of research in many fields of biology. Measurements at single-cell resolution allow researchers to study diverse populations without losing biologically meaningful information to sample averages. Many technologies have been used to study single cells, including mass spectrometry-based single-cell proteomics (SCP). SCP has seen a lot of growth over the past couple of years through improvements in data acquisition and analysis, leading to greater proteomic depth. Because method development has been the main focus in SCP, biological applications have been sprinkled in only as proof-of-concept. However, SCP methods now provide significant coverage of the proteome and have been implemented in many laboratories. Thus, a primary question to address in our community is whether the current state of technology is ready for widespread adoption for biological inquiry. In this Perspective, we examine the potential for SCP in three thematic areas of biological investigation: cell annotation, developmental trajectories, and spatial mapping. We identify that the primary limitation of SCP is sample throughput. As proteome depth has been the primary target for method development to date, we advocate for a change in focus to facilitate measuring tens of thousands of single-cell proteomes to enable biological applications beyond proof-of-concept.
Affiliation(s)
- Alyssa A. Nitz
  - Biology Department, Brigham Young University, Provo, Utah 84602, United States
- Zachary G. Eliason
  - Biology Department, Brigham Young University, Provo, Utah 84602, United States
- Samuel H. Payne
  - Biology Department, Brigham Young University, Provo, Utah 84602, United States

3
Segers A, Castiglione C, Vanderaa C, De Baere E, Martens L, Risso D, Clement L. omicsGMF: a multi-tool for dimensionality reduction, batch correction and imputation applied to bulk- and single cell proteomics data. bioRxiv 2025:2025.03.24.644996. [PMID: 40196514 PMCID: PMC11974731 DOI: 10.1101/2025.03.24.644996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2025]
Abstract
The unprecedented speed and sensitivity of mass spectrometry (MS) unlocked large-scale applications of proteomics and even enabled proteome profiling of single cells. However, this fast-evolving field is hindered by a lack of scalable dimensionality reduction tools that can compensate for substantial batch effects and missingness across MS runs. Therefore, we present omicsGMF, a fast, scalable, and interpretable matrix factorization method, tailored for bulk and single-cell proteomics data. Unlike current workflows that sequentially apply imputation, batch correction, and principal component analysis, omicsGMF integrates these steps into a unified framework, dramatically enhancing data processing and dimensionality reduction. Additionally, omicsGMF provides robust imputation of missing values, outperforming bespoke state-of-the-art imputation tools. We further demonstrate how this integrated approach increases statistical power to detect differentially abundant proteins in the downstream data analysis. Hence, omicsGMF is a highly scalable approach to dimensionality reduction in proteomics, that dramatically improves many important steps in proteomics data analysis.
Affiliation(s)
- Alexandre Segers
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
  - Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
- Cristian Castiglione
  - Bocconi Institute for Data Science and Analytics, Bocconi University, Milan, Italy
- Christophe Vanderaa
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- Elfride De Baere
  - Center for Medical Genetics Ghent, Ghent University and Ghent University Hospital, Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
- Lennart Martens
  - Department of Biomolecular Medicine, Ghent University, Ghent, Belgium
  - VIB-UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
- Davide Risso
  - Department of Statistical Sciences, University of Padova, Padova, Italy
- Lieven Clement
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium

4
Sanchez-Avila X, de Oliveira RM, Huang S, Wang C, Kelly RT. Trends in Mass Spectrometry-Based Single-Cell Proteomics. Anal Chem 2025; 97:5893-5907. [PMID: 40091206 PMCID: PMC12003028 DOI: 10.1021/acs.analchem.5c00661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2025]
Affiliation(s)
- Ximena Sanchez-Avila
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, United States
- Raphaela M de Oliveira
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, United States
- Siqi Huang
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, United States
- Chao Wang
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, United States
- Ryan T Kelly
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah 84602, United States

5
Moon H, Du J, Lei J, Roeder K. Augmented Doubly Robust Post-Imputation Inference for Proteomic Data. bioRxiv 2025:2024.03.23.586387. [PMID: 39868108 PMCID: PMC11761724 DOI: 10.1101/2024.03.23.586387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
Quantitative measurements produced by mass spectrometry proteomics experiments offer a direct way to explore the role of proteins in molecular mechanisms. However, analysis of such data is challenging due to the large proportion of missing values. A common strategy to address this issue is to utilize an imputed dataset, which often introduces systematic bias into downstream analyses if the imputation errors are ignored. In this paper, we propose a statistical framework inspired by doubly robust estimators that offers valid and efficient inference for proteomic data. Our framework combines powerful machine learning tools, such as variational autoencoders, to augment the imputation quality with high-dimensional peptide data, and a parametric model to estimate the propensity score for debiasing imputed outcomes. Our estimator is compatible with the double machine learning framework and has provable properties. Simulation studies verify its empirical superiority over other existing procedures. In application to both single-cell proteomic data and bulk-cell Alzheimer's Disease data our method utilizes the imputed data to gain additional, meaningful discoveries and yet maintains good control of false positives.
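The idea of debiasing imputed outcomes with a propensity score can be sketched with the textbook augmented inverse-probability-weighted (AIPW) estimator of a mean. This is a generic illustration, not the estimator proposed in the paper: the function names, the toy values, and the assumption that imputed values and observation propensities are already available are all hypothetical.

```python
def plug_in_mean(y, observed, imputed):
    """Naive mean that trusts imputed values wherever y is missing."""
    filled = [y_i if r_i else m_i for y_i, r_i, m_i in zip(y, observed, imputed)]
    return sum(filled) / len(filled)


def aipw_mean(y, observed, imputed, propensity):
    """Textbook AIPW estimate of the mean of y.

    Each unit contributes its imputed value m_i, plus an inverse-propensity
    weighted residual (y_i - m_i) / p_i when y_i was actually observed.
    """
    total = 0.0
    for y_i, r_i, m_i, p_i in zip(y, observed, imputed, propensity):
        correction = (y_i - m_i) / p_i if r_i else 0.0
        total += m_i + correction
    return total / len(y)


# Hypothetical log-intensities for one protein across four samples.
y = [10.0, None, 12.0, None]          # None marks a missing measurement
observed = [True, False, True, False]
imputed = [9.0, 9.0, 11.0, 11.0]      # model predictions for every sample
propensity = [0.5, 0.5, 0.5, 0.5]     # assumed probability of being observed

print("plug-in mean:", plug_in_mean(y, observed, imputed))
print("AIPW mean:", aipw_mean(y, observed, imputed, propensity))
```

In this toy case the imputation model is systematically one unit too low on the observed samples, and the weighted residual term shifts the estimate upward to compensate, whereas the plug-in mean carries the imputation bias through on the missing samples.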
Affiliation(s)
- Haeun Moon
  - Department of Statistics, Seoul National University
- Jin-Hong Du
  - Department of Statistics and Data Science, Carnegie Mellon University
- Jing Lei
  - Department of Statistics and Data Science, Carnegie Mellon University
- Kathryn Roeder
  - Department of Statistics and Data Science, Carnegie Mellon University

6
Heininen J, Movahedi P, Kotiaho T, Kostiainen R, Pahikkala T, Teppo J. Targeted and Untargeted Amine Metabolite Quantitation in Single Cells with Isobaric Multiplexing. Chemistry 2024; 30:e202403278. [PMID: 39422672 DOI: 10.1002/chem.202403278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2024] [Accepted: 10/14/2024] [Indexed: 10/19/2024]
Abstract
We developed a single-cell amine analysis approach utilizing isobarically multiplexed samples of 6 individual cells along with an analyte-abundant carrier. This methodology was applied for absolute quantitation of amino acids and untargeted relative quantitation of amines in a total of 108 individual cells using nanoflow LC with high-resolution mass spectrometry. Together with individually determined cell sizes, this provides accessible quantification of intracellular amino acid concentrations within individual cells. The targeted method was partially validated for 10 amino acids with limits of detection in low attomoles, linear calibration range covering analyte amounts typically from 30 amol to 120 fmol, and correlation coefficients (R) above 0.99. This was applied with cell sizes recorded during dispensing to determine millimolar intracellular amino acid concentrations. The untargeted approach yielded 249 features that were detected in at least 25% of the single cells, providing modest cell type separation on principal component analysis. Using greedy forward selection with regularized least squares, a sub-selection of 100 features explaining most of the difference was determined. These features were annotated using MS2 from analyte standards and accurate mass with library search. The approach provides an accessible, sensitive, and high-throughput method with the potential to be expanded also to other forms of ultrasensitive analysis.
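The step from a measured analyte amount and a recorded cell size to an intracellular concentration is simple arithmetic, sketched below. The spherical-cell assumption, the function names, and the example numbers (10 fmol in a 20 µm cell) are illustrative choices and are not taken from the paper.

```python
from math import pi


def cell_volume_um3(diameter_um):
    """Volume of a cell in cubic micrometers, assuming a perfect sphere."""
    radius = diameter_um / 2
    return 4 / 3 * pi * radius ** 3


def concentration_mM(amount_amol, diameter_um):
    """Intracellular concentration in millimolar.

    1 amol = 1e-18 mol and 1 um^3 = 1e-15 L, so amol per um^3
    equals 1e-3 mol/L, which is 1 mM.
    """
    return amount_amol / cell_volume_um3(diameter_um)


# Hypothetical example: 10 fmol (10,000 amol) of an amino acid in a 20 um cell.
conc = concentration_mM(10_000, 20.0)
print(f"volume: {cell_volume_um3(20.0):.0f} um^3, concentration: {conc:.2f} mM")
```

With these made-up inputs the result lands in the low-millimolar range, consistent in order of magnitude with the millimolar concentrations the abstract mentions.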
Affiliation(s)
- Juho Heininen
  - Drug Research Program and Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, P.O. Box 56, FI-00014, Helsinki, Finland
- Parisa Movahedi
  - Department of Computing, Turku University, 20014, Turku, Finland
- Tapio Kotiaho
  - Drug Research Program and Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, P.O. Box 56, FI-00014, Helsinki, Finland
  - Department of Chemistry, Faculty of Science, University of Helsinki, P.O. Box 55, FI-00014, Helsinki, Finland
- Risto Kostiainen
  - Drug Research Program and Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, P.O. Box 56, FI-00014, Helsinki, Finland
- Tapio Pahikkala
  - Department of Computing, Turku University, 20014, Turku, Finland
- Jaakko Teppo
  - Drug Research Program and Division of Pharmaceutical Chemistry and Technology, Faculty of Pharmacy, University of Helsinki, P.O. Box 56, FI-00014, Helsinki, Finland

7
Fulcher JM, Markillie LM, Mitchell HD, Williams SM, Engbrecht KM, Degnan DJ, Bramer LM, Moore RJ, Chrisler WB, Cantlon-Bruce J, Bagnoli JW, Qian WJ, Seth A, Paša-Tolić L, Zhu Y. Parallel measurement of transcriptomes and proteomes from same single cells using nanodroplet splitting. Nat Commun 2024; 15:10614. [PMID: 39638780 PMCID: PMC11621338 DOI: 10.1038/s41467-024-54099-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Accepted: 11/01/2024] [Indexed: 12/07/2024]
Abstract
Single-cell multiomics provides comprehensive insights into gene regulatory networks, cellular diversity, and temporal dynamics. Here, we introduce nanoSPLITS (nanodroplet SPlitting for Linked-multimodal Investigations of Trace Samples), an integrated platform that enables global profiling of the transcriptome and proteome from the same single cells via RNA sequencing and mass spectrometry-based proteomics, respectively. Benchmarking of nanoSPLITS demonstrates high measurement precision with deep proteomic and transcriptomic profiling of single cells. We apply nanoSPLITS to cyclin-dependent kinase 1 inhibited cells and find that phospho-signaling events could be quantified alongside global protein and mRNA measurements, providing insights into cell cycle regulation. We extend nanoSPLITS to primary cells isolated from human pancreatic islets, introducing an efficient approach for facile identification of unknown cell types and their protein markers by mapping transcriptomic data to existing large-scale single-cell RNA sequencing reference databases. Accordingly, we establish nanoSPLITS as a multiomic technology incorporating global proteomics and anticipate the approach will be critical to furthering our understanding of biological systems.
Affiliation(s)
- James M Fulcher
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Lye Meng Markillie
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Hugh D Mitchell
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Sarah M Williams
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Kristin M Engbrecht
  - Nuclear, Chemistry, and Biology Division, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- David J Degnan
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Lisa M Bramer
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Ronald J Moore
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- William B Chrisler
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Joshua Cantlon-Bruce
  - Scienion AG, Volmerstraße 7, 12489, Berlin, Germany
  - Cellenion SASU, 60 Avenue Rockefeller, Bâtiment BioSerra2, 69008, Lyon, France
- Johannes W Bagnoli
  - Cellenion SASU, 60 Avenue Rockefeller, Bâtiment BioSerra2, 69008, Lyon, France
- Wei-Jun Qian
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Anjali Seth
  - Cellenion SASU, 60 Avenue Rockefeller, Bâtiment BioSerra2, 69008, Lyon, France
- Ljiljana Paša-Tolić
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Ying Zhu
  - Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
  - Department of Proteomic and Genomic Technologies, Genentech Inc., 1 DNA Way, South San Francisco, 94080, USA

8
Leduc A, Khoury L, Cantlon J, Khan S, Slavov N. Massively parallel sample preparation for multiplexed single-cell proteomics using nPOP. Nat Protoc 2024; 19:3750-3776. [PMID: 39117766 PMCID: PMC11614709 DOI: 10.1038/s41596-024-01033-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 05/27/2024] [Indexed: 08/10/2024]
Abstract
Single-cell proteomics by mass spectrometry (MS) allows the quantification of proteins with high specificity and sensitivity. To increase its throughput, we developed nano-proteomic sample preparation (nPOP), a method for parallel preparation of thousands of single cells in nanoliter-volume droplets deposited on glass slides. Here, we describe its protocol with emphasis on its flexibility to prepare samples for different multiplexed MS methods. An implementation using the plexDIA MS multiplexing method, which uses non-isobaric mass tags to barcode peptides from different samples for data-independent acquisition, demonstrates accurate quantification of ~3,000-3,700 proteins per human cell. A separate implementation with isobaric mass tags and prioritized data acquisition demonstrates analysis of 1,827 single cells at a rate of >1,000 single cells per day at a depth of 800-1,200 proteins per human cell. The protocol is implemented by using a cell-dispensing and liquid-handling robot (the CellenONE instrument) and uses readily available consumables, which should facilitate broad adoption. nPOP can be applied to all samples that can be processed to a single-cell suspension. It takes 1 or 2 d to prepare >3,000 single cells. We provide metrics and software (the QuantQC R package) for quality control and data exploration. QuantQC supports the robust scaling of nPOP to higher plex reagents for achieving reliable and scalable single-cell proteomics.
Affiliation(s)
- Andrew Leduc
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA
- Luke Khoury
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA
- Saad Khan
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA
- Nikolai Slavov
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, and Barnett Institute, Northeastern University, Boston, MA, USA
  - Parallel Squared Technology Institute, Watertown, MA, USA

9
Montes C, Zhang J, Nolan TM, Walley JW. Single-cell proteomics differentiates Arabidopsis root cell types. New Phytol 2024; 244:1750-1759. [PMID: 38923440 DOI: 10.1111/nph.19923] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 04/22/2024] [Accepted: 06/09/2024] [Indexed: 06/28/2024]
Abstract
Single-cell proteomics (SCP) is an emerging approach to resolve cellular heterogeneity within complex tissues of multi-cellular organisms. Here, we demonstrate the feasibility of SCP on plant samples using the model plant Arabidopsis thaliana. Specifically, we focused on examining isolated single cells from the cortex and endodermis, which are two adjacent root cell types derived from a common stem cell lineage. From 756 root cells, we identified 3763 proteins and 1118 proteins/cell. Ultimately, we focus on 3217 proteins quantified following stringent filtering. Of these, we identified 596 proteins whose expression is enriched in either the cortex or endodermis and are able to differentiate these closely related plant cell types. Collectively, this study demonstrates that SCP can resolve neighboring cell types with distinct functions, thereby facilitating the identification of biomarkers and candidate proteins to enable functional genomics.
Affiliation(s)
- Christian Montes
  - Department of Plant Pathology, Entomology, and Microbiology, Iowa State University, Ames, IA, 50011, USA
- Jingyuan Zhang
  - Department of Biology, Duke University, Durham, NC, 27708, USA
- Trevor M Nolan
  - Department of Biology, Duke University, Durham, NC, 27708, USA
  - Howard Hughes Medical Institute, Duke University, Durham, NC, 27708, USA
- Justin W Walley
  - Department of Plant Pathology, Entomology, and Microbiology, Iowa State University, Ames, IA, 50011, USA

10
Khan S, Elcheikhali M, Leduc A, Huffman RG, Derks J, Franks A, Slavov N. Inferring post-transcriptional regulation within and across cell types in human testis. bioRxiv 2024:2024.10.08.617313. [PMID: 39416047 PMCID: PMC11483007 DOI: 10.1101/2024.10.08.617313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2024]
Abstract
Single-cell tissue atlases commonly use RNA abundances as surrogates for protein abundances. Yet, protein abundance also depends on the regulation of protein synthesis and degradation rates. To estimate the contributions of such post-transcriptional regulation, we quantified the proteomes of 5,883 single cells from human testis using 3 distinct mass spectrometry methods (SCoPE2, pSCoPE, and plexDIA). To distinguish between biological and technical factors contributing to differences between protein and RNA levels, we developed BayesPG, a Bayesian model of transcript and protein abundance that systematically accounts for technical variation and infers biological differences. We use BayesPG to jointly model RNA and protein data collected from 29,709 single cells across different methods and datasets. BayesPG estimated consensus mRNA and protein levels for 3,861 gene products and quantified the relative protein-to-mRNA ratio (rPTR) for each gene across six distinct cell types in samples from human testis. About 28% of the gene products exhibited significant differences at protein and RNA levels and contributed to about 1,500 significant GO groups. We observe that specialized and context-specific functions, such as those related to spermatogenesis, are regulated after transcription. Among hundreds of detected post-translationally modified peptides, many show significant abundance differences across cell types. Furthermore, some phosphorylated peptides covary with kinases in a cell-type-dependent manner, suggesting cell-type-specific regulation. Our results demonstrate the potential of inferring protein regulation from single-cell proteogenomic data and provide a generalizable model, BayesPG, for performing such analyses.
Affiliation(s)
- Saad Khan
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
  - Co-first authors, equal contribution
- Megan Elcheikhali
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
  - Co-first authors, equal contribution
  - Department of Statistics and Applied Probability, University of California Santa Barbara, CA, USA
  - Parallel Squared Technology Institute, Watertown, MA, USA
- Andrew Leduc
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- R Gray Huffman
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
- Jason Derks
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
  - Parallel Squared Technology Institute, Watertown, MA, USA
- Alexander Franks
  - Department of Statistics and Applied Probability, University of California Santa Barbara, CA, USA
  - Co-senior authors, equal contribution
- Nikolai Slavov
  - Departments of Bioengineering, Biology, Chemistry and Chemical Biology, Single Cell Proteomics Center, Northeastern University, Boston, MA 02115, USA
  - Parallel Squared Technology Institute, Watertown, MA, USA
  - Co-senior authors, equal contribution

11
Younus S, Rönnstrand L, Kazi JU. Xputer: bridging data gaps with NMF, XGBoost, and a streamlined GUI experience. Front Artif Intell 2024; 7:1345179. [PMID: 38720912 PMCID: PMC11076752 DOI: 10.3389/frai.2024.1345179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Accepted: 04/08/2024] [Indexed: 05/12/2024]
Abstract
The rapid proliferation of data across diverse fields has accentuated the importance of accurate imputation for missing values. This task is crucial for ensuring data integrity and deriving meaningful insights. In response to this challenge, we present Xputer, a novel imputation tool that adeptly integrates Non-negative Matrix Factorization (NMF) with the predictive strengths of XGBoost. One of Xputer's standout features is its versatility: it supports zero imputation, enables hyperparameter optimization through Optuna, and allows users to define the number of iterations. For enhanced user experience and accessibility, we have equipped Xputer with an intuitive Graphical User Interface (GUI) ensuring ease of handling, even for those less familiar with computational tools. In performance benchmarks, Xputer often outperforms IterativeImputer in terms of imputation accuracy. Furthermore, Xputer autonomously handles a diverse spectrum of data types, including categorical, continuous, and Boolean, eliminating the need for prior preprocessing. Given its blend of performance, flexibility, and user-friendly design, Xputer emerges as a state-of-the-art solution in the realm of data imputation.
Affiliation(s)
- Saleena Younus
  - Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
  - Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden
- Lars Rönnstrand
  - Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
  - Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden
  - Department of Hematology, Oncology and Radiation Physics, Skåne University Hospital, Lund, Sweden
- Julhash U. Kazi
  - Division of Translational Cancer Research, Department of Laboratory Medicine, Lund University, Lund, Sweden
  - Lund Stem Cell Center, Lund University, Lund, Sweden
  - Lund University Cancer Centre (LUCC), Lund University, Lund, Sweden

12
Thomas DR, Garnish SE, Khoo CA, Padmanabhan B, Scott NE, Newton HJ. Coxiella burnetii protein CBU2016 supports CCV expansion. Pathog Dis 2024; 82:ftae018. [PMID: 39138067 PMCID: PMC11352601 DOI: 10.1093/femspd/ftae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 07/28/2024] [Accepted: 08/13/2024] [Indexed: 08/15/2024]
Abstract
Coxiella burnetii is a globally distributed obligate intracellular pathogen. Although often asymptomatic, infections can cause acute Q fever with influenza-like symptoms and/or severe chronic Q fever. Coxiella burnetii develops a unique replicative niche within host cells called the Coxiella-containing vacuole (CCV), facilitated by the Dot/Icm type IV secretion system translocating a cohort of bacterial effector proteins into the host. The role of some effectors has been elucidated; however, the actions of the majority remain enigmatic and the list of true effectors is disputable. This study examined CBU2016, a unique C. burnetii protein previously designated as an effector with a role in infection. We were unable to validate CBU2016 as a translocated effector protein. Employing targeted knock-out and complemented strains, we found that the loss of CBU2016 did not cause a replication defect within HeLa, THP-1, J774, or iBMDM cells or in axenic media, nor did it affect the pathogenicity of C. burnetii in the Galleria mellonella infection model. The absence of CBU2016 did, however, result in a consistent decrease in the size of CCVs in HeLa cells. These results suggest that although CBU2016 may not be a Dot/Icm effector, it is still able to influence the host environment during infection.
Collapse
Affiliation(s)
- David R Thomas
- Infection Program, Monash Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC 3800, Australia
- Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
| | - Sarah E Garnish
- Infection Program, Monash Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC 3800, Australia
| | - Chen Ai Khoo
- Infection Program, Monash Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC 3800, Australia
| | - Bhavna Padmanabhan
- Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
| | - Nichollas E Scott
- Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
| | - Hayley J Newton
- Infection Program, Monash Biomedicine Discovery Institute and Department of Microbiology, Monash University, Clayton, VIC 3800, Australia
- Department of Microbiology and Immunology at the Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
| |
|
13
|
Ziegler AR, Dufour A, Scott NE, Edgington-Mitchell LE. Ion Mobility-Based Enrichment-Free N-Terminomics Analysis Reveals Novel Legumain Substrates in Murine Spleen. Mol Cell Proteomics 2024; 23:100714. [PMID: 38199506 PMCID: PMC10862022 DOI: 10.1016/j.mcpro.2024.100714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 12/19/2023] [Accepted: 01/02/2024] [Indexed: 01/12/2024] Open
Abstract
Aberrant levels of the asparaginyl endopeptidase legumain have been linked to inflammation, neurodegeneration, and cancer, yet our understanding of this protease is incomplete. Systematic attempts to identify legumain substrates have been previously confined to in vitro studies, which fail to mirror physiological conditions and obscure biologically relevant cleavage events. Using high-field asymmetric waveform ion mobility spectrometry (FAIMS), we developed a streamlined approach for proteome and N-terminome analyses without the need for N-termini enrichment. Compared to unfractionated proteomic analysis, we demonstrate FAIMS fractionation improves N-termini identification by >2.5 fold, resulting in the identification of >2882 unique N-termini from limited sample amounts. In murine spleens, this approach identifies 6366 proteins and 2528 unique N-termini, with 235 cleavage events enriched in WT compared to legumain-deficient spleens. Among these, 119 neo-N-termini arose from asparaginyl endopeptidase activities, representing novel putative physiological legumain substrates. The direct cleavage of selected substrates by legumain was confirmed using in vitro assays, providing support for the existence of physiologically relevant extra-lysosomal legumain activity. Combined, these data shed critical light on the functions of legumain and demonstrate the utility of FAIMS as an accessible method to improve depth and quality of N-terminomics studies.
Affiliation(s)
- Alexander R Ziegler
- Department of Biochemistry and Pharmacology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, Victoria, Australia
| | - Antoine Dufour
- Department of Physiology and Pharmacology, University of Calgary, Calgary, Alberta, Canada; McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, Alberta, Canada
| | - Nichollas E Scott
- Department of Microbiology and Immunology, Peter Doherty Institute, The University of Melbourne, Parkville, Victoria, Australia.
| | - Laura E Edgington-Mitchell
- Department of Biochemistry and Pharmacology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, Victoria, Australia.
| |
|
14
|
Wang F, Liu C, Li J, Yang F, Song J, Zang T, Yao J, Wang G. SPDB: a comprehensive resource and knowledgebase for proteomic data at the single-cell resolution. Nucleic Acids Res 2024; 52:D562-D571. [PMID: 37953313 PMCID: PMC10767837 DOI: 10.1093/nar/gkad1018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/28/2023] [Accepted: 10/23/2023] [Indexed: 11/14/2023] Open
Abstract
Single-cell proteomics enables the direct quantification of protein abundance at single-cell resolution, providing valuable insights into cellular phenotypes beyond what can be inferred from transcriptome analysis alone. However, the lack of large-scale integrated databases hinders researchers from accessing and exploring single-cell proteomics, impeding the advancement of this field. To fill this gap, we present a comprehensive database, namely Single-cell Proteomic DataBase (SPDB, https://scproteomicsdb.com/), for general single-cell proteomic data, including antibody-based or mass spectrometry-based single-cell proteomics. Equipped with standardized data processing and a user-friendly web interface, SPDB provides unified data formats for convenient interaction with downstream analysis, and offers not only dataset-level but also protein-level data search and exploration capabilities. To enable detailed exhibition of single-cell proteomic data, SPDB also provides a module for visualizing data from the perspectives of cell metadata or protein features. The current version of SPDB encompasses 133 antibody-based single-cell proteomic datasets involving more than 300 million cells and over 800 marker/surface proteins, and 10 mass spectrometry-based single-cell proteomic datasets involving more than 4000 cells and over 7000 proteins. Overall, SPDB is envisioned as a useful resource that will serve the wider research community by providing detailed insights into proteomics from the single-cell perspective.
Affiliation(s)
- Fang Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- AI Lab, Tencent, Shenzhen 518000, China
| | - Chunpu Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Jiawei Li
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
| | - Fan Yang
- AI Lab, Tencent, Shenzhen 518000, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Tianyi Zang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | | | - Guohua Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| |
|
15
|
Grégoire S, Vanderaa C, Dit Ruys SP, Kune C, Mazzucchelli G, Vertommen D, Gatto L. Standardized Workflow for Mass-Spectrometry-Based Single-Cell Proteomics Data Processing and Analysis Using the scp Package. Methods Mol Biol 2024; 2817:177-220. [PMID: 38907155 DOI: 10.1007/978-1-0716-3934-4_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/23/2024]
Abstract
Mass-spectrometry (MS)-based single-cell proteomics (SCP) explores cellular heterogeneity by focusing on the functional effectors of the cells: proteins. However, extracting meaningful biological information from MS data is far from trivial, especially with single cells. Currently, data analysis workflows are substantially different from one research team to another. Moreover, it is difficult to evaluate pipelines as ground truths are missing. Our team has developed the R/Bioconductor package called scp to provide a standardized framework for SCP data analysis. It relies on the widely used QFeatures and SingleCellExperiment data structures. In addition, we used a design containing cell lines mixed in known proportions to generate controlled variability for data analysis benchmarking. In this chapter, we provide a flexible data analysis protocol for SCP data using the scp package together with comprehensive explanations at each step of the processing. Our main steps are quality control on the feature and cell level, aggregation of the raw data into peptides and proteins, normalization, and batch correction. We validate our workflow using our ground truth data set. We illustrate how to use this modular, standardized framework and highlight some crucial steps.
Affiliation(s)
- Samuel Grégoire
- Computational Biology and Bioinformatics Unit, de Duve Institute, UCLouvain, Brussels, Belgium
| | - Christophe Vanderaa
- Computational Biology and Bioinformatics Unit, de Duve Institute, UCLouvain, Brussels, Belgium
| | | | - Christopher Kune
- Laboratory of Mass Spectrometry, MolSys Research Unit, University of Liège, Liège, Belgium
| | - Gabriel Mazzucchelli
- Laboratory of Mass Spectrometry, MolSys Research Unit, University of Liège, Liège, Belgium
- GIGA Proteomics Facility, University of Liège, Liège, Belgium
| | - Didier Vertommen
- Protein Phosphorylation Unit, de Duve Institute, UCLouvain, Brussels, Belgium
| | - Laurent Gatto
- Computational Biology and Bioinformatics Unit, de Duve Institute, UCLouvain, Brussels, Belgium.
| |
|