1
SpliceProt 2.0: A Sequence Repository of Human, Mouse, and Rat Proteoforms. Int J Mol Sci 2024; 25:1183. [PMID: 38256255 PMCID: PMC10816255 DOI: 10.3390/ijms25021183] [Received: 10/31/2023] [Revised: 12/15/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024]
Abstract
SpliceProt 2.0 is a public proteogenomics database that aims to list the sequences of known proteins and potential new proteoforms in the human, mouse, and rat proteomes. This updated repository provides an even broader range of computationally translated proteins and can serve, for example, to aid the proteomic validation of splice variants absent from the reference UniProtKB/Swiss-Prot database. We demonstrate the value of SpliceProt 2.0 for predicting orthologous proteins between humans and murines based on transcript reconstruction, sequence annotation, and detection at the transcriptome and proteome levels. In this release, the annotation data used to reconstruct transcripts with the ternary-matrix methodology were acquired from additional databases such as Ensembl, UniProt, and APPRIS. Another innovation in the pipeline is the exclusion of transcripts predicted to be susceptible to degradation through the nonsense-mediated mRNA decay (NMD) pathway. Taken together, our repository and its applications represent a valuable resource for the proteogenomics community.
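The abstract does not say how NMD susceptibility is predicted. Purely as an illustration, the sketch below applies the commonly used "50-nucleotide rule": a transcript is flagged when its stop codon lies more than about 50 nt upstream of the last exon-exon junction. The function name, the coordinate convention, and the default threshold are assumptions, not SpliceProt's pipeline.

```python
from typing import Sequence


# Illustrative sketch only: name, coordinates and threshold are assumptions,
# not the SpliceProt 2.0 implementation.
def is_nmd_candidate(exon_lengths: Sequence[int], stop_start: int,
                     threshold: int = 50) -> bool:
    """Return True if the stop codon lies more than `threshold` nt
    upstream of the last exon-exon junction (the "50-nt rule").

    exon_lengths: exon lengths in transcript (5' to 3') order.
    stop_start:   0-based transcript coordinate of the stop codon's first base.
    """
    if len(exon_lengths) < 2:
        # Single-exon transcripts have no exon-exon junction to trigger NMD.
        return False
    # The last junction sits where the final exon begins.
    last_junction = sum(exon_lengths[:-1])
    distance = last_junction - stop_start
    return distance > threshold


# Stop codon deep in exon 2 of a 4-exon transcript: flagged.
print(is_nmd_candidate([100, 200, 150, 300], stop_start=150))  # True
# Stop codon in the last exon: not flagged.
print(is_nmd_candidate([100, 200, 150, 300], stop_start=500))  # False
```

Real predictors also consider distance thresholds of 50 to 55 nt, long 3' UTRs, and exceptions, so the single cutoff here is a simplification.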
2
How Can Proteomics Help to Elucidate the Pathophysiological Crosstalk in Muscular Dystrophy and Associated Multi-System Dysfunction? Proteomes 2024; 12:4. [PMID: 38250815 PMCID: PMC10801633 DOI: 10.3390/proteomes12010004] [Received: 12/05/2023] [Revised: 01/09/2024] [Accepted: 01/12/2024] [Indexed: 01/23/2024]
Abstract
This perspective article is concerned with the question of how proteomics, which is a core technique of systems biology that is deeply embedded in the multi-omics field of modern bioresearch, can help us better understand the molecular pathogenesis of complex diseases. As an illustrative example of a monogenetic disorder that primarily affects the neuromuscular system but is characterized by a plethora of multi-system pathophysiological alterations, the muscle-wasting disease Duchenne muscular dystrophy was examined. Recent achievements in the field of dystrophinopathy research are described with special reference to the proteome-wide complexity of neuromuscular changes and body-wide alterations/adaptations. Based on a description of the current applications of top-down versus bottom-up proteomic approaches and their technical challenges, future systems biological approaches are outlined. The envisaged holistic and integromic bioanalysis would encompass the integration of diverse omics-type studies including inter- and intra-proteomics as the core disciplines for systematic protein evaluations, with sophisticated biomolecular analyses, including physiology, molecular biology, biochemistry and histochemistry. Integrated proteomic findings promise to be instrumental in improving our detailed knowledge of pathogenic mechanisms and multi-system dysfunction, widening the available biomarker signature of dystrophinopathy for improved diagnostic/prognostic procedures, and advancing the identification of novel therapeutic targets to treat Duchenne muscular dystrophy.
3
Global detection of human variants and isoforms by deep proteome sequencing. Nat Biotechnol 2023; 41:1776-1786. [PMID: 36959352 PMCID: PMC10713452 DOI: 10.1038/s41587-023-01714-x] [Received: 08/26/2022] [Accepted: 02/15/2023] [Indexed: 03/25/2023]
Abstract
An average shotgun proteomics experiment detects approximately 10,000 human proteins from a single sample. However, individual proteins are typically identified by peptide sequences representing a small fraction of their total amino acids. Hence, an average shotgun experiment fails to distinguish different protein variants and isoforms. Deeper proteome sequencing is therefore required for the global discovery of protein isoforms. Using six different human cell lines, six proteases, deep fractionation and three tandem mass spectrometry fragmentation methods, we identify a million unique peptides from 17,717 protein groups, with a median sequence coverage of approximately 80%. Direct comparison with RNA expression data provides evidence for the translation of most nonsynonymous variants. We also hypothesize that undetected variants likely arise from mutation-induced protein instability. We further observe comparable detection rates for exon-exon junction peptides representing constitutive and alternative splicing events. Our dataset represents a resource for proteoform discovery and provides direct evidence that most frame-preserving alternatively spliced isoforms are translated.
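Sequence coverage, as in the ~80% median reported above, is the fraction of a protein's residues covered by at least one identified peptide. A minimal sketch, assuming peptides are placed by exact substring matching (isobaric I/L, modifications, and protein inference are ignored); the function name and toy sequences are hypothetical.

```python
# Hypothetical helper; exact substring matching is a simplifying assumption.
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Fraction of residues in `protein` covered by at least one peptide."""
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            # Mark every residue of this occurrence as covered.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein) if protein else 0.0


protein = "MKTAYIAKQRQISFVKSHFSRQ"  # toy sequence, 22 residues
peptides = ["TAYIAK", "QISFVK"]
print(f"{sequence_coverage(protein, peptides):.1%}")  # 54.5%
```

Overlapping peptides are counted once per residue, which is why coverage is tracked with a per-residue mask rather than by summing peptide lengths.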
4
Mass Spectrometry-Based Proteomic Technology and Its Application to Study Skeletal Muscle Cell Biology. Cells 2023; 12:2560. [PMID: 37947638 PMCID: PMC10649384 DOI: 10.3390/cells12212560] [Received: 10/06/2023] [Revised: 10/27/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
Voluntary striated muscles are characterized by a highly complex and dynamic proteome that efficiently adapts to changed physiological demands or alters considerably during pathophysiological dysfunction. The skeletal muscle proteome has been extensively studied in relation to myogenesis, fiber type specification, muscle transitions, the effects of physical exercise, disuse atrophy, neuromuscular disorders, muscle co-morbidities and sarcopenia of old age. Since muscle tissue accounts for approximately 40% of body mass in humans, alterations in the skeletal muscle proteome have considerable influence on whole-body physiology. This review outlines the main bioanalytical avenues taken in the proteomic characterization of skeletal muscle tissues, including top-down proteomics focusing on the characterization of intact proteoforms and their post-translational modifications, bottom-up proteomics, which is a peptide-centric method concerned with the large-scale detection of proteins in complex mixtures, and subproteomics that examines the protein composition of distinct subcellular fractions. Mass spectrometric studies over the last two decades have decisively improved our general cell biological understanding of protein diversity and the heterogeneous composition of individual myofibers in skeletal muscles. This detailed proteomic knowledge can now be integrated with findings from other omics-type methodologies to establish a systems biological view of skeletal muscle function.
5
Exploiting ion-mobility mass spectrometry for unraveling proteome complexity. J Sep Sci 2023; 46:e2300512. [PMID: 37746674 DOI: 10.1002/jssc.202300512] [Received: 07/18/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/26/2023]
Abstract
Ion mobility spectrometry-mass spectrometry (IMS-MS) is growing rapidly in proteomic studies, driven by its gains in dynamic range, throughput, quantitation precision, and depth of proteome coverage. The core principle of ion mobility spectrometry is to separate ions moving through an inert gas under the influence of an electric field according to differences in their mobility (which depends on ion size, shape, and charge), observed as differences in drift time. This minireview introduces IMS operation modes and describes their advantages and limitations. Moreover, the principles of trapped IMS-MS (TIMS-MS), including parallel accumulation-serial fragmentation (PASEF), are discussed. Finally, emerging TIMS applications focusing on sample throughput (in clinical proteomics) and sensitivity (single-cell proteomics) are reviewed, and the possibilities of intact protein analysis are discussed.
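In a drift-tube instrument, an ion's mobility K links its drift velocity to the field strength (K = v_d/E, with v_d = L/t_d and E = V/L), and values are usually reported as reduced mobility K0 normalized to 273.15 K and 760 Torr. The sketch below encodes these textbook relations; the function names and instrument parameters are illustrative assumptions, and trapped IMS (TIMS) determines mobility differently, by holding ions in a gas flow against an electric field rather than timing their drift.

```python
T0_KELVIN = 273.15  # standard temperature
P0_TORR = 760.0     # standard pressure


# Hypothetical helpers for a uniform-field drift tube (not TIMS).
def mobility(drift_length_cm: float, drift_voltage_v: float,
             drift_time_s: float) -> float:
    """Ion mobility K in cm^2 V^-1 s^-1.

    K = v_d / E with v_d = L / t_d and E = V / L, so K = L^2 / (V * t_d).
    """
    return drift_length_cm ** 2 / (drift_voltage_v * drift_time_s)


def reduced_mobility(k: float, pressure_torr: float, temperature_k: float) -> float:
    """Normalize K to standard conditions: K0 = K * (p / p0) * (T0 / T)."""
    return k * (pressure_torr / P0_TORR) * (T0_KELVIN / temperature_k)


# Illustrative low-pressure drift tube: 90 cm, 1000 V, 4 Torr, 300 K.
k = mobility(drift_length_cm=90.0, drift_voltage_v=1000.0, drift_time_s=0.02)
k0 = reduced_mobility(k, pressure_torr=4.0, temperature_k=300.0)
print(f"K = {k:.1f}, K0 = {k0:.3f} cm^2/(V*s)")
```

Normalizing to K0 is what makes mobilities comparable across instruments run at different pressures and temperatures.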
6
Strategies for Conditional Regulation of Proteins. JACS AU 2023; 3:344-357. [PMID: 36873677 PMCID: PMC9975842 DOI: 10.1021/jacsau.2c00654] [Received: 12/01/2022] [Revised: 01/09/2023] [Accepted: 01/10/2023] [Indexed: 06/18/2023]
Abstract
Designing the next generation of therapeutics, biosensors, and molecular tools for basic research requires that we bring protein activity under control. Because each protein has unique properties, it is critical to tailor current techniques to develop new regulatory methods and to regulate new proteins of interest (POIs). This perspective gives an overview of widely used stimuli and of synthetic and natural methods for the conditional regulation of proteins.
7
Serial Capture Affinity Purification and Integrated Structural Modeling of the H3K4me3 Binding and DNA Damage Related WDR76:SPIN1 Complex. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.31.526478. [PMID: 36778327 PMCID: PMC9915617 DOI: 10.1101/2023.01.31.526478] [Indexed: 02/05/2023]
Abstract
WDR76 is a multifunctional protein involved in many cellular processes. Because its protein interaction network is diverse and complicated, the structure and function of specific WDR76 complexes need to be dissected. We previously demonstrated the ability of the Serial Capture Affinity Purification (SCAP) method to isolate specific complexes by introducing two proteins of interest as baits at the same time. Here, we applied SCAP to dissect a subpopulation of WDR76 in complex with SPIN1, a histone mark reader that specifically recognizes trimethylated histone H3 lysine 4 (H3K4me3). In contrast to the SCAP analysis of the SPIN1:SPINDOC complex, H3K4me3 copurified with the WDR76:SPIN1 complex. In combination with crosslinking mass spectrometry, we built an integrated structural model of the complex, which revealed that SPIN1 recognizes the H3K4me3 epigenetic mark while interacting with WDR76. Lastly, interaction network analysis of copurifying proteins revealed a potential role of the WDR76:SPIN1 complex in the DNA damage response. Teaser: In contrast to the SPINDOC/SPIN1 complex, analyses reveal that the WDR76/SPIN1 complex interacts with core histones and is involved in the DNA damage response.
8
Proteome profiling of ductal carcinoma in situ. Breast Dis 2023; 41:513-520. [PMID: 36641653 DOI: 10.3233/bd-220017] [Indexed: 01/15/2023]
Abstract
BACKGROUND AND AIM Ductal carcinoma in situ (DCIS) is the most common type of non-invasive breast cancer, accounting for about 15 to 30% of breast cancer cases. Proteome profiling by mass spectrometry is used to detect biomarkers in the tissues of breast cancer patients. This study aimed to obtain the proteome expression profile of DCIS and the expression profile of invasive biomarkers, and finally to introduce a dedicated biomarker panel to facilitate prognosis and early detection in patients with in situ breast cancer. METHODS AND MATERIALS In this study, 10 patients with breast cancer (DCIS) were studied. Benign (marginal) and cancerous tissue samples were obtained from the patients for proteomics experiments. All tissue proteins were first extracted using standard methods and separated by two-dimensional electrophoresis. Protein expression levels were then determined by iTRAQ. The data were analysed with R software, and gene ontology was used to describe the proteins in detail. RESULTS Thirty spots on gel electrophoresis were found in the tumor tissue group (sample) and 15 spots in the margin group (control) with P < 0.05. Gels of healthy and cancerous tissue showed 5 differentially expressed spots. The VWF, MMP9, ITGAM, MPO, and PLG protein spots were identified using the site www.ebi.ac.uk/IPI. Finally, protein biomarkers for breast tumor tissue relative to margin tissue were introduced: P04406, P49915, P05323, P06733, and P02768. DISCUSSION Five proteins are critical in cancer-inducing pathways, especially the complement and coagulation cascades. The hallmarks of a healthy cell becoming cancerous are proliferation, invasion, angiogenesis, and changes in the immune system. Hence, protein regulation plays a key role in breast cancer recurrence at the margins.
9
Evolution of Protein Functional Annotation: Text Mining Study. J Pers Med 2022; 12:jpm12030479. [PMID: 35330478 PMCID: PMC8952229 DOI: 10.3390/jpm12030479] [Received: 02/02/2022] [Revised: 03/07/2022] [Accepted: 03/08/2022] [Indexed: 11/23/2022]
Abstract
Within the framework of the Human Proteome Project (HPP) initiative for creating functional annotations of uPE1 proteins, the neXt-CP50 Challenge was launched in 2018. As in the missing-protein challenge, each team deciphers the functional features of proteins in a chromosome-centric mode. However, the neXt-CP50 Challenge is more complicated than the missing-protein challenge: the approaches and methods for solving the problem are clear, but neither the concept of protein function nor specific experimental and/or bioinformatics protocols have been standardized to address it. We used a retrospective analysis of the key HPP repository, the neXtProt database, to identify the most frequently used experimental and bioinformatic methods for analyzing protein functions, and the dynamics of accumulation of functional annotations. The number of proteins with known functions has grown faster than the number of questionable proteins whose existence has been experimentally confirmed within the missing-protein challenge. At the same time, functional annotation rests on the guilt-by-association postulate, according to which, based on large-scale AP-MS and Y2H experiments, proteins with unknown functions are most likely mapped through “handshakes” to biochemical processes.
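The guilt-by-association postulate can be illustrated with simple neighbor voting: an uncharacterized protein is assigned the annotation terms most common among its interaction partners. This is a toy sketch, not the method used in the study; the protein identifiers, terms, and function name are hypothetical, and real pipelines weight evidence and correct for highly connected hub proteins.

```python
from collections import Counter


# Toy neighbor-voting sketch of guilt-by-association; all names are hypothetical.
def predict_terms(protein: str, interactions: list[tuple[str, str]],
                  annotations: dict[str, set[str]], top_n: int = 1) -> list[str]:
    """Rank annotation terms by how many partners of `protein` carry them."""
    # Collect the interaction partners of `protein` from an undirected edge list.
    partners = {b if a == protein else a
                for a, b in interactions if protein in (a, b)}
    partners.discard(protein)  # ignore self-interactions
    # Each partner casts one vote for each of its annotation terms.
    votes = Counter(term for p in partners for term in annotations.get(p, set()))
    return [term for term, _ in votes.most_common(top_n)]


interactions = [("uPE1_X", "P1"), ("uPE1_X", "P2"), ("P3", "uPE1_X"), ("P1", "P2")]
annotations = {"P1": {"DNA repair", "chromatin"},
               "P2": {"DNA repair"},
               "P3": {"DNA repair", "splicing"}}
print(predict_terms("uPE1_X", interactions, annotations))  # ['DNA repair']
```

Ties between equally voted terms are broken arbitrarily here, which is one reason practical methods add evidence weights.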
10
How Far Are We from the Completion of the Human Protein Interactome Reconstruction? Biomolecules 2022; 12:biom12010140. [PMID: 35053288 PMCID: PMC8774112 DOI: 10.3390/biom12010140] [Received: 12/08/2021] [Revised: 01/09/2022] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
Abstract
More than fifteen years after the first high-throughput experiments for human protein–protein interaction (PPI) detection, we are still asking how close we are to completing the genome-scale reconstruction of the human PPI network, what needs to be explored further, and whether the biological insights gained from holistic investigation of the current network are valid and useful. The unique structure of PICKLE, a meta-database of the experimentally determined direct human PPI network developed by our group and presently covering ~80% of the UniProtKB/Swiss-Prot reviewed human complete proteome, enables evaluation of the interactome expansion by comparing successive PICKLE releases since 2013. We observe gradual overall increases of 39%, 182%, and 67% in protein nodes, PPIs, and supporting references, respectively. Our results indicate that, in recent years, (a) the PPI addition rate has decreased, (b) new PPIs are largely determined by high-throughput experiments and mainly concern existing protein nodes, and (c), as we had predicted earlier, most newly added protein nodes have a low degree. These observations, combined with a largely overlapping k-core between PICKLE releases and an increase in network density, imply that an almost complete picture of a structurally defined network has been reached. The comparative unsupervised application of two clustering algorithms indicated that exploring the full interactome topology can reveal protein neighborhoods involved in closely related biological processes (transcriptional regulation, cell signaling) and multiprotein complexes, such as the connexon complex associated with cancers. A well-reconstructed human protein interactome is a powerful tool in network biology and medicine research, forming the basis for multi-omic and dynamic analyses.
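The k-core and density comparisons between PICKLE releases can be reproduced on any undirected PPI edge list. The sketch below uses the standard iterative-peeling definition of the k-core (repeatedly drop nodes with fewer than k remaining neighbors) and the usual undirected density 2|E|/(|V|(|V|-1)); the function names and the toy network are hypothetical, and this is not PICKLE's own code.

```python
from collections import defaultdict


# Hypothetical helpers, not PICKLE code.
def k_core(edges: list[tuple[str, str]], k: int) -> set[str]:
    """Nodes of the k-core: the maximal subgraph where every node has degree >= k."""
    adj: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        if a != b:  # self-loops do not count toward degree here
            adj[a].add(b)
            adj[b].add(a)
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            # Degree is counted only among nodes still in the core.
            if len(adj[node] & alive) < k:
                alive.remove(node)
                changed = True
    return alive


def density(edges: list[tuple[str, str]]) -> float:
    """Undirected density 2|E| / (|V| (|V| - 1)), ignoring duplicates and self-loops."""
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = {n for e in unique for n in e}
    n = len(nodes)
    return 2 * len(unique) / (n * (n - 1)) if n > 1 else 0.0


# Triangle A-B-C with a pendant node D attached to A.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")]
print(sorted(k_core(edges, 2)))  # ['A', 'B', 'C']
print(round(density(edges), 3))  # 0.667
```

Comparing `k_core(...)` node sets between two releases (for example with a Jaccard index) is one simple way to quantify how much the dense center of the network has shifted.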
11
The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences. Nucleic Acids Res 2022; 50:D543-D552. [PMID: 34723319 PMCID: PMC8728295 DOI: 10.1093/nar/gkab1038] [Received: 09/11/2021] [Revised: 10/12/2021] [Accepted: 10/14/2021] [Indexed: 12/12/2022]
Abstract
The PRoteomics IDEntifications (PRIDE) database (https://www.ebi.ac.uk/pride/) is the world's largest data repository of mass spectrometry-based proteomics data. PRIDE is one of the founding members of the global ProteomeXchange (PX) consortium and an ELIXIR core data resource. In this manuscript, we summarize the developments in PRIDE resources and related tools since the previous update was published in Nucleic Acids Research in 2019. Submissions to PRIDE Archive (the archival component of PRIDE) reached an average of around 500 datasets per month during 2021. In addition to continuous improvements in PRIDE Archive data pipelines and infrastructure, the PRIDE Spectra Archive has been developed to provide direct access to submitted mass spectra using Universal Spectrum Identifiers. Notably, the MAGE-TAB for proteomics file format has been developed to improve sample metadata annotation. Additionally, the resource PRIDE Peptidome provides access to aggregated peptide/protein evidence across PRIDE Archive. Finally, we describe how PRIDE has increased its efforts to reuse and disseminate high-quality proteomics data into other added-value resources such as UniProt, Ensembl, and Expression Atlas.
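Universal Spectrum Identifiers (USIs) are a HUPO-PSI standard string of the form `mzspec:<collection>:<msRun>:<indexType>:<index>[:<interpretation>]`. The parser below is a simplified sketch written for illustration, not PRIDE's code; it assumes the collection and run names contain no colons, and it does not validate the optional interpretation (peptide/charge) field. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class USI:
    collection: str                       # e.g. a ProteomeXchange accession
    ms_run: str                           # run (raw file) name
    index_type: str                       # "scan", "index" or "nativeId"
    index: str
    interpretation: Optional[str] = None  # e.g. "PEPTIDE/2" (peptide/charge)


# Simplified sketch: assumes no colons in collection or run names.
def parse_usi(usi: str) -> USI:
    """Split a USI string into its components."""
    # maxsplit=5 keeps colons inside the interpretation (e.g. [UNIMOD:35]) intact.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    if parts[3] not in {"scan", "index", "nativeId"}:
        raise ValueError(f"unknown index type: {parts[3]!r}")
    interpretation = parts[5] if len(parts) == 6 else None
    return USI(parts[1], parts[2], parts[3], parts[4], interpretation)


usi = parse_usi("mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2")
print(usi.collection, usi.index, usi.interpretation)
```

Splitting with a bounded `maxsplit` is the design choice that lets modification notation containing colons survive in the last field.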
12
Proteomics in support of immunotherapy: contribution to model-based precision medicine. Expert Rev Proteomics 2021; 19:33-42. [PMID: 34937491 DOI: 10.1080/14789450.2021.2020653] [Indexed: 10/19/2022]
Abstract
INTRODUCTION Proteomics encompasses a wide and expanding range of methods to identify, characterize, and quantify thousands of proteins from a variety of biological samples, including blood samples, tumors, and tissues. Such methods support various forms of immunotherapy applied to chronic conditions such as allergies, autoimmune diseases, cancers, and infectious diseases. AREAS COVERED In support of immunotherapy, proteomics based on mass spectrometry has multiple specific applications related to (i) disease modeling and patient stratification, (ii) antigen/autoantigen/neoantigen/allergen identification, (iii) characterization of proteins and monoclonal antibodies used for immunotherapeutic or diagnostic purposes, (iv) identification of biomarkers and companion diagnostics, and (v) immunoproteomic monitoring of immune responses elicited in the course of the disease or following immunotherapy. EXPERT OPINION Proteomics contributes as an enabling technology to the evolution of immunotherapy toward a precision medicine approach that aims to better tailor treatments to patients' specific characteristics in multiple disease areas. This trend is favored by a better understanding, through multi-omics profiling, of both the patient's characteristics and immune status and the features of the immunotherapeutic drug.
13
Mouse Organ-Specific Proteins and Functions. Cells 2021; 10:cells10123449. [PMID: 34943957 PMCID: PMC8700158 DOI: 10.3390/cells10123449] [Received: 11/08/2021] [Revised: 11/29/2021] [Accepted: 12/04/2021] [Indexed: 11/19/2022]
Abstract
Organ-specific proteins (OSPs) have great medical potential in both clinics and biomedical research. Clinically used OSPs such as alanine transaminase, aspartate transaminase, and troponins have raised concerns about their organ specificity. The dynamics and diversity of protein expression in heterogeneous human populations are well known, yet their effects on OSPs have been less addressed. Here, we used mice as a model and implemented a breadth study to examine the panorgan proteome for potential variations in organ specificity across different genetic backgrounds. Using reasonable resources, we generated panorgan proteomes of four inbred mouse strains. The results revealed a large diversity that was more profound among OSPs than among the proteomes overall. We defined a robustness score to quantify this variation and derived three sets of OSPs with different stringencies. We also found that the enriched biological functions of OSPs are themselves organ-specific and are sensitive and useful for assessing the quality of OSPs. We hope our breadth study can open doors to exploring the molecular diversity and dynamics of organ specificity at the protein level.
14
Abstract
With the steady development of proteomic technology, the number of missing proteins (MPs) has been continuously shrinking, with approximately 1470 MPs that have not yet been explored. As a result, discovering the remaining MPs has become increasingly difficult and elusive. To face this challenge, we hypothesized that a stable aneuploid cell line with additional chromosomes would serve as useful material for MP exploration. The Ker-CT cell line, with trisomy of chromosomes 5 and 20, was selected for this purpose. Using a combination of RNA-Seq and LC-MS/MS, a total of 22 178 transcripts and 8846 proteins were identified in Ker-CT. Although transcripts corresponding to 15 MP genes on chromosome 5 and 15 MP genes on chromosome 20 were detected, none of these MPs were found in Ker-CT. Surprisingly, 3 MPs, each supported by at least two unique non-nested peptides of length ≥9 amino acids, were identified in Ker-CT; their genes are located on chromosomes 3 and 10. Furthermore, the 3 MPs were verified using parallel reaction monitoring (PRM). These results suggest that an abnormal chromosome status may influence not only the expression of genes on the trisomic chromosomes but also that of genes on other chromosomes, which can benefit MP discovery. The data obtained in this study are available via ProteomeXchange (PXD028647) and PeptideAtlas (PASS01700), respectively.
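The peptide-level evidence threshold quoted above (at least two unique, non-nested peptides of length ≥9 amino acids) can be checked mechanically. This simplified sketch treats two peptides as nested when one sequence is contained in the other and assumes the input peptides already map uniquely to the protein; the full HPP guidelines impose further conditions that are not modeled here, and the function name and example sequences are hypothetical.

```python
from itertools import combinations


# Hypothetical helper; uniqueness to the protein is assumed, not checked.
def meets_two_peptide_rule(peptides: list[str], min_len: int = 9) -> bool:
    """True if at least two distinct peptides of length >= min_len are
    non-nested, i.e. neither sequence is contained within the other."""
    long_peps = {p for p in peptides if len(p) >= min_len}
    return any(a not in b and b not in a
               for a, b in combinations(long_peps, 2))


print(meets_two_peptide_rule(["LVNEVTEFAK", "LVNEVTEFAKTCVADESHAGCEK"]))  # False (nested)
print(meets_two_peptide_rule(["LVNEVTEFAK", "AEFAEVSK"]))                 # False (8 aa too short)
print(meets_two_peptide_rule(["LVNEVTEFAK", "YICDNQDTISSK"]))             # True
```

Deduplicating into a set first means repeated identifications of the same peptide never count as two pieces of evidence.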
15
Flashlight into the Function of Unannotated C11orf52 using Affinity Purification Mass Spectrometry. J Proteome Res 2021; 20:5340-5346. [PMID: 34739247 DOI: 10.1021/acs.jproteome.1c00540] [Indexed: 11/30/2022]
Abstract
For an enhanced understanding of the biological mechanisms of human disease, it is essential to investigate protein functions. In a previous study, we developed a method for predicting gene ontology (GO) terms from I-TASSER/COFACTOR results, and we applied it to uPE1 proteins on chromosome 11. Here, to validate the bioinformatics prediction for C11orf52, we used affinity purification and mass spectrometry to identify interacting partners of C11orf52. Using immunoprecipitation with three different peptide tags (Myc, Flag, and 2B8) in HEK 293T cells, we identified 79 candidate proteins that are expected to interact with C11orf52. Pathway analysis of the candidate proteins using GO and the STRING database showed that C11orf52 could be related to signaling receptor binding, cell-cell adhesion, and ribosome biogenesis. We then selected three partner candidates, DSG1, JUP, and PTPN11, to verify their interaction with C11orf52 and confirmed them by colocalization at cell-cell junctions in coimmunofluorescence experiments. On the basis of these protein-protein interactions and a comprehensive analysis of the bioinformatic predictions, we expect that C11orf52 is related to the Wnt signaling pathway via DSG1. The data set is available at the ProteomeXchange consortium via the PRIDE repository (PXD026986).
16
The Arabidopsis PeptideAtlas: Harnessing worldwide proteomics data to create a comprehensive community proteomics resource. THE PLANT CELL 2021; 33:3421-3453. [PMID: 34411258 PMCID: PMC8566204 DOI: 10.1093/plcell/koab211] [Received: 05/03/2021] [Accepted: 08/13/2021] [Indexed: 05/02/2023]
Abstract
We developed a resource, the Arabidopsis PeptideAtlas (www.peptideatlas.org/builds/arabidopsis/), to address central questions about the Arabidopsis thaliana proteome, such as the significance of protein splice forms and post-translational modifications (PTMs), or simply to obtain reliable information about specific proteins. PeptideAtlas is based on published mass spectrometry (MS) data collected through ProteomeXchange and reanalyzed through a uniform processing and metadata annotation pipeline. All matched MS-derived peptide data are linked to spectral, technical, and biological metadata. Nearly 40 million out of ∼143 million MS/MS (tandem MS) spectra were matched to the reference genome Araport11, identifying ∼0.5 million unique peptides and 17,858 uniquely identified proteins (only one isoform per gene) at the highest confidence level (false discovery rate 0.0004; 2 non-nested peptides ≥9 amino acids each), assigned as canonical proteins, as well as 3,543 lower-confidence proteins. Physicochemical protein properties were evaluated for targeted identification of unobserved proteins. Additional proteins and isoforms not currently in Araport11 were identified that were generated from pseudogenes, alternative starts, stops, and/or splice variants, and small Open Reading Frames; these features should be considered when updating the Arabidopsis genome. Phosphorylation can be inspected through a sophisticated PTM viewer. PeptideAtlas is integrated with community resources including TAIR, tracks in JBrowse, PPDB, and UniProtKB. Subsequent PeptideAtlas builds will incorporate millions more MS/MS spectra.
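Peptide- and protein-level false discovery rates in large reanalyses are commonly estimated with a target-decoy strategy. As a generic illustration (not the PeptideAtlas pipeline, whose FDR estimation is more involved), the sketch below estimates the FDR at a score threshold as the ratio of decoy to target hits and picks the lowest threshold meeting a chosen FDR; the function names and scores are hypothetical.

```python
from typing import Optional


# Generic target-decoy sketch; names and scores are hypothetical.
def fdr_at_threshold(scores: list[tuple[float, bool]], threshold: float) -> float:
    """Estimated FDR among hits scoring >= threshold: decoys / targets.

    scores: (score, is_decoy) pairs from a concatenated target-decoy search.
    """
    targets = sum(1 for s, decoy in scores if s >= threshold and not decoy)
    decoys = sum(1 for s, decoy in scores if s >= threshold and decoy)
    # max(..., 1) avoids division by zero when no targets pass.
    return decoys / max(targets, 1)


def score_cutoff(scores: list[tuple[float, bool]], max_fdr: float) -> Optional[float]:
    """Lowest observed score whose estimated FDR is <= max_fdr (None if none)."""
    passing = [s for s, _ in scores if fdr_at_threshold(scores, s) <= max_fdr]
    return min(passing) if passing else None


scores = [(9.0, False), (8.5, False), (8.0, False), (7.5, True),
          (7.0, False), (6.0, True), (5.0, False)]
print(score_cutoff(scores, max_fdr=0.01))  # 8.0
print(score_cutoff(scores, max_fdr=0.30))  # 7.0
```

The quadratic loop keeps the logic transparent; production tools sort once and compute q-values in a single pass.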
17
Abstract
The study of proteins circulating in blood offers tremendous opportunities to diagnose, stratify, or possibly prevent diseases. With recent technological advances and the urgent need to understand the effects of COVID-19, the proteomic analysis of blood-derived serum and plasma has become even more important for studying human biology and pathophysiology. Here we provide views and perspectives about technological developments and possible clinical applications that use mass spectrometry (MS)-based or affinity-based methods. We discuss examples where plasma proteomics contributed valuable insights into SARS-CoV-2 infections, aging, and hemostasis, and the opportunities offered by combining proteomics with genetic data. As a contribution to the Human Proteome Organization (HUPO) Human Plasma Proteome Project (HPPP), we present the Human Plasma PeptideAtlas build 2021-07, which comprises 4395 canonical and 1482 additional nonredundant human proteins detected in 240 MS-based experiments. In addition, we report the new Human Extracellular Vesicle PeptideAtlas 2021-06, which comprises five studies and 2757 canonical proteins detected in extracellular vesicles circulating in blood, of which 74% (2047) are in common with the plasma PeptideAtlas. Our overview summarizes the recent advances, impactful applications, and ongoing challenges for translating plasma proteomics into utility for precision medicine.
18
Progress Identifying and Analyzing the Human Proteome: 2021 Metrics from the HUPO Human Proteome Project. J Proteome Res 2021; 20:5227-5240. [PMID: 34670092 DOI: 10.1021/acs.jproteome.1c00590] [Indexed: 12/12/2022]
Abstract
The 2021 Metrics of the HUPO Human Proteome Project (HPP) show that protein expression has now been credibly detected (neXtProt PE1 level) for 18 357 (92.8%) of the 19 778 predicted proteins coded in the human genome, a gain of 483 since 2020 from reports throughout the world reanalyzed by the HPP. Conversely, the number of neXtProt PE2, PE3, and PE4 missing proteins has been reduced by 478 to 1421. This represents remarkable progress on the proteome parts list. The utilization of proteomics in a broad array of biological and clinical studies likewise continues to expand with many important findings and effective integration with other omics platforms. We present highlights from the Immunopeptidomics, Glycoproteomics, Infectious Disease, Cardiovascular, Musculo-Skeletal, Liver, and Cancers B/D-HPP teams and from the Knowledgebase, Mass Spectrometry, Antibody Profiling, and Pathology resource pillars, as well as ethical considerations important to the clinical utilization of proteomics and protein biomarkers.
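The headline figures are internally consistent: the PE1 count and the PE2-PE4 missing proteins together make up the 19,778 predicted proteins. A quick arithmetic check, using only numbers quoted in the abstract (the variable names are illustrative):

```python
# Figures quoted in the 2021 HPP Metrics abstract.
pe1 = 18_357        # proteins with credible protein-level evidence (neXtProt PE1)
predicted = 19_778  # predicted proteins coded in the human genome
missing = 1_421     # PE2 + PE3 + PE4 "missing proteins"

print(f"PE1 share: {pe1 / predicted:.1%}")                        # 92.8%
print("PE1 + missing == predicted:", pe1 + missing == predicted)  # True
```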
19
Abstract
All living organisms depend on tightly regulated cellular networks to control biological functions. Proteolysis is an important irreversible post-translational modification that regulates most, if not all, cellular processes. Proteases are a large family of enzymes that hydrolyze protein substrates, leading to protein activation or degradation. The 473 known and 90 putative human proteases are divided into 5 main mechanistic groups: metalloproteases, serine proteases, cysteine proteases, threonine proteases, and aspartic acid proteases. Proteases are fundamental to all biological systems, and when dysregulated they profoundly influence disease progression. Inhibiting proteases has led to effective therapies for viral infections, cardiovascular disorders, and blood coagulation, among others. Between 5 and 10% of all pharmaceutical targets are proteases, despite limited knowledge about their biological roles. More than 50% of all human proteases have no known substrates. We present here a comprehensive list of all currently known human proteases. We also present current and novel biochemical tools to characterize protease functions in vitro, in vivo, and ex vivo. These tools make it possible to define both beneficial and detrimental activities of proteases in health and disease.
20
Proximity labeling and other novel mass spectrometric approaches for spatiotemporal protein dynamics. Expert Rev Proteomics 2021; 18:757-765. [PMID: 34496693 PMCID: PMC8650568 DOI: 10.1080/14789450.2021.1976149] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/11/2021] [Accepted: 08/31/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND Proteins are highly dynamic, and their biological function is controlled not only by temporal changes in abundance but also by regulated protein-protein interaction networks, which respond to internal and external perturbations. A wealth of novel analytical reagents and workflows allows spatiotemporal protein environments to be studied with great granularity while maintaining high throughput and ease of analysis. AREAS COVERED We review technology advances for measuring protein-protein proximity interactions, with an emphasis on proximity labeling, and briefly summarize other spatiotemporal approaches, including protein localization and its dynamic changes over time, specifically in human cells and mammalian tissues. We focus especially on novel technologies and workflows that have emerged within the past 5 years. These include enrichment-based techniques (proximity labeling and crosslinking), separation-based techniques (organelle fractionation and size exclusion chromatography), and finally sorting-based techniques (laser capture microdissection and mass spectrometry imaging). EXPERT OPINION Spatiotemporal proteomics is a key step in assessing biological complexity, understanding refined regulatory mechanisms, and elucidating how protein complexes and networks form. Studying protein dynamics across space and time holds promise for gaining deep insights into how protein networks may be perturbed during disease and aging processes, and offers potential avenues for therapeutic interventions, drug discovery, and biomarker development.
21
Proteomes Are of Proteoforms: Embracing the Complexity. Proteomes 2021; 9:38. [PMID: 34564541 PMCID: PMC8482110 DOI: 10.3390/proteomes9030038] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 08/01/2021] [Revised: 08/24/2021] [Accepted: 08/29/2021] [Indexed: 12/17/2022]
Abstract
Proteomes are complex, much more so than genomes or transcriptomes. Thus, simplifying their analysis does not simplify the issue. Proteomes are of proteoforms, not canonical proteins. While having a catalogue of amino acid sequences provides invaluable information, this is the Proteome-lite. To dissect biological mechanisms and identify critical biomarkers/drug targets, we must assess the myriad proteoforms that arise at any point before, after, and between transcription and translation (e.g., isoforms, splice variants, and post-translational modifications [PTM]), as well as newly defined species. Numerous analytical methods are currently used to address proteome depth, and here we critically evaluate them in terms of the current 'state of the field'. We thus discuss both the pros and cons of available approaches and where improvements or refinements are needed to quantitatively characterize proteomes. To enable a next-generation approach, we suggest that advances lie in transdisciplinarity, via integration of current proteomic methods to yield a unified discipline that capitalizes on the strongest qualities of each. Such a necessary (if not revolutionary) shift cannot be accomplished by a continued primary focus on proteo-genomics/-transcriptomics. We must embrace the complexity. Yes, these are the hard questions, and this will not be easy… but where is the fun in easy?
22
Proteogenomics Integrating Novel Junction Peptide Identification Strategy Discovers Three Novel Protein Isoforms of Human NHSL1 and EEF1B2. J Proteome Res 2021; 20:5294-5303. [PMID: 34420305 DOI: 10.1021/acs.jproteome.1c00373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
In eukaryotes, alternative pre-mRNA splicing allows a single gene to encode different protein isoforms, which function in many biological processes and are used as biomarkers or therapeutic targets for diseases. Although protein isoforms in the human genome are well annotated, we speculate that some low-abundance protein isoforms may still be under-annotated, because most genes have a primary coding product and alternative protein isoforms tend to be expressed at low levels. A peptide co-encoded by a novel exon and an annotated exon separated by an intron is known as a novel junction peptide. In the absence of known transcripts and homologous proteins, traditional proteogenomics based on whole-genome six-frame translation cannot identify novel junction peptides, because their coding sequence is interrupted by the intron in the genome, nor can it capture novel alternative splice sites. In this article, we first propose a strategy and tool for identifying novel junction peptides, called CJunction. We then integrate it into a proteogenomics workflow specifically designed for novel protein isoform discovery and apply it to a deep-coverage HeLa mass spectrometry data set (ProteomeXchange identifier PXD004452). We identified and validated three novel protein isoforms of two functionally important genes, NHSL1 (causative gene of Nance-Horan syndrome) and EEF1B2 (translation elongation factor), supporting our hypothesis. Because of their novel N-termini, these isoforms differ substantially in sequence from the annotated gene products, suggesting that they may have markedly different functions.
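To make the junction-peptide definition concrete, here is a toy sketch; it is not CJunction, whose implementation is not described here. It translates two made-up exons after splicing and extracts the residues spanning the exon-exon boundary. It then checks that this peptide is absent from a six-frame translation of the same sequence with an invented intron left in place. The sequences, the `flank` parameter, and all function names are hypothetical. The code handles only uppercase A/C/G/T and uses the standard genetic code.

```python
from typing import List

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt: str) -> str:
    """Translate complete codons of an uppercase DNA string; '*' marks stops."""
    return "".join(CODON_TABLE[nt[p:p + 3]] for p in range(0, len(nt) - 2, 3))


def junction_peptide(exon_a: str, exon_b: str, frame: int, flank: int = 5) -> str:
    """Residues around the exon_a/exon_b junction after the intron is spliced out.

    `frame` (0, 1 or 2) is the reading-frame offset within exon_a;
    `flank` is how many residues to keep on each side of the junction.
    """
    protein = translate((exon_a + exon_b)[frame:])
    j = len(exon_a) - frame   # junction offset in the framed nucleotide sequence
    last_a = (j - 1) // 3     # last residue containing any exon_a nucleotide
    first_b = j // 3          # first residue containing any exon_b nucleotide
    return protein[max(0, first_b - flank): last_a + 1 + flank]


def six_frame(nt: str) -> List[str]:
    """Translations of the three forward and three reverse-complement frames."""
    rc = nt[::-1].translate(str.maketrans("ACGT", "TGCA"))
    return [translate(s[f:]) for s in (nt, rc) for f in range(3)]


exon_a = "ATGGCTGAAAAGCTT"   # hypothetical upstream exon
intron = "GTAAGTCTTTTTCTAG"  # hypothetical intron (GT...AG)
exon_b = "GGTTGGCACCCAGAT"   # hypothetical downstream exon

pep = junction_peptide(exon_a, exon_b, frame=0, flank=5)
print(pep)                                                       # MAEKLGWHPD
print(any(pep in t for t in six_frame(exon_a + intron + exon_b)))  # False
```

In this toy case the junction peptide exists only in the spliced sequence. With the intron present, every genomic reading frame reads through it, so no frame contains the peptide. This is the limitation of genome-only six-frame translation described above.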
23
The human melanoma proteome atlas: Defining the molecular pathology. Clin Transl Med 2021; 11:e473. [PMID: 34323403 PMCID: PMC8255060 DOI: 10.1002/ctm2.473] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/12/2021] [Revised: 06/03/2021] [Accepted: 06/08/2021] [Indexed: 01/19/2023]
Abstract
The MM500 study is an initiative to map protein levels in malignant melanoma tumor samples, focusing on in-depth histopathology coupled to proteome characterization. Protein levels and localization were determined for a broad spectrum of diverse, surgically isolated melanoma tumors originating from multiple body locations. More than 15,500 proteoforms were identified by mass spectrometry, and their chromosomal and subcellular localization was annotated within both primary and metastatic melanoma. The data generated by global proteomic experiments covered 72% of the proteins identified in the recently reported high-stringency blueprint of the human proteome. This study contributes to the NIH Cancer Moonshot initiative by combining detailed histopathological presentation with molecular characterization of 505 melanoma tumor samples, localized in 26 organs, from 232 patients.
24
Abstract
Mass spectra provide the ultimate evidence to support the findings of mass spectrometry proteomics studies in publications, and it is therefore crucial to be able to trace the conclusions back to the spectra. The Universal Spectrum Identifier (USI) provides a standardized mechanism for encoding a virtual path to any mass spectrum contained in datasets deposited to public proteomics repositories. USI enables greater transparency of spectral evidence, with more than 1 billion USI identifications from over 3 billion spectra already available through ProteomeXchange repositories.
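The abstract describes the USI as a virtual path to a spectrum. As a hedged illustration, the sketch below splits a USI into the parts we understand the specification to define. These are the `mzspec` prefix, the dataset collection, the MS run, an index type, the index, and an optional peptidoform/charge interpretation. Several details are our own assumptions rather than anything stated in the abstract: the accepted index types, the `<peptidoform>/<charge>` interpretation syntax, the illustrative USI string, and the helper names `USI` and `parse_usi`. Check the USI specification before relying on it. The sketch also does not handle run names that themselves contain colons.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed set of index types; check the USI specification for the full list.
INDEX_TYPES = {"scan", "index", "nativeId"}


@dataclass(frozen=True)
class USI:
    collection: str                    # dataset accession, e.g. "PXD000561"
    ms_run: str                        # MS run (raw file) name within the dataset
    index_type: str                    # how the spectrum is addressed in the run
    index: str                         # scan number, index, or native ID
    peptidoform: Optional[str] = None  # optional interpretation: peptide sequence
    charge: Optional[int] = None       # optional interpretation: precursor charge


def parse_usi(usi: str) -> USI:
    """Split a USI into its components (illustrative sketch, not a validator)."""
    # At most five splits, so colons inside the optional interpretation
    # (e.g. "M[UNIMOD:35]") stay in the last field. Run names that
    # themselves contain colons are NOT handled by this sketch.
    parts = usi.split(":", 5)
    if len(parts) < 5 or parts[0] != "mzspec":
        raise ValueError(f"not a USI: {usi!r}")
    _, collection, ms_run, index_type, index = parts[:5]
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type: {index_type!r}")
    peptidoform, charge = None, None
    if len(parts) == 6:
        # Interpretation assumed to be "<peptidoform>/<charge>".
        seq, sep, z = parts[5].rpartition("/")
        if not sep or not z.isdigit():
            raise ValueError(f"unparsed interpretation: {parts[5]!r}")
        peptidoform, charge = seq, int(z)
    return USI(collection, ms_run, index_type, index, peptidoform, charge)


usi = parse_usi(
    "mzspec:PXD000561:Adult_Frontalcortex_bRP_Elite_85_f09:scan:17555:VLHPLEGAVVIIFK/2"
)
print(usi.collection, usi.index_type, usi.index, usi.peptidoform, usi.charge)
```

Splitting on a bounded number of colons keeps the parser simple. A production tool would instead follow the specification's full grammar and resolve the USI against a repository to retrieve the actual spectrum.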
25
A Multi-Omics Study of Human Testis and Epididymis. Molecules 2021; 26:molecules26113345. [PMID: 34199411 PMCID: PMC8199593 DOI: 10.3390/molecules26113345] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Revised: 05/22/2021] [Accepted: 05/24/2021] [Indexed: 12/12/2022]
Abstract
The human testis and epididymis play critical roles in male fertility, including spermatogenesis, sperm storage, and sperm maturation. However, the unique functions of the two organs have not been systematically studied. Herein, we provide a systematic and comprehensive multi-omics comparison of the testis and epididymis. RNA-Seq profiling detected and quantified 19,653 in the testis and 18,407 in the epididymis. Proteomic profiling identified a total of 11,024 and 10,386 proteins in the testis and epididymis, respectively, including 110 proteins previously classified as MPs (missing proteins). Furthermore, five MPs expressed in the testis were validated by multiple reaction monitoring (MRM). Subsequently, multi-omics comparisons between testis and epididymis were performed, including analysis of the biological functions and pathways of DEGs (differentially expressed genes) in each group. These comparisons revealed that the differences were related to spermatogenesis, male gamete generation, and reproduction. In conclusion, this study may help reveal the expression patterns of missing proteins and help researchers understand the physiological functions of the testis and epididymis in greater depth.
26
Proteomics, Personalized Medicine and Cancer. Cancers (Basel) 2021; 13:2512. [PMID: 34063807 PMCID: PMC8196570 DOI: 10.3390/cancers13112512] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 04/03/2021] [Revised: 05/12/2021] [Accepted: 05/17/2021] [Indexed: 02/05/2023]
Abstract
As of 2020, the human genome and proteome are both at >90% completion based on high-stringency analyses. This has been achieved largely through major technological advances over the last 20 years. It has enlarged our understanding of human health and disease, including cancer, and is supporting the current trend towards personalized/precision medicine. This progress reflects improved screening, novel therapeutic approaches, and an increased understanding of underlying cancer biology. However, cancer is a complex, heterogeneous disease modulated by genetic, molecular, cellular, tissue, population, environmental, and socioeconomic factors, which evolve with time. Despite recent advances in treatment that have improved patient outcomes, prognosis is still poor for many patients with certain cancers (e.g., mesothelioma, pancreatic and brain cancer), with a high death rate associated with late diagnosis. In this review we survey key hallmarks of cancer (e.g., autophagy, the role of redox signaling) and current unmet clinical needs. We also cover the requirement for sensitive and specific biomarkers for early detection, surveillance, prognosis, and drug monitoring, as well as the role of the microbiome and the goals of personalized/precision medicine, discussing how emerging omics technologies can further inform these areas. Exemplars from recent onco-proteogenomic publications are given. Finally, we address future perspectives, not only from the standpoint of perceived advances in treatment, but also in terms of the hurdles that have to be overcome.
27
Abstract
Proteins are the ultimate product of gene expression. As they hinge between gene transcription and phenotype, they offer a more realistic perspective than genes and mRNAs on toxicopathic effects, responses, and even susceptibility to insult. They also avoid some of the inter-individual variability that hinders measurement of downstream endpoints such as metabolites or enzyme activity. Toxicologists have long focused on proteins as biomarkers, but the advent of proteomics shifted risk assessment from narrow single-endpoint analyses to whole-proteome screening. This has enabled the derivation of protein-centric adverse outcome pathways (AOPs), which are pivotal to the branch of Systems Biology informally named Systems Toxicology. Especially if coupled with pathology, the identification of molecular initiating events (MIEs) and AOPs allows predictive modeling of toxicological pathways, which now stands as the frontier for the next generation of toxicologists. Advances in mass spectrometry, bioinformatics, protein databases, and top-down proteomics create new opportunities for mechanistic and effects-oriented research in all fields, from ecotoxicology to pharmacotoxicology.
28
Reflections on the HUPO Human Proteome Project, the Flagship Project of the Human Proteome Organization, at 10 Years. Mol Cell Proteomics 2021; 20:100062. [PMID: 33640492 PMCID: PMC8058560 DOI: 10.1016/j.mcpro.2021.100062] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/12/2020] [Revised: 02/04/2021] [Accepted: 02/05/2021] [Indexed: 02/08/2023]
Abstract
We celebrate the 10th anniversary of the launch of the HUPO Human Proteome Project (HPP) and its major milestone of confident detection of at least one protein from each of 90% of the predicted protein-coding genes, based on the output of the entire proteomics community. The Human Genome Project reached a similar decadal milestone 20 years ago. The HPP has engaged proteomics teams around the world, strongly influenced data-sharing, enhanced quality assurance, and issued stringent guidelines for claims of detecting previously "missing proteins." This invited perspective complements papers on "A High-Stringency Blueprint of the Human Proteome" and "The Human Proteome Reaches a Major Milestone" in special issues of Nature Communications and Journal of Proteome Research, respectively, released in conjunction with the October 2020 virtual HUPO Congress and its celebration of the 10th anniversary of the HUPO HPP.
29
Ethical Principles, Constraints and Opportunities in Clinical Proteomics. Mol Cell Proteomics 2021; 20:100046. [PMID: 33453411 PMCID: PMC7950205 DOI: 10.1016/j.mcpro.2021.100046] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 12/18/2020] [Accepted: 01/04/2021] [Indexed: 12/11/2022]
Abstract
Recent advances in mass spectrometry (MS)-based proteomics have vastly increased the quality and scope of biological information that can be derived from human samples. These advances have rendered current workflows increasingly applicable in biomedical and clinical contexts. As proteomics is poised to take an important role in the clinic, associated ethical responsibilities increase in tandem with impacts on the health, privacy, and wellbeing of individuals. We conducted and here report a systematic literature review of ethical issues in clinical proteomics. We add our perspectives from a background of bioethics, the results of our accompanying paper extracting individual-sensitive results from patient samples, and the literature addressing similar issues in genomics. The spectrum of potential issues ranges from patient re-identification to incidental findings of clinical significance. The latter can be divided into actionable and unactionable findings. Some of these have the potential to be employed in discriminatory or privacy-infringing ways. However, incidental findings may also have great positive potential. A plasma proteome profile, for instance, could inform on the general health or disease status of an individual regardless of the narrow diagnostic question that prompted it. We suggest that early discussion of ethical issues in clinical proteomics can ensure that eventual healthcare practices and regulations reflect the considered judgment of the community and anticipate opportunities and problems that may arise as the technology matures.
30
Proteome-based pathology: the next frontier in precision medicine. Expert Review of Precision Medicine and Drug Development 2020; 6:1-4. [PMID: 33768159 DOI: 10.1080/23808993.2021.1854611] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
31
Abstract
Cardiovascular diseases remain the most rapidly rising contributor to all-cause mortality and the leading cause of inpatient hospitalization worldwide, with costs exceeding $30 billion annually in North America. Cell surface and membrane-associated proteins play an important role in cardiomyocyte biology and are involved in the pathogenesis of many human heart diseases. In cardiomyocytes, membrane proteins serve as critical signaling receptors, Ca²⁺ cycling regulators, and electrical propagation regulators, all functioning in concert to maintain spontaneous and synchronous contractions. Membrane proteins are excellent pharmaceutical targets because of their uniquely exposed position within the cell. Perturbations in cardiac membrane protein localization and function have been implicated in the progression and pathogenesis of many heart diseases. However, previous attempts at profiling the cardiac membrane proteome have yielded limited results, because technologies for isolating hydrophobic, low-abundance membrane proteins have been inadequate. Comprehensive mapping and characterization of the cardiac membrane proteome therefore remain incomplete. This review focuses on recent advances in mapping the cardiac membrane proteome and on the role of novel cardiac membrane proteins in the healthy and diseased heart.
32