1
Chu F, Jenson SC, Barente AS, Heller NC, Merkley ED, Jarman KH. MARLOWE: An Untargeted Proteomics, Statistical Approach to Taxonomic Classification for Forensics. J Proteome Res 2025; 24:995-1007. [PMID: 39898467 DOI: 10.1021/acs.jproteome.3c00477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2025]
Abstract
General proteomics research for fundamental science typically addresses laboratory- or patient-derived samples of known origin and composition. However, in a few research areas, such as environmental proteomics, clinical identification of infectious organisms, archeology, art/cultural history, and forensics, attributing the origin of a protein-containing sample to the organisms that produced it is a central focus. A small number of groups have approached this problem and developed software tools for taxonomic characterization and/or identification using bottom-up proteomics. Most such tools identify peptides via database search, and many rely on organism-specific peptides as markers. Our group recently introduced MARLOWE, a software tool for taxonomic characterization of unknown samples based on de novo peptide identification and signal-erosion-resistant strong peptides, which are shared peptides distributed in a taxonomy-dependent manner. In the current work, we further characterize the utility of MARLOWE using publicly available proteomics data from forensically-relevant samples. MARLOWE characterizes samples based on their protein profile, and returns ranked organism lists of potential contributors and taxonomic scores based on shared strong peptides between organisms. Overall, the correct characterization rate ranges between 44 and 100%, depending on the sample type and data acquisition parameters (with lower numbers associated with lower-quality data sets). MARLOWE demonstrates successful characterization of true contributors and close relatives, and provides sufficient specificity to distinguish certain microbial species. MARLOWE demonstrates its ability to provide insight into potential taxonomic sources for a wide range of sample types without prior assumptions about sample contents. This approach can find utility in forensic science and also broadly in bioanalytical applications that utilize proteomics approaches for taxonomic characterization.
Affiliation(s)
- Fanny Chu
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Sarah C Jenson
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Anthony S Barente
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, United States
- Natalie C Heller
  - Applied Statistics and Computational Modeling Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Eric D Merkley
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Kristin H Jarman
  - Chemical & Biological Signatures Group, Pacific Northwest National Laboratory, Richland, Washington 99352, United States
2
Van Den Bossche T, Beslic D, van Puyenbroeck S, Suomi T, Holstein T, Martens L, Elo LL, Muth T. Metaproteomics Beyond Databases: Addressing the Challenges and Potentials of De Novo Sequencing. Proteomics 2025:e202400321. [PMID: 39888246 DOI: 10.1002/pmic.202400321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 01/09/2025] [Accepted: 01/10/2025] [Indexed: 02/01/2025]
Abstract
Metaproteomics enables the large-scale characterization of microbial community proteins, offering crucial insights into their taxonomic composition, functional activities, and interactions within their environments. By directly analyzing proteins, metaproteomics offers insights into community phenotypes and the roles individual members play in diverse ecosystems. Although database-dependent search engines are commonly used for peptide identification, they rely on pre-existing protein databases, which can be limiting for complex, poorly characterized microbiomes. De novo sequencing presents a promising alternative, which derives peptide sequences directly from mass spectra without requiring a database. Over time, this approach has evolved from manual annotation to advanced graph-based, tag-based, and deep learning-based methods, significantly improving the accuracy of peptide identification. This Viewpoint explores the evolution, advantages, limitations, and future opportunities of de novo sequencing in metaproteomics. We highlight recent technological advancements that have improved its potential for detecting unsequenced species and for providing deeper functional insights into microbial communities.
Affiliation(s)
- Tim Van Den Bossche
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Denis Beslic
  - Centre for Artificial Intelligence in Public Health Research, Robert Koch Institute, Berlin, Germany
- Sam van Puyenbroeck
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Tomi Suomi
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
- Tanja Holstein
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
  - Data Competence Center MF 2, Robert Koch Institute, Berlin, Germany
- Lennart Martens
  - VIB - UGent Center for Medical Biotechnology, VIB, Ghent, Belgium
  - Department of Biomolecular Medicine, Faculty of Medicine and Health Sciences, Ghent University, Ghent, Belgium
- Laura L Elo
  - Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland
  - Institute of Biomedicine, University of Turku, Turku, Finland
- Thilo Muth
  - Data Competence Center MF 2, Robert Koch Institute, Berlin, Germany
3
Proteins, possibly human, found in World War II concentration camp artifact. Sci Rep 2022; 12:12369. [PMID: 35858951 PMCID: PMC9300652 DOI: 10.1038/s41598-022-16192-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Accepted: 07/06/2022] [Indexed: 11/23/2022] Open
Abstract
Museums displaying artifacts of the human struggle against oppression are often caught in their own internal struggle between presenting factual and unbiased descriptions of their collections, or relying on testament of survivors. Often this quandary is resolved in favor of what can be verified, not what is remembered. However, with improving instrumentation, methods and informatic approaches, science can help uncover evidence able to reconcile memory and facts. Following World War II, thousands of small, cement-like disks with numbers impressed on one side were found at concentration camps throughout Europe. Survivors claimed these disks were made of human cremains; museums erred on the side of caution—without documentation of the claims, was it justifiable to present them as fact? The ability to detect species relevant biological material in these disks could help resolve this question. Proteomic mass spectrometry of five disks revealed all contained proteins, including collagens and hemoglobins, suggesting they were made, at least in part, of animal remains. A new protein/informatics approach to species identification showed that while human was not always identified as the top contributor, human was the most likely explanation for one disk. To our knowledge, this is the first demonstration of protein recovery from cremains. Data are available via ProteomeXchange with identifier PXD035267.
4
Yang H, Butler ER, Monier SA, Teubl J, Fenyö D, Ueberheide B, Siegel D. A predictive model for vertebrate bone identification from collagen using proteomic mass spectrometry. Sci Rep 2021; 11:10900. [PMID: 34035355 PMCID: PMC8149876 DOI: 10.1038/s41598-021-90231-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/30/2019] [Accepted: 05/06/2021] [Indexed: 11/17/2022] Open
Abstract
Proteogenomics is an increasingly common method for species identification as it allows for rapid and inexpensive interrogation of an unknown organism’s proteome—even when the proteome is partially degraded. The proteomic method typically uses tandem mass spectrometry to survey all peptides detectable in a sample that frequently contains hundreds or thousands of proteins. Species identification is based on detection of a small number of species-specific peptides. Genetic analysis of proteins by mass spectrometry, however, is a developing field, and the bone proteome, typically consisting of only two proteins, pushes the limits of this technology. Nearly 20% of highly confident spectra from modern human bone samples identify non-human species when searched against a vertebrate database—as would be necessary with a fragment of unknown bone. These non-human peptides are often the result of current limitations in mass spectrometry or algorithm interpretation errors. Consequently, it is difficult to know if a “species-specific” peptide used to identify a sample is actually present in that sample. Here we evaluate the causes of peptide sequence errors and propose an unbiased, probabilistic approach to determine the likelihood that a species is correctly identified from bone without relying on species-specific peptides.
Affiliation(s)
- Heyi Yang
  - Office of Chief Medical Examiner, 421 East 26th Street, New York, NY, 10016, USA
- Erin R Butler
  - Office of Chief Medical Examiner, 421 East 26th Street, New York, NY, 10016, USA
- Samantha A Monier
  - Office of Chief Medical Examiner, 421 East 26th Street, New York, NY, 10016, USA
- Jennifer Teubl
  - Institute for Systems Genetics, Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY, 10016, USA
- David Fenyö
  - Institute for Systems Genetics, Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY, 10016, USA
- Beatrix Ueberheide
  - Institute for Systems Genetics, Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY, 10016, USA
  - Department of Biochemistry and Molecular Pharmacology, Department of Neurology, Director Proteomics Laboratory, Division of Advanced Research Technologies, NYU Grossman School of Medicine, New York, NY, 10016, USA
- Donald Siegel
  - Office of Chief Medical Examiner, 421 East 26th Street, New York, NY, 10016, USA
5
Abstract
Proteomics, the large-scale study of all proteins of an organism or system, is a powerful tool for studying biological systems. It can provide a holistic view of the physiological and biochemical states of given samples through identification and quantification of large numbers of peptides and proteins. In forensic science, proteomics can be used as a confirmatory and orthogonal technique for well-built genomic analyses. Proteomics is highly valuable in cases where nucleic acids are absent or degraded, such as hair and bone samples. It can be used to identify body fluids, ethnic group, gender, individual, and estimate post-mortem interval using bone, muscle, and decomposition fluid samples. Compared to genomic analysis, proteomics can provide a better global picture of a sample. It has been used in forensic science for a wide range of sample types and applications. In this review, we briefly introduce proteomic methods, including sample preparation techniques, data acquisition using liquid chromatography-tandem mass spectrometry, and data analysis using database search, spectral library search, and de novo sequencing. We also summarize recent applications in the past decade of proteomics in forensic science with a special focus on human samples, including hair, bone, body fluids, fingernail, muscle, brain, and fingermark, and address the challenges, considerations, and future developments of forensic proteomics.
6
O'Bryon I, Jenson SC, Merkley ED. Flying blind, or just flying under the radar? The underappreciated power of de novo methods of mass spectrometric peptide identification. Protein Sci 2020; 29:1864-1878. [PMID: 32713088 PMCID: PMC7454419 DOI: 10.1002/pro.3919] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/15/2020] [Revised: 07/21/2020] [Accepted: 07/23/2020] [Indexed: 12/15/2022]
Abstract
Mass spectrometry-based proteomics is a popular and powerful method for precise and highly multiplexed protein identification. The most common method of analyzing untargeted proteomics data is called database searching, where the database is simply a collection of protein sequences from the target organism, derived from genome sequencing. Experimental peptide tandem mass spectra are compared to simplified models of theoretical spectra calculated from the translated genomic sequences. However, in several interesting application areas, such as forensics, archaeology, venomics, and others, a genome sequence may not be available, or the correct genome sequence to use is not known. In these cases, de novo peptide identification can play an important role. De novo methods infer peptide sequence directly from the tandem mass spectrum without reference to a sequence database, usually using graph-based or machine learning algorithms. In this review, we provide a basic overview of de novo peptide identification methods and applications, briefly covering de novo algorithms and tools, and focusing in more depth on recent applications from venomics, metaproteomics, forensics, and characterization of antibody drugs.
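As an illustration of the "simplified models of theoretical spectra" that database searching compares against, the sketch below computes singly charged b- and y-ion m/z values for a peptide from standard monoisotopic residue masses and counts how many of them fall within a tolerance of an observed peak list. This is not taken from the review: the function names `fragment_ions` and `count_matches` and the 0.02 m/z tolerance are hypothetical, and real search engines use considerably richer scoring.

```python
# Illustrative sketch only; names and the tolerance are assumptions, not from the cited review.
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # mass of H2O
PROTON = 1.007276   # mass of a proton

def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), built from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), built from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions

def count_matches(observed_mz, theoretical_mz, tol=0.02):
    """Count theoretical peaks that have an observed peak within +/- tol."""
    return sum(any(abs(o - t) <= tol for o in observed_mz) for t in theoretical_mz)

if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    print("b ions:", [round(v, 4) for v in b])
    print("y ions:", [round(v, 4) for v in y])
```

A de novo method works in the opposite direction: rather than generating fragment masses from candidate sequences, it reads mass differences between observed peaks and maps them back onto residue masses such as those in `MONO`.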
Affiliation(s)
- Isabelle O'Bryon
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Sarah C. Jenson
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
- Eric D. Merkley
  - Chemical and Biological Signatures, Pacific Northwest National Laboratory, Richland, Washington, USA
7
Heller NC, Garrett AM, Merkley ED, Cendrowski SR, Melville AM, Arce JS, Jenson SC, Wahl KL, Jarman KH. Probabilistic Limit of Detection for Ricin Identification Using a Shotgun Proteomics Assay. Anal Chem 2019; 91:12399-12406. [PMID: 31490662 DOI: 10.1021/acs.analchem.9b02721] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Robust and highly specific methods for the detection of the protein toxin ricin are of interest to the law enforcement community. In previous studies, methods based on liquid chromatography-tandem mass spectrometry shotgun proteomics have been proposed. The successful implementation of this approach relies on specific data evaluation criteria addressing (1) the quality of the mass spectrometric data, (2) the confidence of peptide identifications (peptide-spectrum matches), and (3) the number and sequence specificity of peptides detected. We present such data evaluation criteria and use a novel approach to establish the limit of detection for this ricin assay. Specifically, we use logistic regression to determine the probability of detection for individual ricin peptides at different concentrations. We then apply basic rules from probability theory, combining these individual peptide probabilities into an overall assay limit of detection. This procedure yields an assay limit of detection for ricin at 42.5 ng on column or 21.25 ng/μL for a 2-μL injection. We also show that, despite the conventional wisdom that detergents are deleterious to mass spectrometric analyses, the presence of Tween-20 did not prevent detection of ricin peptides, and indeed assays performed in buffers that included Tween-20 gave better results than assays performed using other buffer formulations with or without detergent removal.
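The two-step idea in this abstract (a logistic detection curve per peptide, then a combination of those probabilities into an assay-level limit of detection) can be sketched as follows. The abstract does not state the combination rule, so this sketch assumes that peptides are detected independently and that the assay calls ricin present when at least `k` peptides are detected. The coefficients, the 95% target, and all function names are hypothetical placeholders, not values from the paper.

```python
# Illustrative sketch only; the decision rule and all numbers are assumptions.
import math

def detection_probability(conc_ng, intercept, slope):
    """Logistic model: P(peptide detected) as a function of log10(concentration)."""
    z = intercept + slope * math.log10(conc_ng)
    return 1.0 / (1.0 + math.exp(-z))

def prob_at_least(probs, k):
    """P(at least k of the events occur), assuming the events are independent."""
    dist = [1.0]  # dist[j] = P(exactly j detections among peptides seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)   # this peptide missed
            new[j + 1] += q * p       # this peptide detected
        dist = new
    return sum(dist[k:])

def assay_lod(peptide_params, k, target=0.95, lo=0.1, hi=1000.0):
    """Smallest amount in [lo, hi] at which P(>= k peptides detected) >= target.

    Assumes positive slopes, so the assay probability increases with amount.
    """
    def assay_prob(conc):
        probs = [detection_probability(conc, a, b) for a, b in peptide_params]
        return prob_at_least(probs, k)

    if assay_prob(hi) < target:
        raise ValueError("target probability not reached within the search range")
    if assay_prob(lo) >= target:
        return lo
    for _ in range(60):               # bisection on a log scale
        mid = math.sqrt(lo * hi)
        if assay_prob(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

if __name__ == "__main__":
    # Hypothetical (intercept, slope) pairs for three peptides.
    params = [(-2.0, 2.5), (-3.0, 2.0), (-1.5, 1.8)]
    print("Sketch LOD (ng on column):", round(assay_lod(params, k=2), 2))
```

With a different decision rule, for example requiring specific peptides rather than any `k`, only `prob_at_least` would need to change.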
Affiliation(s)
- Alaine M Garrett
  - National Biodefense Analysis and Countermeasures Center, Operated by BNBI for the U.S. Department of Homeland Security Science and Technology Directorate, Frederick, Maryland, United States
- Stephen R Cendrowski
  - National Biodefense Analysis and Countermeasures Center, Operated by BNBI for the U.S. Department of Homeland Security Science and Technology Directorate, Frederick, Maryland, United States
8
O’Bryon I, Tucker AE, Kaiser BLD, Wahl KL, Merkley ED. Constructing a Tandem Mass Spectral Library for Forensic Ricin Identification. J Proteome Res 2019; 18:3926-3935. [DOI: 10.1021/acs.jproteome.9b00377] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Affiliation(s)
- Isabelle O’Bryon
  - Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Abigail E. Tucker
  - Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Brooke L. D. Kaiser
  - Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Karen L. Wahl
  - Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
- Eric D. Merkley
  - Chemical and Biological Signature Sciences Group, Pacific Northwest National Laboratory, Richland, Washington 99354, United States
9
Applications and challenges of forensic proteomics. Forensic Sci Int 2019; 297:350-363. [DOI: 10.1016/j.forsciint.2019.01.022] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 11/16/2018] [Revised: 01/09/2019] [Accepted: 01/13/2019] [Indexed: 12/23/2022]