1. Reuschenbach M, Drees F, Leupold MS, Tintrop LK, Schmidt TC, Renner G. qPeaks: A Linear Regression-Based Asymmetric Peak Model for Parameter-Free Automatized Detection and Characterization of Chromatographic Peaks in Non-Target Screening Data. Anal Chem 2024; 96:7120-7129. [PMID: 38666514; DOI: 10.1021/acs.analchem.4c00494] [Indexed: 05/08/2024]
Abstract
We present qPeaks (quality peaks), a novel, user-parameter-free algorithm for peak detection and peak characterization in chromatographic data. The algorithm is based on a linearizable regression model that analyzes asymmetric peaks and estimates the specific uncertainties associated with the peak regression parameters. These parameter uncertainties are used to derive a data quality score, DQSpeak, which makes low-reliability results transparent during processing and allows the generated features to be prioritized. Chromatographic peaks with a high DQSpeak are less likely to be classified as false positives and show higher repeatability across multiple measurements. The high efficiency of the algorithm makes it particularly useful within nontarget screening processing routines for chromatography coupled with high-resolution mass spectrometry. qPeaks is integrated into the qAlgorithms nontarget screening processing toolbox, to which it adds a parameter-free chromatographic peak detection and characterization step. In qAlgorithms, high-resolution mass spectra are now centroided with the qCentroids algorithm, centroids are clustered into extracted ion chromatograms (EICs) with the qBinning algorithm, and chromatographic peaks are detected on the resulting EICs with qPeaks. All qAlgorithms tools can, however, also be used independently.
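The abstract does not spell out the regression model. As a rough illustration of what a linearizable asymmetric peak regression can look like, the Python sketch below assumes a log-intensity model with one shared linear term and separate curvature terms for the left and right flanks, ln y = b0 + b1·x + b2·x²·[x<0] + b3·x²·[x≥0], fitted by ordinary least squares, with parameter standard errors taken from the residual variance and the diagonal of (XᵀX)⁻¹. The model form and the function names are illustrative assumptions, not the qPeaks implementation, and the DQSpeak formula is not reproduced here.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a @ x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_asymmetric_peak(xs, ys):
    """Hypothetical helper: least-squares fit of
    ln(y) = b0 + b1*x + b2*x^2*[x<0] + b3*x^2*[x>=0].

    xs should be centred near the apex (x = 0); every y must be > 0.
    Returns (coefficients, standard_errors).
    """
    p = 4
    # Left and right flanks each get their own quadratic (curvature) column.
    rows = [[1.0, x, x * x if x < 0 else 0.0, x * x if x >= 0 else 0.0] for x in xs]
    t = [math.log(y) for y in ys]  # log transform linearizes the Gaussian-like model
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * ti for r, ti in zip(rows, t)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    # Residual variance of the log-intensities with n - p degrees of freedom.
    rss = sum((ti - sum(bi * ri for bi, ri in zip(beta, r))) ** 2 for r, ti in zip(rows, t))
    s2 = rss / (len(xs) - p)
    # Diagonal of (X^T X)^-1, obtained by one unit-vector solve per parameter.
    diag = [solve_linear(xtx, [1.0 if k == j else 0.0 for k in range(p)])[j] for j in range(p)]
    se = [math.sqrt(s2 * d) for d in diag]
    return beta, se
```

On noise-free synthetic data the fit recovers the generating coefficients and the standard errors collapse to zero; with real, noisy EIC segments the standard errors grow with the residual scatter, which is the kind of per-parameter uncertainty the abstract says DQSpeak is derived from.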
Affiliation(s)
- Max Reuschenbach
  - Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- Felix Drees
  - Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- Michael S Leupold
  - Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- Lucie K Tintrop
  - Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
- Torsten C Schmidt
  - Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
  - IWW Water Center, Moritzstr. 26, Mülheim an der Ruhr 45476, Germany
- Gerrit Renner
  - Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen 45141, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen 45141, Germany
2. van Herwerden D, O'Brien JW, Lege S, Pirok BWJ, Thomas KV, Samanipour S. Cumulative Neutral Loss Model for Fragment Deconvolution in Electrospray Ionization High-Resolution Mass Spectrometry Data. Anal Chem 2023; 95:12247-12255. [PMID: 37549176; PMCID: PMC10448439; DOI: 10.1021/acs.analchem.3c00896] [Received: 02/28/2023] [Accepted: 07/03/2023] [Indexed: 08/09/2023]
Abstract
Clean high-resolution mass spectra (HRMS) are essential for the successful structural elucidation of an unknown feature in nontarget analysis (NTA) workflows. This step is particularly crucial for spectra generated by data-independent acquisition or in direct infusion experiments. Most commonly available tools use only the time domain for spectral cleanup. Here, we present an algorithm that combines time-domain and mass-domain information to perform spectral deconvolution. The algorithm employs a probability-based cumulative neutral loss (CNL) model for fragment deconvolution. The optimized model, with a mass tolerance of 0.005 Da and a scoreCNL threshold of 0.00, achieved a true positive rate (TPr) of 95.0%, a false discovery rate (FDr) of 20.6%, and a reduction rate of 35.4%. The CNL model was also extensively tested on real samples containing predominantly pesticides at different concentration levels and with matrix effects. Overall, the model obtained a TPr above 88.8%, with FDr values between 33 and 79% and reduction rates between 9 and 45%. Finally, the CNL model was compared with the retention time difference method and peak shape correlation analysis; a combination of correlation analysis and the CNL model was the most effective for fragment deconvolution, obtaining a TPr of 84.7%, an FDr of 54.4%, and a reduction rate of 51.0%.
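The mass-domain test underlying a neutral-loss model is simple arithmetic: the loss is the precursor m/z minus the fragment m/z, and it counts as a match if it lies within the mass tolerance (0.005 Da in the optimized model) of a reference loss. The Python sketch below shows only that tolerance check against a small, hypothetical list of common losses (NH3, H2O, CO, CO2). It assumes singly charged ions and does not implement the probability-based CNL score or the scoreCNL threshold; the function names are illustrative.

```python
from bisect import bisect_left

# Hypothetical reference list of monoisotopic neutral-loss masses in Da
# (NH3, H2O, CO, CO2); a real model would use a much larger, curated set.
COMMON_LOSSES = sorted([17.026549, 18.010565, 27.994915, 43.989829])


def matches_known_loss(precursor_mz, fragment_mz, losses=COMMON_LOSSES, tol=0.005):
    """True if precursor_mz - fragment_mz is within tol (Da) of a listed loss.

    Assumes singly charged precursor and fragment, so m/z differences equal
    mass differences. `losses` must be sorted ascending for the binary search.
    """
    loss = precursor_mz - fragment_mz
    i = bisect_left(losses, loss - tol)  # first reference loss >= loss - tol
    return i < len(losses) and losses[i] <= loss + tol


def filter_fragments(precursor_mz, fragment_mzs, tol=0.005):
    """Keep only fragments whose neutral loss matches a reference loss."""
    return [f for f in fragment_mzs if matches_known_loss(precursor_mz, f, tol=tol)]
```

For a precursor at m/z 200.0, a fragment at 181.989435 (loss of H2O) is kept, while one at 182.0 (loss 18.0, off by about 0.0106 Da) is rejected at the 0.005 Da tolerance. Sorting the reference list keeps each lookup logarithmic, which matters when many fragments are tested against a large loss library.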
Affiliation(s)
- Denice van Herwerden
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1012 WX, The Netherlands
- Jake W. O'Brien
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1012 WX, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane 4102, Australia
- Sascha Lege
  - Agilent Technologies Deutschland GmbH, Waldbronn 76337, Germany
- Bob W. J. Pirok
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1012 WX, The Netherlands
- Kevin V. Thomas
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane 4102, Australia
- Saer Samanipour
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam 1012 WX, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane 4102, Australia
  - UvA Data Science Center, University of Amsterdam, Amsterdam 1012 WP, The Netherlands
3. Renner G, Reuschenbach M. Critical review on data processing algorithms in non-target screening: challenges and opportunities to improve result comparability. Anal Bioanal Chem 2023; 415:4111-4123. [PMID: 37380744; PMCID: PMC10328864; DOI: 10.1007/s00216-023-04776-7] [Received: 11/30/2022] [Revised: 04/23/2023] [Accepted: 05/15/2023] [Indexed: 06/30/2023]
Abstract
Non-target screening (NTS) is a powerful environmental and analytical chemistry approach for detecting and identifying unknown compounds in complex samples. High-resolution mass spectrometry has enhanced NTS capabilities but created challenges in data analysis, including data preprocessing, peak detection, and feature extraction. This review provides an in-depth understanding of NTS data processing methods, focusing on centroiding, extracted ion chromatogram (XIC) building, chromatographic peak characterization, alignment, componentization, and prioritization of features. We discuss the strengths and weaknesses of various algorithms, the influence of user input parameters on the results, and the need for automated parameter optimization. We address uncertainty and data quality issues, emphasizing the importance of incorporating confidence intervals and raw data quality assessment in data processing workflows. Furthermore, we highlight the need for cross-study comparability and propose potential solutions, such as utilizing standardized statistics and open-access data exchange platforms. In conclusion, we offer future perspectives and recommendations for developers and users of NTS data processing algorithms and workflows. By addressing these challenges and capitalizing on the opportunities presented, the NTS community can advance the field, improve the reliability of results, and enhance data comparability across different studies.
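To make the XIC-building step concrete, the Python sketch below extracts a chromatogram for one target m/z by summing, in every scan, the centroid intensities that fall inside a ±ppm window. This fixed-window, targeted extraction is only the simplest variant and is an illustrative assumption; the untargeted binning and clustering approaches discussed in the review work without a predefined target. The scan layout, (retention time, [(m/z, intensity), ...]), and the function name are hypothetical.

```python
def extract_xic(scans, target_mz, ppm=5.0):
    """Build an extracted ion chromatogram for target_mz (hypothetical helper).

    scans: iterable of (retention_time, [(mz, intensity), ...]) with centroided peaks.
    Returns a list of (retention_time, summed_intensity); scans without a match give 0.
    """
    tol = target_mz * ppm * 1e-6  # absolute half-window in m/z units
    xic = []
    for rt, peaks in scans:
        # Sum every centroid that falls inside the +/- ppm window in this scan.
        total = sum(inten for mz, inten in peaks if abs(mz - target_mz) <= tol)
        xic.append((rt, total))
    return xic
```

The ppm window is the user input parameter here, and its choice directly changes which centroids end up in the trace; this is the kind of parameter sensitivity the review discusses.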
Affiliation(s)
- Gerrit Renner
  - Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen, D-45141, NRW, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen, D-45141, NRW, Germany
- Max Reuschenbach
  - Instrumental Analytical Chemistry, University of Duisburg-Essen, Universitätsstr. 5, Essen, D-45141, NRW, Germany
  - Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 2, Essen, D-45141, NRW, Germany
4. Feraud M, O'Brien JW, Samanipour S, Dewapriya P, van Herwerden D, Kaserzon S, Wood I, Rauert C, Thomas KV. InSpectra - A platform for identifying emerging chemical threats. J Hazard Mater 2023; 455:131486. [PMID: 37172382; DOI: 10.1016/j.jhazmat.2023.131486] [Received: 11/10/2022] [Revised: 04/20/2023] [Accepted: 04/23/2023] [Indexed: 05/14/2023]
Abstract
Non-target analysis (NTA) employing high-resolution mass spectrometry (HRMS) coupled with liquid chromatography is increasingly used to identify chemicals of biological relevance. HRMS datasets are large and complex, which makes identifying potentially relevant chemicals extremely challenging. Because the data are recorded in vendor-specific formats, interpreting them often relies on vendor-specific software that may not accommodate advances in data processing. Here we present InSpectra, a vendor-independent automated platform for the systematic detection of newly identified emerging chemical threats. InSpectra is web-based, open-source, open-access, and modular, providing highly flexible and extensible NTA and suspect screening workflows. As a cloud-based platform, InSpectra exploits parallel computing and big-data archiving capabilities, with a focus on sharing and community curation of HRMS data. InSpectra offers a reproducible and transparent approach for the identification, tracking, and prioritisation of emerging chemical threats.
Affiliation(s)
- Mathieu Feraud
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Jake W O'Brien
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands
- Saer Samanipour
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands
  - UvA Data Science Center, University of Amsterdam, Netherlands
- Pradeep Dewapriya
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Denice van Herwerden
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Netherlands
- Sarit Kaserzon
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Ian Wood
  - School of Mathematics and Physics, The University of Queensland, Australia
- Cassandra Rauert
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
- Kevin V Thomas
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Australia
5. Boelrijk J, van Herwerden D, Ensing B, Forré P, Samanipour S. Predicting RP-LC retention indices of structurally unknown chemicals from mass spectrometry data. J Cheminform 2023; 15:28. [PMID: 36829215; PMCID: PMC9960388; DOI: 10.1186/s13321-023-00699-8] [Received: 08/29/2022] [Accepted: 02/13/2023] [Indexed: 02/26/2023]
Abstract
Non-target analysis combined with liquid chromatography high-resolution mass spectrometry is considered one of the most comprehensive strategies for the detection and identification of known and unknown chemicals in complex samples. However, many compounds remain unidentified due to data complexity and the limited number of structures in chemical databases. In this work, we have developed and validated a novel machine learning algorithm to predict retention index (r_i) values for structurally (un)known chemicals based on their measured fragmentation pattern. For the first time, the developed model enabled the prediction of r_i values without the need for the exact structure of the chemicals, with an R² of 0.91 and 0.77 and a root mean squared error (RMSE) of 47 and 67 r_i units for the NORMAN and amide test sets, respectively. This fragment-based model showed r_i prediction accuracy comparable to that of conventional descriptor-based models that rely on known chemical structure, which obtained an R² of 0.85 with an RMSE of 67.
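The two figures of merit quoted above can be computed as shown in the Python sketch below. It assumes the common coefficient-of-determination definition, R² = 1 − SS_res/SS_tot, and RMSE = √(SS_res/n); the paper may define R² differently (for example, as a squared correlation coefficient), and the function name and example values are made up for illustration.

```python
import math


def r2_and_rmse(observed, predicted):
    """Return (R^2, RMSE) for paired observed and predicted retention index values.

    R^2 uses 1 - SS_res / SS_tot; RMSE is in the same units as the inputs (r_i units).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    rmse = math.sqrt(ss_res / len(observed))
    return 1.0 - ss_res / ss_tot, rmse
```

For observed values 100, 200, 300, 400 and predictions 200, 200, 300, 400, SS_res = 10 000 and SS_tot = 50 000, giving R² = 0.8 and an RMSE of 50 r_i units. Because RMSE carries the units of the index, it is directly comparable between the two test sets even when R² is not.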
Affiliation(s)
- Jim Boelrijk
  - AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands
  - Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
- Denice van Herwerden
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands
- Bernd Ensing
  - AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands
  - Computational Chemistry Group, Van 't Hoff Institute for Molecular Sciences (HIMS), Amsterdam, The Netherlands
- Patrick Forré
  - AI4Science Lab, University of Amsterdam, Amsterdam, The Netherlands
  - Institute for Informatics, University of Amsterdam, Amsterdam, The Netherlands
- Saer Samanipour
  - Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, The Netherlands
  - UvA Data Science Center, University of Amsterdam, Amsterdam, The Netherlands
  - Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Woolloongabba, Australia
6. Reuschenbach M, Hohrenk-Danzouma LL, Schmidt TC, Renner G. Development of a scoring parameter to characterize data quality of centroids in high-resolution mass spectra. Anal Bioanal Chem. [PMID: 35871703; PMCID: PMC9411079; DOI: 10.1007/s00216-022-04224-y] [Received: 04/06/2022] [Revised: 05/31/2022] [Accepted: 07/06/2022] [Indexed: 11/29/2022]
Abstract
High-resolution mass spectrometry is widely used in many research fields and allows accurate mass determination. In this context, it is standard practice to reduce high-resolution profile-mode mass spectra to centroided data, on which many data processing routines rely for further evaluation. Yet these approaches do not retain information on peak profile quality, which makes it almost impossible to describe the reliability of the results. We overcome this limitation by developing a new statistical parameter called the data quality score (DQS). For the DQS calculations, we performed a very fast and robust regression analysis of the individual high-resolution peak profiles and used error propagation to estimate the uncertainties of the regression coefficients. We successfully validated the new algorithm against the vendor-specific algorithm implemented in ProteoWizard's msConvert. Moreover, we show that the DQS is a sum parameter associated with centroid accuracy and precision. We also demonstrate the benefit of the new algorithm in nontarget screening, as the DQS prioritizes signals that are not influenced by non-resolved isobaric ions or isotopic fine structures. The algorithm is implemented in the Python, R, and Julia programming languages and supports multi- and cross-platform downstream data handling.
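As an illustration of regression plus error propagation on a single profile-mode peak, the Python sketch below fits a Gaussian in log space (ln I = b0 + b1·x + b2·x²), takes the centroid as the parabola vertex −b1/(2·b2), and propagates the coefficient covariance to a standard uncertainty of that centroid with a first-order (delta-method) approximation. This is a generic textbook approach assumed for illustration; the DQS itself and the paper's exact regression and weighting are not reproduced, and the function names are hypothetical.

```python
import math


def invert(a):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def centroid_with_uncertainty(mzs, intensities):
    """Return (centroid_mz, standard_uncertainty) for one profile-mode peak.

    Needs at least 4 points with intensity > 0 so the residual variance is defined.
    """
    top = max(range(len(mzs)), key=lambda i: intensities[i])
    ref = mzs[top]
    scale = max(abs(mz - ref) for mz in mzs)  # rescale x to about [-1, 1] for conditioning
    xs = [(mz - ref) / scale for mz in mzs]
    t = [math.log(y) for y in intensities]  # a Gaussian becomes a parabola in log space
    rows = [[1.0, x, x * x] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * ti for r, ti in zip(rows, t)) for i in range(3)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(3)) for i in range(3)]
    # Coefficient covariance: residual variance (n - 3 dof) times (X^T X)^-1.
    rss = sum((ti - sum(bk * rk for bk, rk in zip(b, r))) ** 2 for r, ti in zip(rows, t))
    cov = [[rss / (len(xs) - 3) * v for v in row] for row in inv]
    vertex = -b[1] / (2.0 * b[2])
    # Partial derivatives of the vertex -b1 / (2 b2) with respect to b1 and b2.
    g1 = -1.0 / (2.0 * b[2])
    g2 = b[1] / (2.0 * b[2] ** 2)
    var = g1 * g1 * cov[1][1] + 2.0 * g1 * g2 * cov[1][2] + g2 * g2 * cov[2][2]
    # Undo the x rescaling for both the centroid and its uncertainty.
    return ref + scale * vertex, scale * math.sqrt(max(var, 0.0))
```

On an ideal Gaussian profile the centroid is recovered exactly and the uncertainty is essentially zero; distortions such as a non-resolved isobaric shoulder raise the residual scatter and hence the propagated uncertainty, which is the kind of signal a quality score like the DQS is meant to expose.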