1. Heuckeroth S, Damiani T, Smirnov A, Mokshyna O, Brungs C, Korf A, Smith JD, Stincone P, Dreolin N, Nothias LF, Hyötyläinen T, Orešič M, Karst U, Dorrestein PC, Petras D, Du X, van der Hooft JJJ, Schmid R, Pluskal T. Reproducible mass spectrometry data processing and compound annotation in MZmine 3. Nat Protoc 2024; 19:2597-2641. [PMID: 38769143 DOI: 10.1038/s41596-024-00996-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 02/26/2024] [Indexed: 05/22/2024]
Abstract
Untargeted mass spectrometry (MS) experiments produce complex, multidimensional data that are practically impossible to investigate manually. For this reason, computational pipelines are needed to extract relevant information from raw spectral data and convert it into a more comprehensible format. Depending on the sample type and/or goal of the study, a variety of MS platforms can be used for such analysis. MZmine is an open-source software for the processing of raw spectral data generated by different MS platforms. Examples include liquid chromatography-MS, gas chromatography-MS and MS-imaging. These data might typically be associated with various applications including metabolomics and lipidomics. Moreover, the third version of the software, described herein, supports the processing of ion mobility spectrometry (IMS) data. The present protocol provides three distinct procedures to perform feature detection and annotation of untargeted MS data produced by different instrumental setups: liquid chromatography-(IMS-)MS, gas chromatography-MS and (IMS-)MS imaging. For training purposes, example datasets are provided together with configuration batch files (i.e., list of processing steps and parameters) to allow new users to easily replicate the described workflows. Depending on the number of data files and available computing resources, we anticipate this to take between 2 and 24 h for new MZmine users and nonexperts. Within each procedure, we provide a detailed description for all processing parameters together with instructions/recommendations for their optimization. The main generated outputs are represented by aligned feature tables and fragmentation spectra lists that can be used by other third-party tools for further downstream analysis.
Affiliation(s)
- Tito Damiani
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Olena Mokshyna
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Corinna Brungs
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Ansgar Korf
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
- Joshua David Smith
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
  - First Faculty of Medicine, Charles University, Prague, Czech Republic
- Louis-Félix Nothias
  - University of Geneva, Geneva, Switzerland
  - Université Côte d'Azur, CNRS, ICN, Nice, France
- Matej Orešič
  - Örebro University, Örebro, Sweden
  - University of Turku and Åbo Akademi University, Turku, Finland
- Uwe Karst
  - University of Münster, Münster, Germany
- Pieter C Dorrestein
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Daniel Petras
  - University of Tuebingen, Tuebingen, Germany
  - University of California Riverside, Riverside, CA, USA
- Xiuxia Du
  - University of North Carolina at Charlotte, Charlotte, NC, USA
- Justin J J van der Hooft
  - Wageningen University & Research, Wageningen, the Netherlands
  - University of Johannesburg, Johannesburg, South Africa
- Robin Schmid
  - University of Münster, Münster, Germany
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
- Tomáš Pluskal
  - Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences, Prague, Czech Republic
2. Tong J, Lu M, Wang R, An S, Wang J, Wang T, Xie C, Yu C. How Much Storage Precision Can Be Lost: Guidance for Near-Lossless Compression of Untargeted Metabolomics Mass Spectrometry Data. J Proteome Res 2024; 23:1702-1712. [PMID: 38640356 DOI: 10.1021/acs.jproteome.3c00851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2024]
Abstract
Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. Currently, the impacts of precision losses on MS data processing have not been thoroughly evaluated, which is critical for the future development of lossy compressors. We first evaluated different storage precision (32 bit and 64 bit) in lossless mzML files. We then applied 10 truncation transformations to generate precision-lossy files: five relative errors for intensities and five absolute errors for m/z values. MZmine3 and XCMS were used for feature detection and GNPS for compound annotation. Lastly, we compared Precision, Recall, F1-score, and file sizes between lossy files and lossless files under different conditions. Overall, we revealed that the discrepancy between 32 and 64 bit precision was under 1%. We proposed an absolute m/z error of 10⁻⁴ and a relative intensity error of 2 × 10⁻², adhering to a 5% error threshold (F1-scores above 95%). For a stricter 1% error threshold (F1-scores above 99%), an absolute m/z error of 2 × 10⁻⁵ and a relative intensity error of 2 × 10⁻³ were advised. This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision losses on downstream data processing.
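To make the two error budgets in this abstract concrete, the following minimal Python sketch quantizes an m/z value under an absolute error bound and an intensity under a relative error bound. It is an illustration only: the abstract does not specify the truncation transformations used in the study, so the function names `truncate_mz` and `truncate_intensity` and the grid-rounding scheme are assumptions, with the 5% threshold values taken as defaults.

```python
import math


def truncate_mz(mz: float, abs_error: float = 1e-4) -> float:
    """Snap an m/z value to a uniform grid (hypothetical scheme).

    Rounding to a grid of width 2 * abs_error moves the value by at most
    abs_error, which is the absolute error budget.
    """
    step = 2.0 * abs_error
    return round(mz / step) * step


def truncate_intensity(intensity: float, rel_error: float = 2e-2) -> float:
    """Snap an intensity to a logarithmic grid (hypothetical scheme).

    Neighbouring grid points differ by a factor of (1 + rel_error) ** 2, so
    rounding in log space changes the value by at most a factor of
    (1 + rel_error), i.e. a relative error of at most rel_error.
    """
    if intensity <= 0.0:
        return 0.0
    log_step = 2.0 * math.log1p(rel_error)
    return math.exp(round(math.log(intensity) / log_step) * log_step)


if __name__ == "__main__":
    print(truncate_mz(301.141592))        # m/z within 1e-4 of the input
    print(truncate_intensity(123456.789))  # intensity within 2% of the input
```

Coarser grids leave fewer distinct values to store, which is what lets a downstream compressor shrink the file; the error bounds are what the F1-score thresholds in the abstract constrain.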
Affiliation(s)
- Junjie Tong
  - Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China
  - Key Laboratory of Tropical Medicinal Plant Chemistry of Ministry of Education, College of Chemistry and Chemical Engineering, Hainan Normal University, Haikou 571158, Hainan, China
- Miaoshan Lu
  - Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China
- Ruimin Wang
  - Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China
  - Fudan University, Shanghai 200000, China
  - Westlake University, Hangzhou 310024, Zhejiang, China
- Shaowei An
  - Fudan University, Shanghai 200000, China
  - Westlake University, Hangzhou 310024, Zhejiang, China
- Jinyin Wang
  - Westlake University, Hangzhou 310024, Zhejiang, China
  - Zhejiang University, Hangzhou 310009, Zhejiang, China
- Tong Wang
  - Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China
- Cong Xie
  - Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China
  - Key Laboratory of Tropical Medicinal Plant Chemistry of Ministry of Education, College of Chemistry and Chemical Engineering, Hainan Normal University, Haikou 571158, Hainan, China
- Changbin Yu
  - Central Hospital Affiliated to Shandong First Medical University, Jinan 250000, Shandong, China
3. Wang R, Jiang H, Lu M, Tong J, An S, Wang J, Yu C. MRMPro: a web-based tool to improve the speed of manual calibration for multiple reaction monitoring data analysis by mass spectrometry. BMC Bioinformatics 2024; 25:60. [PMID: 38321388 PMCID: PMC10848457 DOI: 10.1186/s12859-024-05685-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 01/30/2024] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND As a gold-standard quantitative technique based on mass spectrometry, multiple reaction monitoring (MRM) has been widely used in proteomics and metabolomics. In the analysis of MRM data, as no peak picking algorithm can achieve perfect accuracy, manual inspection is necessary to correct the errors. In large cohort analysis scenarios, the time required for manual inspection is often considerable. Apart from the commercial software that comes with mass spectrometers, the free and open-source software Skyline is the most popular software for quantitative omics. However, Skyline is not optimized for manual inspection of hundreds of samples, and its interactive experience also needs to be improved. RESULTS Here we introduce MRMPro, a web-based MRM data analysis platform for efficient manual inspection. MRMPro supports data analysis of MRM and scheduled MRM data acquired by mass spectrometers of mainstream vendors. With the goal of improving the speed of manual inspection, we implemented a collaborative review system based on cloud architecture, allowing multiple users to review through browsers. To reduce bandwidth usage and improve data retrieval speed, we proposed an MRM data compression algorithm, which reduced data volume by more than 60% and 80% compared with the vendor and mzML formats, respectively. To improve the efficiency of manual inspection, we proposed a retention time drift estimation algorithm based on the similarity of chromatograms. The estimated retention time drifts were then used for peak alignment and automatic EIC grouping. Compared with Skyline, MRMPro has higher quantification accuracy and better manual inspection support. CONCLUSIONS In this study, we proposed MRMPro to improve the usability of manual calibration for MRM data analysis. MRMPro is free for non-commercial use. Researchers can access MRMPro through http://mrmpro.csibio.com/. All major mass spectrometry formats (wiff, raw, mzML, etc.) can be analyzed on the platform. The final identification results can be exported to a common .xlsx format for subsequent analysis.
Affiliation(s)
- Ruimin Wang
  - Shandong First Medical University (SDFMU) & Central Hospital Affiliated to SDFMU, Jinan, China
  - School of Engineering, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
  - Institute of Advanced Technology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
  - Fudan University, Shanghai, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Hengxuan Jiang
  - Shandong First Medical University (SDFMU) & Central Hospital Affiliated to SDFMU, Jinan, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Miaoshan Lu
  - Shandong First Medical University (SDFMU) & Central Hospital Affiliated to SDFMU, Jinan, China
  - School of Engineering, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
  - Institute of Advanced Technology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
  - Zhejiang University, Hangzhou, Zhejiang, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Junjie Tong
  - Shandong First Medical University (SDFMU) & Central Hospital Affiliated to SDFMU, Jinan, China
  - College of Chemistry and Chemical Engineering, Hainan Normal University, Haikou, Hainan, China
- Shaowei An
  - Shandong First Medical University (SDFMU) & Central Hospital Affiliated to SDFMU, Jinan, China
  - School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
  - Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
  - Fudan University, Shanghai, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Jinyin Wang
  - Shandong First Medical University (SDFMU) & Central Hospital Affiliated to SDFMU, Jinan, China
  - School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
  - Institute of Biology, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang, China
  - Zhejiang University, Hangzhou, Zhejiang, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
- Changbin Yu
  - Shandong First Medical University (SDFMU) & Central Hospital Affiliated to SDFMU, Jinan, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd., Hangzhou, Zhejiang, China
4. Wang R, Lu M, An S, Wang J, Yu C. G-Aligner: a graph-based feature alignment method for untargeted LC-MS-based metabolomics. BMC Bioinformatics 2023; 24:431. [PMID: 37964228 PMCID: PMC10644574 DOI: 10.1186/s12859-023-05525-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 10/09/2023] [Indexed: 11/16/2023] Open
Abstract
BACKGROUND Liquid chromatography-mass spectrometry is widely used in untargeted metabolomics for composition profiling. In multi-run analysis scenarios, features of each run are aligned into consensus features by feature alignment algorithms to observe the intensity variations across runs. However, most of the existing feature alignment methods focus more on accurate retention time correction, while underestimating the importance of feature matching. None of the existing methods can comprehensively consider feature correspondences among all runs and achieve optimal matching. RESULTS To comprehensively analyze feature correspondences among runs, we propose G-Aligner, a graph-based feature alignment method for untargeted LC-MS data. In the feature matching stage, G-Aligner treats features and potential correspondences as nodes and edges in a multipartite graph, considers the multi-run feature matching problem an unbalanced multidimensional assignment problem, and provides three combinatorial optimization algorithms to find optimal matching solutions. In comparison with the feature alignment methods in OpenMS, MZmine2 and XCMS on three public metabolomics benchmark datasets, G-Aligner achieved the best feature alignment performance on all three datasets, with up to 9.8% and 26.6% increases in accurately aligned features and analytes, and helped all comparison software obtain more accurate results on their self-extracted features when G-Aligner was integrated into their analysis workflows. G-Aligner is open-source and freely available at https://github.com/CSi-Studio/G-Aligner under a permissive license. Benchmark datasets, manual annotation results, evaluation methods and results are available at https://doi.org/10.5281/zenodo.8313034. CONCLUSIONS In this study, we proposed G-Aligner to improve feature matching accuracy for untargeted metabolomics LC-MS data. G-Aligner comprehensively considered potential feature correspondences between all runs, converting the feature matching problem into a multidimensional assignment problem (MAP). In evaluations on three public metabolomics benchmark datasets, G-Aligner achieved the highest alignment accuracy on manually annotated features and on features extracted by popular software, proving the effectiveness and robustness of the algorithm.
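The idea of treating feature matching as an assignment problem can be illustrated on the simplest case of two runs. The Python sketch below is not G-Aligner's algorithm: it handles only two runs (a bipartite rather than multipartite graph), searches exhaustively instead of using the three combinatorial optimization algorithms mentioned in the abstract, and uses a cost function, tolerances and an unmatched penalty that are all assumptions chosen for illustration. Features are hypothetical `(m/z, retention time)` tuples.

```python
from itertools import permutations
from math import inf


def match_cost(a, b, mz_tol=0.01, rt_tol=0.5):
    """Return a normalized distance between two features, or None if no edge exists."""
    d_mz = abs(a[0] - b[0]) / mz_tol
    d_rt = abs(a[1] - b[1]) / rt_tol
    if d_mz > 1.0 or d_rt > 1.0:
        return None  # outside tolerance: not a potential correspondence
    return d_mz + d_rt


def best_matching(run_a, run_b, unmatched_penalty=2.0):
    """Exhaustively find the lowest-cost one-to-one matching between two runs.

    Features of run_a may stay unmatched (partner None) at a fixed penalty,
    which makes the problem unbalanced. Only suitable for tiny inputs.
    """
    # Pad the candidate partners with None slots so any feature may stay unmatched.
    slots = list(range(len(run_b))) + [None] * len(run_a)
    best_total, best_pairs = inf, []
    for perm in permutations(slots, len(run_a)):
        total, valid = 0.0, True
        for i, j in enumerate(perm):
            if j is None:
                total += unmatched_penalty
                continue
            cost = match_cost(run_a[i], run_b[j])
            if cost is None:
                valid = False
                break
            total += cost
        if valid and total < best_total:
            best_total, best_pairs = total, list(enumerate(perm))
    return best_pairs


if __name__ == "__main__":
    run_a = [(100.000, 5.0), (100.004, 5.3), (250.100, 9.0)]
    run_b = [(100.003, 5.28), (100.001, 5.05), (400.200, 2.0)]
    print(best_matching(run_a, run_b))
```

In this toy input the two features near m/z 100 are each within tolerance of both candidates in the other run, so the result depends on minimizing the total cost over the whole matching rather than on pairing each feature with its individually nearest neighbour.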
Affiliation(s)
- Ruimin Wang
  - Fudan University, Shanghai, 200433, Shanghai, China
  - School of Engineering, Westlake University, Hangzhou, 310030, Zhejiang, China
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
- Miaoshan Lu
  - School of Engineering, Westlake University, Hangzhou, 310030, Zhejiang, China
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
  - Zhejiang University, Hangzhou, 310058, Zhejiang, China
- Shaowei An
  - Fudan University, Shanghai, 200433, Shanghai, China
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
  - School of Life Sciences, Westlake University, Hangzhou, 310030, Zhejiang, China
- Jinyin Wang
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
  - Zhejiang University, Hangzhou, 310058, Zhejiang, China
  - School of Life Sciences, Westlake University, Hangzhou, 310030, Zhejiang, China
- Changbin Yu
  - Shandong First Medical University and Shandong Academy of Medical Sciences, Jinan, 250021, Shandong, China
5. Lu M, Tong J, Fang W, Wang J, An S, Wang R, Jiang H, Yu C. Column storage enables edge computation of biological big data on 5G networks. Mathematical Biosciences and Engineering 2023; 20:17197-17219. [PMID: 37920052 DOI: 10.3934/mbe.2023766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2023]
Abstract
With the continuous improvement of biological detection technology, the scale of biological data is also increasing, which overloads central computing servers. The use of edge computing in 5G networks can provide higher processing performance for large biological data analysis, reduce bandwidth consumption and improve data security. An appropriate data compression and reading strategy is the key technology for implementing edge computing. We introduce the column storage strategy into mass spectrum data so that part of the analysis scenario can be completed by edge computing. Data produced by mass spectrometry are typical biological big data: a blood sample analysed by mass spectrometry can produce a 10-gigabyte digital file. By introducing the column storage strategy and combining the related prior knowledge of mass spectrometry, the structure of the mass spectrum data is reorganized, and the result file is effectively compressed. Data can be processed immediately near the scientific instrument, reducing the bandwidth requirements and the pressure on the central server. Here, we present Aird-Slice, a mass spectrum data format using the column storage strategy. Aird-Slice reduces volume by 48% compared to vendor files and speeds up the critical computational step of ion chromatogram extraction by an average of 116 times over the test dataset. Aird-Slice provides the ability to analyze biological data using an edge computing architecture on 5G networks.
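Why a column-oriented layout helps chromatogram extraction can be shown with a small Python sketch. It does not reproduce the Aird-Slice file format, whose actual layout is not described in this abstract; the m/z binning, the function names `build_columns` and `extract_xic`, and the in-memory dictionaries are assumptions made for illustration. The point is only that once data points are grouped by m/z instead of by scan, extracting a chromatogram touches a few bins rather than every spectrum.

```python
from collections import defaultdict


def build_columns(spectra, bin_width=0.01):
    """Reorganize scan-wise spectra into m/z-binned columns.

    spectra: list of (retention_time, [(mz, intensity), ...]) tuples.
    Returns a dict mapping an integer bin index to a list of
    (retention_time, mz, intensity) points falling in that bin.
    """
    columns = defaultdict(list)
    for rt, peaks in spectra:
        for mz, intensity in peaks:
            columns[int(mz / bin_width)].append((rt, mz, intensity))
    return columns


def extract_xic(columns, target_mz, tol, bin_width=0.01):
    """Extract an ion chromatogram by reading only the bins that overlap the window."""
    lo = int((target_mz - tol) / bin_width)
    hi = int((target_mz + tol) / bin_width)
    xic = defaultdict(float)
    for bin_index in range(lo, hi + 1):
        for rt, mz, intensity in columns.get(bin_index, ()):
            if abs(mz - target_mz) <= tol:
                xic[rt] += intensity  # sum intensities per retention time
    return sorted(xic.items())


if __name__ == "__main__":
    spectra = [
        (0.0, [(100.001, 10.0), (200.000, 5.0)]),
        (1.0, [(100.002, 20.0), (300.000, 7.0)]),
        (2.0, [(99.999, 30.0)]),
    ]
    columns = build_columns(spectra)
    print(extract_xic(columns, target_mz=100.0, tol=0.005))
```

With the scan-wise layout the same query would have to scan every peak of every spectrum; here the peaks at m/z 200 and 300 are never read.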
Affiliation(s)
- Miaoshan Lu
  - Zhejiang University, Hangzhou 310009, Zhejiang Province, China
  - School of Engineering, Westlake University, Hangzhou, China
  - Institute of Advanced Technology, Westlake Institute for Advanced Study, Hangzhou, China
  - Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Junjie Tong
  - Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Weidong Fang
  - Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing, Guilin University of Electronic Technology, Guilin 541004, China
- Jinyin Wang
  - Zhejiang University, Hangzhou 310009, Zhejiang Province, China
- Hengxuan Jiang
  - Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
- Changbin Yu
  - Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, China
6. An S, Wang R, Lu M, Zhang C, Liu H, Wang J, Xie C, Yu C. MetaPro: a web-based metabolomics application for LC-MS data batch inspection and library curation. Metabolomics 2023; 19:57. [PMID: 37289291 DOI: 10.1007/s11306-023-02018-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/26/2022] [Accepted: 05/10/2023] [Indexed: 06/09/2023]
Abstract
INTRODUCTION Metabolomics analysis based on liquid chromatography-mass spectrometry (LC-MS) has been a prevalent method in the metabolic field. However, accurately quantifying all the metabolites in large metabolomics sample cohorts is challenging. The analysis efficiency is restricted by the abilities of software in many labs, and the lack of spectra for some metabolites also hinders metabolite identification. OBJECTIVES Develop software that performs semi-targeted metabolomics analysis with an optimized workflow to improve quantification accuracy. The software also supports web-based technologies and increases laboratory analysis efficiency. A spectral curation function is provided to promote the prosperity of homemade MS/MS spectral libraries in the metabolomics community. METHODS MetaPro is developed based on an industrial-grade web framework and a computation-oriented MS data format to improve analysis efficiency. Algorithms from mainstream metabolomics software are integrated and optimized for more accurate quantification results. A semi-targeted analysis workflow is designed based on the concept of combining artificial judgment and algorithm inference. RESULTS MetaPro supports semi-targeted analysis workflow and functions for fast QC inspection and self-made spectral library curation with easy-to-use interfaces. With curated authentic or high-quality spectra, it can improve identification accuracy using different peak identification strategies. It demonstrates practical value in analyzing large amounts of metabolomics samples. CONCLUSION We offer MetaPro as a web-based application characterized by fast batch QC inspection and credible spectral curation towards high-throughput metabolomics data. It aims to resolve the analysis difficulty in semi-targeted metabolomics.
Affiliation(s)
- Shaowei An
  - Fudan University, 220 Handan Road, Shanghai, 200433, China
  - Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province, 310024, China
  - Shandong First Medical University, 6699 Qingdao Road, Jinan, Shandong Province, 250117, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd, 368 Jinpeng Street, Hangzhou, Zhejiang Province, 310030, China
- Ruimin Wang
  - Fudan University, 220 Handan Road, Shanghai, 200433, China
  - Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province, 310024, China
  - Shandong First Medical University, 6699 Qingdao Road, Jinan, Shandong Province, 250117, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd, 368 Jinpeng Street, Hangzhou, Zhejiang Province, 310030, China
- Miaoshan Lu
  - Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province, 310024, China
  - Shandong First Medical University, 6699 Qingdao Road, Jinan, Shandong Province, 250117, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd, 368 Jinpeng Street, Hangzhou, Zhejiang Province, 310030, China
  - Zhejiang University, 866 Yuhangtang Road, Hangzhou, Zhejiang Province, 310009, China
- Chao Zhang
  - Calibra Diagnostics Co., Ltd, 329 Jinpeng Street, Hangzhou, Zhejiang Province, 310030, China
- Huafen Liu
  - Calibra Diagnostics Co., Ltd, 329 Jinpeng Street, Hangzhou, Zhejiang Province, 310030, China
- Jinyin Wang
  - Westlake University, 18 Shilongshan Road, Hangzhou, Zhejiang Province, 310024, China
  - Shandong First Medical University, 6699 Qingdao Road, Jinan, Shandong Province, 250117, China
  - Carbon Silicon (Hangzhou) Biotechnology Co., Ltd, 368 Jinpeng Street, Hangzhou, Zhejiang Province, 310030, China
  - Zhejiang University, 866 Yuhangtang Road, Hangzhou, Zhejiang Province, 310009, China
- Cong Xie
  - Shandong First Medical University, 6699 Qingdao Road, Jinan, Shandong Province, 250117, China
- Changbin Yu
  - Shandong First Medical University, 6699 Qingdao Road, Jinan, Shandong Province, 250117, China
7. Bilbao A, Ross DH, Lee JY, Donor MT, Williams SM, Zhu Y, Ibrahim YM, Smith RD, Zheng X. MZA: A Data Conversion Tool to Facilitate Software Development and Artificial Intelligence Research in Multidimensional Mass Spectrometry. J Proteome Res 2023; 22:508-513. [PMID: 36414245 PMCID: PMC9898216 DOI: 10.1021/acs.jproteome.2c00313] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/24/2022]
Abstract
Modern mass spectrometry-based workflows employing hybrid instrumentation and orthogonal separations collect multidimensional data, potentially allowing deeper understanding in omics studies through adoption of artificial intelligence methods. However, the large volume of these rich spectra challenges existing data storage and access technologies, therefore precluding informatics advancements. We present MZA (pronounced m-za), the mass-to-charge (m/z) generic data storage and access tool designed to facilitate software development and artificial intelligence research in multidimensional mass spectrometry measurements. Composed of a data conversion tool and a simple file structure based on the HDF5 format, MZA provides easy, cross-platform and cross-programming language access to raw MS-data, enabling fast development of new tools in data science programming languages such as Python and R. The software executable, example MS-data and example Python and R scripts are freely available at https://github.com/PNNL-m-q/mza.
Affiliation(s)
- Aivett Bilbao (corresponding author)
  - Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Dylan H. Ross
  - Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Joon-Yong Lee
  - Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Micah T. Donor
  - Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Ying Zhu
  - Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Xueyun Zheng (corresponding author)
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
8. StackZDPD: a novel encoding scheme for mass spectrometry data optimized for speed and compression ratio. Sci Rep 2022; 12:5384. [PMID: 35354909 PMCID: PMC8967824 DOI: 10.1038/s41598-022-09432-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/08/2021] [Accepted: 03/23/2022] [Indexed: 11/29/2022] Open
Abstract
As the pervasive, standardized format for interchange and deposition of raw mass spectrometry (MS) proteomics and metabolomics data, text-based mzML is used inefficiently on various analysis platforms because of its sheer data volume and limited read/write speed. Most research on compression algorithms does not provide a flexible random file reading scheme. Database-based solutions guarantee efficient random file reading, but their compression and third-party software support are insufficient. Under the premise of ensuring the efficiency of decompression, we propose an encoding scheme, "Stack-ZDPD", that is optimized for storage of raw MS data and designed for "Aird", a computation-oriented format with fast access and decoding, whose core compression algorithm is "ZDPD". Stack-ZDPD reduces the volume of data stored in mzML format by around 80% or more, depending on the data acquisition pattern, and the compression ratio is approximately 30% compared to ZDPD for data generated using Time of Flight technology. Our approach is available in AirdPro for file conversion and in the Java API Aird-SDK for data parsing.