1. Betschart RO, Thalén F, Blankenberg S, Zoche M, Zeller T, Ziegler A. A benchmark study of compression software for human short-read sequence data. Sci Rep 2025; 15:15358. PMID: 40316539; PMCID: PMC12048562; DOI: 10.1038/s41598-025-00491-8.
Abstract
Efficient data compression technologies are crucial to reduce the cost of long-term storage and file transfer in whole-genome sequencing studies. This study benchmarked four specialized compression tools developed for paired-end fastq.gz files: DRAGEN ORA 4.3.4 (ORA), Genozip 15.0.62, repaq 0.3.0, and SPRING 1.1.1, using three subjects from the Genome in a Bottle consortium that were sequenced 82 times on an Illumina NovaSeq 6000 with an average coverage of 35x. It additionally compared Genozip with SAMtools 1.20 for the compression of BAM files. All tools provided lossless compression. ORA and Genozip achieved compression ratios of approximately 1:6 when compressing fastq.gz; repaq and SPRING had lower compression ratios of 1:2 and 1:4, respectively, and took longer for both compression and decompression than ORA and Genozip. Genozip achieved approximately 16% higher compression for BAM files than SAMtools. However, the BAM compression of SAMtools produces CRAM files, which are compatible with many software packages. ORA, repaq, and SPRING are limited to compressing fastq.gz files, while Genozip supports various file formats. Although Genozip requires an annual license, its source code is freely available, ensuring sustainability. In conclusion, paired-end short-read sequence data can be efficiently compressed using specialized compression software. Commercial tools offer higher compression ratios than freely available software.
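As a minimal illustration of how such compression ratios are derived (original size divided by compressed size), the sketch below gzips a synthetic FASTQ record with Python's standard library. The benchmarked tools (ORA, Genozip, repaq, SPRING) are standalone programs and are not invoked here; the record content is invented.

```python
import gzip

# Synthetic paired-end-style FASTQ record, repeated to form a small file.
# Real sequencing data is far less repetitive, so real ratios are lower.
record = b"@read1\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n"
raw = record * 1000

compressed = gzip.compress(raw, compresslevel=9)
ratio = len(raw) / len(compressed)
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio=1:{ratio:.1f}")
```

Specialized FASTQ compressors improve on this by modeling the read identifiers, bases, and quality scores as separate streams rather than treating the file as opaque bytes.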
Affiliation(s)
- Raphael O Betschart
  - Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, Davos Wolfgang, 7265, Davos, Switzerland
  - Institute of Cardiogenetics, University of Lübeck, Lübeck, Germany
- Felix Thalén
  - Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, Davos Wolfgang, 7265, Davos, Switzerland
- Stefan Blankenberg
  - Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, Davos Wolfgang, 7265, Davos, Switzerland
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Centre for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research, Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Martin Zoche
  - Institute of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland
- Tanja Zeller
  - Institute of Cardiogenetics, University of Lübeck, Lübeck, Germany
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Centre for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research, Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Andreas Ziegler
  - Cardio-CARE, Medizincampus Davos, Herman-Burchard-Str. 12, Davos Wolfgang, 7265, Davos, Switzerland
  - Department of Cardiology, University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - Centre for Population Health Innovation (POINT), University Heart and Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  - School of Mathematics, Statistics and Computer Science, Scottsville, Private Bag X01, Pietermaritzburg, 3209, South Africa
2. Kowalski TM, Grabowski S. PgRC2: engineering the compression of sequencing reads. Bioinformatics 2025; 41:btaf101. PMID: 40037801; PMCID: PMC11908645; DOI: 10.1093/bioinformatics/btaf101.
Abstract
SUMMARY The FASTQ format remains at the heart of high-throughput sequencing. Despite advances in specialized FASTQ compressors, they are still imperfect in terms of practical performance tradeoffs. We present a multi-threaded version of Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of approximating the shortest common superstring over high-quality reads. Redundancy in the obtained string is efficiently removed by using a compact temporary representation. The current version, v2.0, preserves the compression ratio of the previous one, reducing the compression (resp. decompression) time by a factor of 8-9 (resp. 2-2.5) on a 14-core/28-thread machine. AVAILABILITY AND IMPLEMENTATION PgRC 2.0 can be downloaded from https://github.com/kowallus/PgRC and https://zenodo.org/records/14882486 (10.5281/zenodo.14882486).
Affiliation(s)
- Tomasz M Kowalski
  - Institute of Applied Computer Science, Lodz University of Technology, Lodz 90-924, Poland
- Szymon Grabowski
  - Institute of Applied Computer Science, Lodz University of Technology, Lodz 90-924, Poland
3. Um DH, Knowles DA, Kaiser GE. Vector embeddings by sequence similarity and context for improved compression, similarity search, clustering, organization, and manipulation of cDNA libraries. Comput Biol Chem 2025; 114:108251. PMID: 39602973; DOI: 10.1016/j.compbiolchem.2024.108251.
Abstract
This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ). By assigning a unique vector embedding to each short sequence, it is possible to cluster more efficiently and improve compression performance for the string representations of cDNA libraries. Furthermore, by studying alternative coordinate vector embeddings trained on the context of codon triplets, we can demonstrate clustering based on amino acid properties. Employing this sequence embedding method to encode barcodes and cDNA sequences, we can improve the time complexity of similarity searches. By pairing vector embeddings with an algorithm that determines vector proximity in Euclidean space, this approach enables quicker and more flexible sequence searches.
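The similarity-search idea can be sketched with hypothetical embeddings: the paper learns its vectors from sequence similarity and codon context, whereas the values below are hand-picked toy numbers used only to show a Euclidean nearest-neighbor lookup.

```python
import math

# Hypothetical embeddings (invented for illustration): each short sequence
# maps to a numeric vector; similar sequences get nearby vectors.
library = {
    "ACGTAC": (0.10, 0.90, 0.20),
    "ACGTAG": (0.12, 0.88, 0.21),  # near-duplicate of the first entry
    "TTTTGG": (0.90, 0.10, 0.70),
}

def nearest(query_vec, lib):
    """Return the library key whose embedding is closest in Euclidean space."""
    return min(lib, key=lambda k: math.dist(query_vec, lib[k]))

print(nearest((0.11, 0.89, 0.20), library))
```

With learned embeddings, such lookups replace character-by-character string comparison, which is what improves the time complexity of similarity search.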
Affiliation(s)
- Daniel H Um
  - Department of Computer Science, Columbia University, New York, NY, USA
- David A Knowles
  - Department of Computer Science, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - The Data Science Institute, Columbia University, New York, NY, USA
  - New York Genome Center, New York, NY, USA
- Gail E Kaiser
  - Department of Computer Science, Columbia University, New York, NY, USA
4. Nazari F, Patel S, LaRocca M, Sansevich A, Czarny R, Schena G, Murray EK. Lossless and reference-free compression of FASTQ/A files using GeneSqueeze. Sci Rep 2025; 15:322. PMID: 39747361; PMCID: PMC11696233; DOI: 10.1038/s41598-024-79258-6.
Abstract
As sequencing becomes more accessible, there is an acute need for novel compression methods to efficiently store sequencing files. Omics analytics can leverage sequencing technologies to enhance biomedical research and individualize patient care, but sequencing files demand immense storage capabilities, particularly when sequencing is used for longitudinal studies. Addressing the storage challenges posed by these technologies is crucial for omics analytics to achieve their full potential. We present a novel lossless, reference-free compression algorithm, GeneSqueeze, that leverages the patterns inherent in the underlying components of FASTQ files to address this need. GeneSqueeze's benefits include an auto-tuning compression protocol based on each file's distribution, lossless preservation of IUPAC nucleotides and read identifiers, and unrestricted FASTQ/A file attributes (i.e., read length, number of reads, or read identifier format). We compared GeneSqueeze to the general-purpose compressor gzip and to a domain-specific compressor, SPRING, to assess performance. Owing to its current Python implementation, GeneSqueeze underperformed gzip and SPRING in runtime. GeneSqueeze and gzip compressed all files 100% losslessly across all elements of the FASTQ files (i.e., the read identifier, sequence, quality score, and '+' lines), while both SPRING's traditional and lossless modes exhibited loss of non-ACGTN IUPAC nucleotides and of metadata following the '+' on the separator line. GeneSqueeze showed up to three times higher compression ratios than gzip, regardless of read length, number of reads, or file size, and had comparable compression ratios to SPRING across a variety of factors. Overall, GeneSqueeze represents a competitive and specialized compression method for FASTQ/A files containing nucleotide sequences. As such, GeneSqueeze has the potential to significantly reduce the storage and transmission costs associated with large omics datasets without sacrificing data integrity.
Affiliation(s)
- Foad Nazari
  - Rajant Health Incorporated, 200 Chesterfield Parkway, Malvern, PA, 19355, USA
- Sneh Patel
  - Rajant Health Incorporated, 200 Chesterfield Parkway, Malvern, PA, 19355, USA
- Melissa LaRocca
  - Rajant Health Incorporated, 200 Chesterfield Parkway, Malvern, PA, 19355, USA
- Alina Sansevich
  - Rajant Health Incorporated, 200 Chesterfield Parkway, Malvern, PA, 19355, USA
- Ryan Czarny
  - Rajant Health Incorporated, 200 Chesterfield Parkway, Malvern, PA, 19355, USA
- Giana Schena
  - Rajant Health Incorporated, 200 Chesterfield Parkway, Malvern, PA, 19355, USA
- Emma K Murray
  - Rajant Health Incorporated, 200 Chesterfield Parkway, Malvern, PA, 19355, USA
5. Sousa MJP, Pinho AJ, Pratas D. JARVIS3: an efficient encoder for genomic data. Bioinformatics 2024; 40:btae725. PMID: 39673739; PMCID: PMC11645547; DOI: 10.1093/bioinformatics/btae725.
Abstract
MOTIVATION Large-scale genomic projects grapple with the complex challenge of reducing medium- and long-term storage space and its associated energy consumption, monetary costs, and environmental footprint. RESULTS We present JARVIS3, an advanced tool engineered for the efficient reference-free compression of genomic sequences. JARVIS3 introduces a pioneering approach, specifically through enhanced table memory models and probabilistic lookup-tables applied in repeat models. These optimizations are pivotal in substantially enhancing computational efficiency. JARVIS3 offers three distinct profiles: (i) rapid computation with moderate compression, (ii) a balanced trade-off between time and compression, and (iii) slower computation with significantly higher compression ratios. The implementation of JARVIS3 is rooted in the C programming language, building upon the success of its predecessor, JARVIS2. JARVIS3 shows substantial speed improvements relative to JARVIS2 while providing slightly better compression. Furthermore, we provide a versatile C/Bash implementation, facilitating application to FASTA and FASTQ data, including parallel computation. In addition, JARVIS3 includes a mode for outputting bit information and provides the Normalized Compression and bit rates, facilitating compression-based analysis. This establishes JARVIS3 as an open-source solution for genomic data compression and analysis. AVAILABILITY AND IMPLEMENTATION JARVIS3 is freely available at https://github.com/cobilab/jarvis3.
Affiliation(s)
- Maria J P Sousa
  - Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
  - Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
  - Intelligent Systems Associate Laboratory (LASI), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Armando J Pinho
  - Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
  - Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
  - Intelligent Systems Associate Laboratory (LASI), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
- Diogo Pratas
  - Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
  - Department of Electronics, Telecommunications and Informatics (DETI), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
  - Intelligent Systems Associate Laboratory (LASI), University of Aveiro, Campus Universitário de Santiago, 3810-193 Aveiro, Portugal
  - Department of Virology (DoV), University of Helsinki, Haartmaninkatu 3, 00014 Helsinki, Finland
6. Müntefering F, Adhisantoso YG, Chandak S, Ostermann J, Hernaez M, Voges J. Genie: the first open-source ISO/IEC encoder for genomic data. Commun Biol 2024; 7:553. PMID: 38724695; PMCID: PMC11082222; DOI: 10.1038/s42003-024-06249-8.
Abstract
For the last two decades, the amount of genomic data produced by scientific and medical applications has been growing at a rapid pace. To enable software solutions that analyze, process, and transmit these data in an efficient and interoperable way, ISO and IEC released the first version of the compression standard MPEG-G in 2019. However, non-proprietary implementations of the standard are not openly available so far, limiting fair scientific assessment of the standard and, therefore, hindering its broad adoption. In this paper, we present Genie, to the best of our knowledge the first open-source encoder that compresses genomic data according to the MPEG-G standard. We demonstrate that Genie reaches state-of-the-art compression ratios while offering interoperability with any other standard-compliant decoder independent from its manufacturer. Finally, the ISO/IEC ecosystem ensures the long-term sustainability and decodability of the compressed data through the ISO/IEC-supported reference decoder.
Affiliation(s)
- Fabian Müntefering
  - Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, Appelstraße 9a, Hannover, 30167, Germany
- Yeremia Gunawan Adhisantoso
  - Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, Appelstraße 9a, Hannover, 30167, Germany
- Shubham Chandak
  - Department of Electrical Engineering, Stanford University, 350 Jane Stanford Way, Stanford, CA, 94305, USA
- Jörn Ostermann
  - Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, Appelstraße 9a, Hannover, 30167, Germany
- Mikel Hernaez
  - Center for Applied Medical Research (CIMA), University of Navarra, Av. de Pío XII, 55, Pamplona, 31008, Navarra, Spain
- Jan Voges
  - Institut für Informationsverarbeitung (TNT), Leibniz University Hannover, Appelstraße 9a, Hannover, 30167, Germany
7. Sun H, Zheng Y, Xie H, Ma H, Zhong C, Yan M, Liu X, Wang G. PQSDC: a parallel lossless compressor for quality scores data via sequences partition and run-length prediction mapping. Bioinformatics 2024; 40:btae323. PMID: 38759114; PMCID: PMC11139522; DOI: 10.1093/bioinformatics/btae323.
Abstract
MOTIVATION Quality scores data (QSD) account for 70% of compressed FastQ files obtained from short- and long-read sequencing technologies. Designing effective compressors for QSD that counterbalance compression ratio, time cost, and memory consumption is essential in scenarios such as large-scale genomics data sharing and long-term data backup. This study presents a novel parallel lossless QSD-dedicated compression algorithm named PQSDC, which fulfills these requirements well. PQSDC is based on two core components: a parallel sequences-partition model designed to reduce peak memory consumption and time cost during compression and decompression, and a parallel four-level run-length prediction mapping model to enhance the compression ratio. The PQSDC algorithm is also designed to be highly concurrent, using multicore CPU clusters. RESULTS We evaluate PQSDC and four state-of-the-art compression algorithms on 27 real-world datasets, comprising 61.857 billion QSD characters and 632.908 million QSD sequences. (1) For short reads, compared to the baselines, the maximum improvement of PQSDC reaches 7.06% in average compression ratio and 8.01% in weighted average compression ratio. During compression and decompression, the maximum total time savings of PQSDC are 79.96% and 84.56%, respectively; the maximum average memory savings are 68.34% and 77.63%, respectively. (2) For long reads, the maximum improvement of PQSDC reaches 12.51% and 13.42% in average and weighted average compression ratio, respectively. The maximum total time savings during compression and decompression are 53.51% and 72.53%, respectively; the maximum average memory savings are 19.44% and 17.42%, respectively. (3) Furthermore, PQSDC ranks second in compression robustness among the tested algorithms, indicating that it is less affected by the probability distribution of the QSD collections. Overall, our work provides a promising solution for parallel QSD compression that balances storage cost, time consumption, and memory occupation. AVAILABILITY AND IMPLEMENTATION The proposed PQSDC compressor can be downloaded from https://github.com/fahaihi/PQSDC.
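Run-length coding exploits the long runs of repeated symbols typical of quality-score strings. The sketch below is a generic run-length encoder/decoder for illustration only; PQSDC's four-level run-length prediction mapping model is considerably more elaborate.

```python
from itertools import groupby

def rle_encode(qual: str) -> list[tuple[str, int]]:
    """Collapse a quality string into (symbol, run_length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(qual)]

def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Invert rle_encode: repeat each symbol by its run length."""
    return "".join(sym * n for sym, n in pairs)

qual = "IIIIIIIIFFFF###II"   # toy Phred-style quality string
pairs = rle_encode(qual)
print(pairs)
assert rle_decode(pairs) == qual  # lossless round trip
```

Quality strings from modern instruments use few distinct symbols with long runs, which is why run-length-based models are a natural fit for QSD.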
Affiliation(s)
- Hui Sun
  - Nankai-Baidu Joint Laboratory, Parallel and Distributed Software Technology Laboratory, TMCC, SysNet, DISSec, GTIISC, College of Computer Science, Nankai University, Tianjin 300350, China
- Yingfeng Zheng
  - Nankai-Baidu Joint Laboratory, Parallel and Distributed Software Technology Laboratory, TMCC, SysNet, DISSec, GTIISC, College of Computer Science, Nankai University, Tianjin 300350, China
- Haonan Xie
  - Institute of Artificial Intelligence, School of Electrical Engineering, Guangxi University, Nanning 530004, China
- Huidong Ma
  - Nankai-Baidu Joint Laboratory, Parallel and Distributed Software Technology Laboratory, TMCC, SysNet, DISSec, GTIISC, College of Computer Science, Nankai University, Tianjin 300350, China
- Cheng Zhong
  - Key Laboratory of Parallel, Distributed and Intelligent of Guangxi Universities and Colleges, School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Meng Yan
  - Nankai-Baidu Joint Laboratory, Parallel and Distributed Software Technology Laboratory, TMCC, SysNet, DISSec, GTIISC, College of Computer Science, Nankai University, Tianjin 300350, China
- Xiaoguang Liu
  - Nankai-Baidu Joint Laboratory, Parallel and Distributed Software Technology Laboratory, TMCC, SysNet, DISSec, GTIISC, College of Computer Science, Nankai University, Tianjin 300350, China
- Gang Wang
  - Nankai-Baidu Joint Laboratory, Parallel and Distributed Software Technology Laboratory, TMCC, SysNet, DISSec, GTIISC, College of Computer Science, Nankai University, Tianjin 300350, China
8. Ji F, Zhou Q, Ruan J, Zhu Z, Liu X. A compressive seeding algorithm in conjunction with reordering-based compression. Bioinformatics 2024; 40:btae100. PMID: 38377404; PMCID: PMC10955252; DOI: 10.1093/bioinformatics/btae100.
Abstract
MOTIVATION Seeding is a rate-limiting stage in sequence alignment for next-generation sequencing reads. Existing optimization algorithms typically utilize hardware and machine-learning techniques to accelerate seeding. However, an efficient solution provided by professional next-generation sequencing compressors has so far been largely overlooked. In addition to achieving remarkable compression ratios by reordering reads, these compressors provide valuable insights for downstream alignment, revealing that repetitive computations account for more than 50% of the seeding procedure in the commonly used short-read aligner BWA-MEM at typical sequencing coverage. Nevertheless, the exploited redundancy information is not fully realized or utilized. RESULTS In this study, we present a compressive seeding algorithm, named CompSeed, to fill this gap. CompSeed, in collaboration with existing reordering-based compression tools, finishes the BWA-MEM seeding process in about half the time by caching all intermediate seeding results in compact trie structures to directly answer repetitive inquiries that frequently cause random memory accesses. Furthermore, CompSeed demonstrates better performance as sequencing coverage increases, as it focuses solely on the small informative portion of sequencing reads after compression. The innovative strategy highlights the promising potential of integrating sequence compression and alignment to tackle the ever-growing volume of sequencing data. AVAILABILITY AND IMPLEMENTATION CompSeed is available at https://github.com/i-xiaohu/CompSeed.
Affiliation(s)
- Fahu Ji
  - School of Computer Science and Technology, Harbin Institute of Technology, Nan Gang District, Harbin 150080, China
- Qian Zhou
  - Peng Cheng Laboratory, Nanshan District, Shenzhen 518055, China
- Jue Ruan
  - Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Dapeng District, Shenzhen 518120, China
- Zexuan Zhu
  - College of Computer Science and Software Engineering, Shenzhen University, Nanshan District, Shenzhen 518060, China
- Xianming Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Nan Gang District, Harbin 150080, China
  - Peng Cheng Laboratory, Nanshan District, Shenzhen 518055, China
9. Sun H, Zheng Y, Xie H, Ma H, Liu X, Wang G. PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering. BMC Bioinformatics 2023; 24:454. PMID: 38036969; PMCID: PMC10691058; DOI: 10.1186/s12859-023-05566-9.
Abstract
BACKGROUND Genomic sequencing reads compressors are essential for balancing high-throughput short-read generation speed, large-scale genomic data sharing, and infrastructure storage expenditure. However, most existing short-read compressors rarely utilize big-memory systems and the duplicative information between diverse sequencing files to achieve a higher compression ratio and conserve reads data storage space. RESULTS We employ compression ratio as the optimization objective and propose a large-scale genomic sequencing short reads data compression optimizer, named PMFFRC, built on novel memory modeling and redundant reads clustering technologies. By cascading PMFFRC, on 982 GB of fastq-format sequencing data, with 274 GB and 3.3 billion short reads, the state-of-the-art, reference-free compressors HARC, SPRING, Mstcom, and FastqCLS achieve 77.89%, 77.56%, 73.51%, and 29.36% average maximum compression ratio gains, respectively. PMFFRC saves 39.41%, 41.62%, 40.99%, and 20.19% of storage space compared with the four unoptimized compressors. CONCLUSIONS PMFFRC makes rational use of the big memory of the compression server, effectively saving sequencing reads storage space, which relieves basic storage facility costs and community sharing transmission overhead. Our work furnishes a novel solution for improving sequencing reads compression and saving storage space. The proposed PMFFRC algorithm is packaged in a same-name Linux toolkit, freely available at https://github.com/fahaihi/PMFFRC.
Affiliation(s)
- Hui Sun
  - Nankai-Baidu Joint Laboratory, College of Computer Science, Nankai University, Tianjin, China
- Yingfeng Zheng
  - Nankai-Baidu Joint Laboratory, College of Computer Science, Nankai University, Tianjin, China
- Haonan Xie
  - Institute of Artificial Intelligence, School of Electrical Engineering, Guangxi University, Nanning, China
- Huidong Ma
  - Nankai-Baidu Joint Laboratory, College of Computer Science, Nankai University, Tianjin, China
- Xiaoguang Liu
  - Nankai-Baidu Joint Laboratory, College of Computer Science, Nankai University, Tianjin, China
- Gang Wang
  - Nankai-Baidu Joint Laboratory, College of Computer Science, Nankai University, Tianjin, China
10. Chen S, Chen Y, Wang Z, Qin W, Zhang J, Nand H, Zhang J, Li J, Zhang X, Liang X, Xu M. Efficient sequencing data compression and FPGA acceleration based on a two-step framework. Front Genet 2023; 14:1260531. PMID: 37811144; PMCID: PMC10552150; DOI: 10.3389/fgene.2023.1260531.
Abstract
With the increasing throughput of modern sequencing instruments, the cost of storing and transmitting sequencing data has also increased dramatically. Although many tools have been developed to compress sequencing data, there is still a need for a compressor with a higher compression ratio. In this paper, we present a two-step framework for compressing sequencing data. The first step repacks the original data into a binary stream, while the second step compresses the stream with an LZMA encoder. We develop a new strategy to encode the original file into a stream that LZMA compresses very effectively. In addition, an FPGA-accelerated implementation of LZMA was developed to speed up the second step. As a demonstration, we present repaq as a lossless, reference-free compressor of FASTQ format files. We introduce a multi-file redundancy elimination method, which is very useful for compressing paired-end sequencing data. According to our test results, the compression ratio of repaq is much higher than that of other FASTQ compressors. For some deep sequencing data, the compression ratio of repaq can exceed 25, almost four times that of gzip. The framework presented in this paper can also be applied to develop new tools for compressing other sequencing data. The open-source code of repaq is available at: https://github.com/OpenGene/repaq.
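The two-step idea can be sketched in Python: first repack the base stream into a denser binary form (here a naive 2-bit encoding of ACGT; repaq's repacking is far more sophisticated and handles the full FASTQ record), then hand the stream to an LZMA encoder.

```python
import lzma

BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_2bit(seq: str) -> bytes:
    """Step 1: repack ACGT into 2 bits per base (4 bases per byte).
    Assumes len(seq) is a multiple of 4 and contains only ACGT."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)

seq = "ACGT" * 250              # 1000 bases of toy input
packed = pack_2bit(seq)         # step 1: 4x denser than ASCII
stream = lzma.compress(packed)  # step 2: entropy-code the binary stream
print(len(seq), len(packed), len(stream))
```

Packing first removes the redundancy of the 8-bit ASCII representation so that the LZMA stage spends its modeling effort on genuine sequence structure rather than on the character encoding.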
Affiliation(s)
- Shifu Chen
  - HaploX Biotechnology, Shenzhen, Guangdong, China
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Yaru Chen
  - HaploX Biotechnology, Shenzhen, Guangdong, China
- Wenjian Qin
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong, China
- Jing Zhang
  - HaploX Biotechnology, Shenzhen, Guangdong, China
- Heera Nand
  - Xilinx Inc., San Jose, CA, United States
- Jun Li
  - HaploX Biotechnology, Shenzhen, Guangdong, China
- Xiaoni Zhang
  - HaploX Biotechnology, Shenzhen, Guangdong, China
- Mingyan Xu
  - HaploX Biotechnology, Shenzhen, Guangdong, China
11. Meng Q, Chandak S, Zhu Y, Weissman T. Reference-free lossless compression of nanopore sequencing reads using an approximate assembly approach. Sci Rep 2023; 13:2082. PMID: 36747011; PMCID: PMC9902536; DOI: 10.1038/s41598-023-29267-8.
Abstract
The amount of data produced by genome sequencing experiments has been growing rapidly over the past several years, making compression important for efficient storage, transfer, and analysis of the data. In recent years, nanopore sequencing technologies have seen increasing adoption since they are portable, real-time, and provide long reads. However, there has been limited progress on compression of nanopore sequencing reads obtained in FASTQ files, since most existing tools are either general-purpose or specialized for short-read data. We present NanoSpring, a reference-free compressor for nanopore sequencing reads, relying on an approximate assembly approach. We evaluate NanoSpring on a variety of datasets including bacterial, metagenomic, plant, animal, and human whole-genome data. For recently basecalled high-quality nanopore datasets, NanoSpring, which focuses only on the base sequences in the FASTQ file, uses just 0.35-0.65 bits per base, which is 3-6× lower than general-purpose compressors like gzip. NanoSpring is competitive in compression ratio and compression resource usage with the state-of-the-art tool CoLoRd while being significantly faster at decompression when using multiple threads (>4× faster decompression with 20 threads). NanoSpring is available on GitHub at https://github.com/qm2/NanoSpring.
Affiliation(s)
- Qingxi Meng
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
- Shubham Chandak
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
- Yifan Zhu
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
- Tsachy Weissman
  - Department of Electrical Engineering, Stanford University, Stanford, CA, 94305, USA
12
|
Karasikov M, Mustafa H, Rätsch G, Kahles A. Lossless indexing with counting de Bruijn graphs. Genome Res 2022; 32:1754-1764. [PMID: 35609994 PMCID: PMC9528980 DOI: 10.1101/gr.276607.122] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Accepted: 05/05/2022] [Indexed: 11/25/2022]
Abstract
Sequencing data are rapidly accumulating in public repositories. Making this resource accessible for interactive analysis at scale requires efficient approaches for its storage and indexing. There have recently been remarkable advances in building compressed representations of annotated (or colored) de Bruijn graphs for efficiently indexing k-mer sets. However, approaches for representing quantitative attributes such as gene expression or genome positions in a general manner have remained underexplored. In this work, we propose counting de Bruijn graphs, a notion generalizing annotated de Bruijn graphs by supplementing each node-label relation with one or many attributes (e.g., a k-mer count or its positions). Counting de Bruijn graphs index k-mer abundances from 2652 human RNA-seq samples in over eightfold smaller representations compared with state-of-the-art bioinformatics tools and are faster to construct and query. Furthermore, counting de Bruijn graphs with positional annotations losslessly represent entire reads in indexes on average 27% smaller than the input compressed with gzip for human Illumina RNA-seq and 57% smaller for Pacific Biosciences (PacBio) HiFi sequencing of viral samples. A complete searchable index of all viral PacBio SMRT reads from NCBI's Sequence Read Archive (SRA) (152,884 samples, 875 Gbp) comprises only 178 GB. Finally, on the full RefSeq collection, we generate a lossless and fully queryable index that is 4.6-fold smaller than the MegaBLAST index. The techniques proposed in this work naturally complement existing methods and tools using de Bruijn graphs, and significantly broaden their applicability: from indexing k-mer counts and genome positions to implementing novel sequence alignment algorithms on top of highly compressed graph-based sequence indexes.
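Stripped of its succinct data structures, the counting de Bruijn graph idea is small enough to sketch. The dict-based `counting_dbg` below is our simplified illustration of nodes annotated with count attributes, not the paper's compressed representation:

```python
from collections import defaultdict

def counting_dbg(reads, k):
    """Toy counting de Bruijn graph: nodes are k-mers, each annotated with
    its abundance (the quantitative attribute); edges link k-mers that
    overlap by k-1 bases."""
    counts = defaultdict(int)
    edges = set()
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for km in kmers:
            counts[km] += 1
        for a, b in zip(kmers, kmers[1:]):
            edges.add((a, b))
    return counts, edges

counts, edges = counting_dbg(["ACGTAC", "CGTACG"], k=3)
# "CGT" occurs once in each read, so its count attribute is 2.
```

Real counting de Bruijn graphs store these attributes in compressed form and generalize the count to arbitrary per-relation annotations such as genome positions.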
Collapse
Affiliation(s)
- Mikhail Karasikov
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Harun Mustafa
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Gunnar Rätsch
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Biology at ETH Zurich, 8093 Zurich, Switzerland
- ETH AI Center, ETH Zurich, 8092 Zurich, Switzerland
| | - André Kahles
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| |
Collapse
|
13
|
Kryukov K, Jin L, Nakagawa S. Efficient compression of SARS-CoV-2 genome data using Nucleotide Archival Format. Patterns (N Y) 2022; 3:100562. [PMID: 35818472 PMCID: PMC9259476 DOI: 10.1016/j.patter.2022.100562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome data are essential for epidemiology, vaccine development, and tracking emerging variants. Millions of SARS-CoV-2 genomes have been sequenced during the pandemic. However, downloading SARS-CoV-2 genomes from databases is slow and unreliable, largely due to suboptimal choice of compression method. We evaluated the available compressors and found that Nucleotide Archival Format (NAF) would provide a drastic improvement compared with current methods. For Global Initiative on Sharing Avian Flu Data's (GISAID) pre-compressed datasets, NAF would increase efficiency 52.2 times for gzip-compressed data and 3.7 times for xz-compressed data. For DNA DataBank of Japan (DDBJ), NAF would improve throughput 40 times for gzip-compressed data. For GenBank and European Nucleotide Archive (ENA), NAF would accelerate data distribution by a factor of 29.3 times compared with uncompressed FASTA. This article provides a tutorial for installing and using NAF. Offering a NAF download option in sequence databases would provide a significant saving of time, bandwidth, and disk space and accelerate biological and medical research worldwide.
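Much of the gain from NAF-style tools comes from encoding nucleotides compactly before a general-purpose compressor runs. As a stdlib-only sketch of that idea, the hypothetical `pack_2bit` helper below uses 2-bit codes plus gzip; NAF itself is more general (it must also handle IUPAC ambiguity codes, which this sketch ignores, and it drops any trailing partial quad for brevity):

```python
import gzip

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumes uppercase ACGT only

def pack_2bit(seq: str) -> bytes:
    """Pack nucleotides at 2 bits each before general-purpose compression --
    the core idea behind specialized FASTA packers. Trailing bases beyond a
    multiple of 4 are dropped to keep the sketch short."""
    out = bytearray()
    for i in range(0, len(seq) - len(seq) % 4, 4):
        quad = seq[i:i + 4]
        out.append(CODE[quad[0]] << 6 | CODE[quad[1]] << 4
                   | CODE[quad[2]] << 2 | CODE[quad[3]])
    return bytes(out)

seq = "ACGTGGCA" * 1000
plain = gzip.compress(seq.encode())     # text straight into the codec
packed = gzip.compress(pack_2bit(seq))  # 4x smaller input before the codec
```

The packing step guarantees a 4x reduction before entropy coding even begins, which is one reason a format-aware tool can beat plain gzip on FASTA input.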
Collapse
Affiliation(s)
- Kirill Kryukov
- Department of Informatics, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Lihua Jin
- Genomus Co., Ltd., Sagamihara, Kanagawa 252-0226, Japan
| | - So Nakagawa
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259-1193, Japan
| |
Collapse
|
14
|
Tang T, Hutvagner G, Wang W, Li J. Simultaneous compression of multiple error-corrected short-read sets for faster data transmission and better de novo assemblies. Brief Funct Genomics 2022; 21:387-398. [PMID: 35848773 DOI: 10.1093/bfgp/elac016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 11/14/2022] Open
Abstract
Next-Generation Sequencing has produced enormous amounts of short-read sequence data for de novo genome assembly over the last decades. For efficient transmission of these huge datasets, high-performance compression algorithms have been intensively studied. As both de novo assembly and error correction methods exploit the overlaps between reads, a natural concern is whether the sequencing errors that degrade genome assemblies also hurt compression of the NGS data. This work addresses two problems: whether current error correction algorithms can enable compression algorithms to make the sequence data much more compact, and whether reads modified by error-correction algorithms lead to quality improvements in de novo contig assembly. As multiple sets of short reads are often produced by a single biomedical project in practice, we propose a graph-based method to reorder the files in a collection of multiple sets and then compress them simultaneously for a further compression improvement after error correction. We use examples to illustrate that accurate error correction algorithms can significantly reduce the number of mismatched nucleotides in reference-free compression and hence greatly improve compression performance. Extensive tests on practical collections of multiple short-read sets confirm that compression performance on the error-corrected data (of unchanged size) significantly outperforms that on the original data, and that the file-reordering idea contributes further gains. Error correction of the original reads also improved the quality of the genome assemblies, sometimes remarkably. However, it remains an open question how to combine appropriate error correction methods with an assembly algorithm so that assembly performance is always significantly improved.
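The file-reordering idea can be illustrated with a greedy approximation: order the files so each next file shares the most redundancy with the previous one. The `similarity` proxy and `greedy_order` helper below are our simplified stand-ins for the paper's graph-based method, not its actual algorithm:

```python
import gzip
import random

def similarity(a: bytes, b: bytes) -> int:
    """Proxy for cross-file redundancy: bytes saved by compressing the
    pair together rather than separately."""
    separate = len(gzip.compress(a)) + len(gzip.compress(b))
    together = len(gzip.compress(a + b))
    return separate - together

def greedy_order(files):
    """Greedily chain files so each next file is most similar to the last --
    a simplified stand-in for the paper's graph-based reordering."""
    remaining = list(range(len(files)))
    order = [remaining.pop(0)]
    while remaining:
        last = files[order[-1]]
        best = max(remaining, key=lambda j: similarity(last, files[j]))
        remaining.remove(best)
        order.append(best)
    return order

# Two read sets drawn from the same (synthetic) sample plus one unrelated set:
rng = random.Random(0)
x = "".join(rng.choice("ACGT") for _ in range(2000)).encode()
y = "".join(rng.choice("ACGT") for _ in range(2000)).encode()
order = greedy_order([x, y, x])  # the two related sets end up adjacent
```

A full solution would treat this as a path problem on a weighted graph of file pairs rather than a greedy chain, but the payoff is the same: adjacent similar files let the compressor's window exploit cross-file redundancy.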
Collapse
Affiliation(s)
- Tao Tang
- Data Science Institute, University of Technology Sydney, 81 Broadway, Ultimo, 2007, NSW, Australia
- School of Modern Posts, Nanjing University of Posts and Telecommunications, 9 Wenyuan Rd, Qixia District, 210003, Jiangsu, China
| | - Gyorgy Hutvagner
- School of Biomedical Engineering, University of Technology Sydney, 81 Broadway, Ultimo, 2007, NSW, Australia
| | - Wenjian Wang
- School of Computer and Information Technology, Shanxi University, Shanxi Road, 030006, Shanxi, China
| | - Jinyan Li
- Data Science Institute, University of Technology Sydney, 81 Broadway, Ultimo, 2007, NSW, Australia
| |
Collapse
|
15
|
Niu Y, Ma M, Li F, Liu X, Shi G. ACO: lossless quality score compression based on adaptive coding order. BMC Bioinformatics 2022; 23:219. [PMID: 35672665 PMCID: PMC9175485 DOI: 10.1186/s12859-022-04712-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2021] [Accepted: 04/25/2022] [Indexed: 08/30/2023] Open
Abstract
Background With the rapid development of high-throughput sequencing technology, the cost of whole genome sequencing has dropped rapidly, leading to exponential growth of genome data. How to efficiently compress the DNA data generated by large-scale genome projects has become an important factor restricting the further development of the DNA sequencing industry. Although the compression of DNA bases has improved significantly in recent years, the compression of quality scores remains challenging. Results In this paper, by reinvestigating the inherent correlations between quality scores and the sequencing process, we propose a novel lossless quality score compressor based on adaptive coding order (ACO). The main objective of ACO is to traverse the quality scores adaptively along the most correlated trajectory according to the sequencing process. By cooperating with adaptive arithmetic coding and an improved in-context strategy, ACO achieves state-of-the-art quality score compression performance with moderate complexity for next-generation sequencing (NGS) data. Conclusions This competence makes ACO a candidate tool for quality score compression. ACO has been adopted by AVS (the Audio Video coding Standard Workgroup of China) and is freely available at https://github.com/Yoniming/ACO.
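The correlation ACO exploits can be made concrete by measuring how predictable a quality score is given its predecessor. The `context_entropy` helper below is our illustrative measurement of that conditional entropy, not ACO's actual coder (which pairs such context models with adaptive arithmetic coding):

```python
import math
from collections import defaultdict

def context_entropy(quals, order=1):
    """Empirical conditional entropy (bits/symbol) of quality scores given
    the preceding `order` scores -- the kind of correlation an adaptive
    coding order can exploit."""
    ctx_counts = defaultdict(lambda: defaultdict(int))
    for run in quals:
        for i in range(order, len(run)):
            ctx_counts[run[i - order:i]][run[i]] += 1
    total = bits = 0
    for counts in ctx_counts.values():
        n = sum(counts.values())
        for c in counts.values():
            bits += c * -math.log2(c / n)
        total += n
    return bits / total

quals = ["IIIIIIIFFFFF", "IIIIFFFFFFFF"]
# Adjacent scores are highly correlated, so the conditional entropy is far
# below the ~1 bit a context-free coder would need for these two symbols.
```

Lower conditional entropy under the right traversal order translates directly into fewer output bits for an arithmetic coder.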
Collapse
Affiliation(s)
- Yi Niu
- School of Artificial Intelligence, Xidian University, Xi'an, 710071, China
- The Pengcheng Lab, Shenzhen, 518055, China
| | - Mingming Ma
- School of Artificial Intelligence, Xidian University, Xi'an, 710071, China
| | - Fu Li
- School of Artificial Intelligence, Xidian University, Xi'an, 710071, China
| | | | - Guangming Shi
- School of Artificial Intelligence, Xidian University, Xi'an, 710071, China
| |
Collapse
|
16
|
SFQ: Constructing and Querying a Succinct Representation of FASTQ Files. Electronics 2022. [DOI: 10.3390/electronics11111783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
A large and ever-increasing quantity of high-throughput sequencing (HTS) data is stored in FASTQ files. Various methods for data compression are used to mitigate the storage and transmission costs, from the still-prevalent general-purpose Gzip to state-of-the-art specialized methods. However, all of the existing methods for FASTQ file compression require a decompression stage before the HTS data can be used, which is particularly costly for random access to specific records in FASTQ files. We propose the sFASTQ format, a succinct representation of FASTQ files that can be used without decompression (i.e., the records can be retrieved and listed online), and that supports random access to individual records. The sFASTQ format can be searched on the disk, which eliminates the need for any additional memory resources. The searchable sFASTQ archive is of comparable size to the corresponding Gzip file. The sFASTQ format outputs (interleaved) FASTQ records to the STDOUT stream. We provide SFQ, software for the construction and usage of the sFASTQ format that supports variable-length reads, pairing of records, and both lossless and lossy compression of quality scores.
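The random-access idea can be sketched on plain, uncompressed FASTQ text with a simple byte-offset index; sFASTQ operates on a succinct compressed representation instead, so the `build_index`/`get_record` helpers below (our names) only illustrate the access pattern, not the data structure:

```python
import io

def build_index(fastq_text: str):
    """Record the start offset of each 4-line FASTQ record, enabling
    retrieval of record i without scanning the whole file."""
    offsets, pos = [], 0
    for i, line in enumerate(fastq_text.splitlines(keepends=True)):
        if i % 4 == 0:
            offsets.append(pos)
        pos += len(line)
    return offsets

def get_record(fastq_text: str, offsets, i: int) -> str:
    """Seek straight to record i and read exactly its four lines."""
    f = io.StringIO(fastq_text)
    f.seek(offsets[i])
    return "".join(f.readline() for _ in range(4))

fq = "@r1\nACGT\n+\nIIII\n@r2\nTTAA\n+\nFFFF\n"
idx = build_index(fq)
rec = get_record(fq, idx, 1)  # the record for read r2, no full scan needed
```

A succinct representation achieves the same seek-and-read behavior while keeping the records themselves in compressed form on disk.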
Collapse
|
17
|
Xie S, He X, He S, Zhu Z. CURC: A CUDA-based reference-free read compressor. Bioinformatics 2022; 38:3294-3296. [PMID: 35579371 DOI: 10.1093/bioinformatics/btac333] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2021] [Revised: 04/07/2022] [Accepted: 05/12/2022] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The data deluge of high-throughput sequencing has posed great challenges to data storage and transfer. Many specialized compression tools have been developed to solve this problem. However, most of the existing compressors are CPU-based, which might be inefficient and expensive for handling large-scale HTS data. With the popularization of GPUs, GPU-compatible sequencing data compressors have become desirable to exploit the computing power of GPUs. RESULTS We present a GPU-accelerated reference-free read compressor, namely CURC, for FASTQ files. Under a GPU-CPU heterogeneous parallel scheme, CURC implements highly efficient lossless compression of the DNA stream based on the pseudogenome approach and the CUDA library. CURC achieves a 2∼6-fold speedup of the compression with a competitive compression rate, compared with other state-of-the-art reference-free read compressors. AVAILABILITY CURC can be downloaded from https://github.com/BioinfoSZU/CURC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Shaohui Xie
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Xiaotian He
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Shan He
- School of Computer Science, University of Birmingham, Birmingham, B15 2TT, UK
| | - Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
| |
Collapse
|
18
|
Abstract
The cost of maintaining exabytes of data produced by sequencing experiments every year has become a major issue in today's genomic research. In spite of the increasing popularity of third-generation sequencing, the existing algorithms for compressing long reads exhibit a minor advantage over the general-purpose gzip. We present CoLoRd, an algorithm able to reduce the size of third-generation sequencing data by an order of magnitude without affecting the accuracy of downstream analyses.
Collapse
|
19
|
Huo H, Liu P, Wang C, Jiang H, Vitter JS. CIndex: compressed indexes for fast retrieval of FASTQ files. Bioinformatics 2022; 38:335-343. [PMID: 34524416 DOI: 10.1093/bioinformatics/btab655] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Revised: 08/12/2021] [Accepted: 09/10/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Ultrahigh-throughput next-generation sequencing instruments continue to generate vast amounts of genomic data. These data are generally stored in FASTQ format. Two important simultaneous goals are space-efficient compressed storage of the genomic data and fast query performance. Toward that end, we introduce compressed indexing to store and retrieve FASTQ files. RESULTS We propose a compressed index for FASTQ files called CIndex. CIndex uses the Burrows-Wheeler transform and the wavelet tree, combined with hybrid encoding, succinct data structures and tables REF and Rγ, to achieve minimal space usage and fast retrieval on the compressed FASTQ files. Experiments conducted over real publicly available datasets from various sequencing instruments demonstrate that our proposed index substantially outperforms existing state-of-the-art solutions. For count, locate and extract queries on reads, our method uses 2.7-41.66 percentage points less space and provides speedups of 70-167.16 times, 1.44-35.57 times and 1.3-55.4 times. For extracting records in FASTQ files, our method uses 2.86-14.88 percentage points less space and provides a speedup of 3.13-20.1 times. CIndex has an additional advantage in that it can be readily adapted to work as a general-purpose text index; experiments show that it performs very well in practice. AVAILABILITY AND IMPLEMENTATION The software is available on Github: https://github.com/Hongweihuo-Lab/CIndex. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
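A minimal, naive sketch of the FM-index count query that underlies BWT-based indexes like CIndex. Real implementations replace the sorted-rotation BWT and the linear-scan rank below with suffix arrays and wavelet trees; this version only shows the backward-search logic:

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (naive; fine for a demo)."""
    text += "$"
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def count_occurrences(text: str, pattern: str) -> int:
    """Backward search over the BWT: the core of an FM-index count query."""
    L = bwt(text)
    F = sorted(L)
    C = {c: F.index(c) for c in set(F)}  # symbols in F smaller than c

    def rank(c, i):  # occurrences of c in L[:i] (wavelet trees make this O(log σ))
        return L[:i].count(c)

    lo, hi = 0, len(L)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(c, lo)
        hi = C[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

n = count_occurrences("ACGTACGTAC", "AC")  # "AC" occurs at offsets 0, 4, 8
```

The count is answered without ever reconstructing the original text, which is what lets such indexes serve queries directly on the compressed representation.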
Collapse
Affiliation(s)
- Hongwei Huo
- Department of Computer Science, Xidian University, Xi'an 710071, China
| | - Pengfei Liu
- Department of Computer Science, Xidian University, Xi'an 710071, China
| | - Chenhui Wang
- Department of Computer Science, Xidian University, Xi'an 710071, China
| | - Hongbo Jiang
- Department of Computer Science, Xidian University, Xi'an 710071, China
| | | |
Collapse
|
20
|
Lee D, Song G. FastqCLS: a FASTQ compressor for long-read sequencing via read reordering using a novel scoring model. Bioinformatics 2022; 38:351-356. [PMID: 34623374 DOI: 10.1093/bioinformatics/btab696] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2020] [Revised: 09/29/2021] [Accepted: 10/05/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Over the past decades, vast amounts of genome sequencing data have been produced, requiring an enormous level of storage capacity. The time and resources needed to store and transfer such data cause bottlenecks in genome sequencing analysis. To resolve this issue, various compression techniques have been proposed to reduce the size of original FASTQ raw sequencing data, but these remain suboptimal. Long-read sequencing has become dominant in genomics, whereas most existing compression methods focus on short-read sequencing only. RESULTS We designed a compression algorithm based on read reordering using a novel scoring model for reducing FASTQ file size with no information loss. We integrated all data processing steps into a software package called FastqCLS and provide it as a Docker image for ease of installation and execution. We compared our method with existing major FASTQ compression tools using benchmark datasets, including new long-read sequencing data in this validation. As a result, FastqCLS outperformed the other tools in compression ratio for long-read sequencing data. AVAILABILITY AND IMPLEMENTATION FastqCLS can be downloaded from https://github.com/krlucete/FastqCLS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Dohyeon Lee
- School of Computer Science and Engineering, Pusan National University, Busan 46241, South Korea
| | - Giltae Song
- School of Computer Science and Engineering, Pusan National University, Busan 46241, South Korea
| |
Collapse
|
21
|
Cho M, No A. FCLQC: fast and concurrent lossless quality scores compressor. BMC Bioinformatics 2021; 22:606. [PMID: 34930110 PMCID: PMC8686598 DOI: 10.1186/s12859-021-04516-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2021] [Accepted: 12/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Advances in sequencing technology have drastically reduced sequencing costs. As a result, the amount of sequencing data is increasing explosively. Since FASTQ files (the standard sequencing data format) are huge, there is a need for efficient compression of FASTQ files, especially quality scores. Several quality score compression algorithms have recently been proposed, mainly focused on lossy compression to further boost the compression rate. However, for clinical applications and archiving purposes, lossy compression cannot replace lossless compression. One of the main challenges for lossless compression is time complexity: it can take thousands of seconds to compress a 1 GB file. There are also desirable features for compression algorithms, such as random access. Therefore, there is a need for a fast lossless compressor with a reasonable compression rate and random access functionality. RESULTS This paper proposes a Fast and Concurrent Lossless Quality scores Compressor (FCLQC) that supports random access and achieves a lower running time based on concurrent programming. Experimental results reveal that FCLQC is significantly faster than the baseline compressors at compression and decompression, at the expense of compression ratio. Compared to LCQS (the baseline quality score compression algorithm), FCLQC shows at least a 31x compression speed improvement in all settings, with a compression-ratio degradation of at most 13.58% (8.26% on average). Compared to general-purpose compressors (such as 7-zip), FCLQC shows 3x faster compression speed while having better compression ratios, by at least 2.08% (4.69% on average). Moreover, the speed of random access decompression also outperforms the others. The concurrency of FCLQC is implemented using Rust; the performance gain increases near-linearly with the number of threads.
CONCLUSION The superiority of compression and decompression speed makes FCLQC a practical lossless quality score compressor candidate for speed-sensitive applications of DNA sequencing data. FCLQC is available at https://github.com/Minhyeok01/FCLQC and is freely available for non-commercial usage.
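The combination of concurrency and random access rests on compressing independent blocks. A minimal sketch using threads and zlib (FCLQC itself is written in Rust and models quality scores specifically; this toy treats them as opaque bytes):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data: bytes, block_size: int = 1 << 16, workers: int = 4):
    """Compress fixed-size blocks independently and in parallel; block
    independence is what makes both concurrency and random access possible."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, blocks))

def decompress_block(compressed_blocks, i: int) -> bytes:
    """Random access: inflate only block i, not the whole stream."""
    return zlib.decompress(compressed_blocks[i])

quals = b"IIIIIFFFFF" * 20000          # ~200 KB of synthetic quality scores
cblocks = compress_blocks(quals)       # 4 independently decodable blocks
middle = decompress_block(cblocks, 1)  # touch only the second block
```

Threads genuinely help here because CPython's zlib releases the GIL during compression; the trade-off, as the paper notes for block schemes generally, is that per-block independence costs some compression ratio.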
Collapse
Affiliation(s)
- Minhyeok Cho
- Department of Electronic and Electrical Engineering, Hongik University, Seoul, Republic of Korea
| | - Albert No
- Department of Electronic and Electrical Engineering, Hongik University, Seoul, Republic of Korea.
| |
Collapse
|
22
|
Kryukov K, Ueda MT, Nakagawa S, Imanishi T. Sequence Compression Benchmark (SCB) database-A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences. Gigascience 2021; 9:5867695. [PMID: 32627830 PMCID: PMC7336184 DOI: 10.1093/gigascience/giaa072] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2020] [Revised: 06/01/2020] [Accepted: 06/15/2020] [Indexed: 01/22/2023] Open
Abstract
Background Nearly all molecular sequence databases currently use gzip for data compression. Ongoing rapid accumulation of stored data calls for a more efficient compression tool. Although numerous compressors exist, both specialized and general-purpose, choosing one of them was difficult because no comprehensive analysis of their comparative advantages for sequence compression was available. Findings We systematically benchmarked 430 settings of 48 compressors (including 29 specialized sequence compressors and 19 general-purpose compressors) on representative FASTA-formatted datasets of DNA, RNA, and protein sequences. Each compressor was evaluated on 17 performance measures, including compression strength, as well as time and memory required for compression and decompression. We used 27 test datasets including individual genomes of various sizes, DNA and RNA datasets, and standard protein datasets. We summarized the results as the Sequence Compression Benchmark database (SCB database, http://kirr.dyndns.org/sequence-compression-benchmark/), which allows custom visualizations to be built for selected subsets of benchmark results. Conclusion We found that modern compressors offer a large improvement in compactness and speed compared to gzip. Our benchmark allows compressors and their settings to be compared using a variety of performance measures, offering the opportunity to select the optimal compressor on the basis of the data type and usage scenario specific to a particular application.
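A benchmark harness in the SCB spirit needs only a few measurements per compressor: compression strength, compression time, and decompression time, plus a losslessness check. A minimal sketch over two stdlib codecs (the `benchmark` helper and synthetic data are ours, not the SCB database's methodology):

```python
import gzip
import lzma
import time

def benchmark(name, compress, decompress, data: bytes):
    """Measure ratio plus compression/decompression wall time for one codec,
    verifying losslessness along the way."""
    t0 = time.perf_counter()
    comp = compress(data)
    t1 = time.perf_counter()
    back = decompress(comp)
    t2 = time.perf_counter()
    assert back == data, "compressor must be lossless"
    return {"tool": name, "ratio": len(data) / len(comp),
            "c_sec": t1 - t0, "d_sec": t2 - t1}

data = ("ACGTTGCA" * 50000).encode()  # synthetic stand-in for a FASTA payload
results = [benchmark("gzip", gzip.compress, gzip.decompress, data),
           benchmark("xz", lzma.compress, lzma.decompress, data)]
```

Real benchmarking adds memory tracking, multiple datasets and settings, and repeated runs, which is exactly the matrix the SCB database tabulates.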
Collapse
Affiliation(s)
- Kirill Kryukov
- Correspondence address. Kirill Kryukov, Department of Genomics and Evolutionary Biology, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan. E-mail:
| | - Mahoko Takahashi Ueda
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259–1193, Japan
- Current address: Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University, Bunkyo, Tokyo 113-8510, Japan
| | - So Nakagawa
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259–1193, Japan
| | - Tadashi Imanishi
- Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Kanagawa 259–1193, Japan
| |
Collapse
|
23
|
Anžel A, Heider D, Hattab G. The visual story of data storage: From storage properties to user interfaces. Comput Struct Biotechnol J 2021; 19:4904-4918. [PMID: 34527195 PMCID: PMC8430386 DOI: 10.1016/j.csbj.2021.08.031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2021] [Revised: 08/19/2021] [Accepted: 08/19/2021] [Indexed: 12/15/2022] Open
Abstract
About fifty times more data has been created than there are stars in the observable universe. Current trends in data creation and consumption mean that the devices and storage media we use will require more physical space. Novel data storage media such as DNA are considered a viable alternative. Yet, the introduction of new storage technologies should be accompanied by an evaluation of user requirements. To assess such needs, we designed and conducted a survey to rank different storage properties adapted for visualization: accessibility, capacity, usage, mutability, lifespan, addressability, and typology. In addition, we surveyed different storage devices over time, ranking them by these properties. Our results indicated a timeline of three distinct periods: magnetic, optical and electronic, and alternative media. Moreover, by investigating user interfaces across different operating systems, we observed a predominant presence of bar charts and tree maps for the usage of a medium and its file directory hierarchy, respectively. Taken together with the results of our survey, this allowed us to create a customized user interface that includes data visualizations that can be toggled for both user groups: experts and the public.
Collapse
Affiliation(s)
- Aleksandar Anžel
- University of Marburg, Department of Mathematics and Computer Science, Marburg 35043, Germany
| | - Dominik Heider
- University of Marburg, Department of Mathematics and Computer Science, Marburg 35043, Germany
| | - Georges Hattab
- University of Marburg, Department of Mathematics and Computer Science, Marburg 35043, Germany
| |
Collapse
|
24
|
Liu Y, Li J. Hamming-shifting graph of genomic short reads: Efficient construction and its application for compression. PLoS Comput Biol 2021; 17:e1009229. [PMID: 34280186 PMCID: PMC8321399 DOI: 10.1371/journal.pcbi.1009229] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2021] [Revised: 07/29/2021] [Accepted: 06/30/2021] [Indexed: 11/21/2022] Open
Abstract
Graphs such as de Bruijn graphs and OLC (overlap-layout-consensus) graphs have been widely adopted for the de novo assembly of genomic short reads. This work studies another important problem in the field: how graphs can be used for high-performance compression of large-scale sequencing data. We present a novel graph definition named Hamming-shifting graph to address this problem. The definition originates from the technological characteristics of next-generation sequencing machines, aiming to link all pairs of distinct reads that have a small Hamming distance or a small shifting offset or both. We compute multiple lexicographically minimal k-mers to index the reads for an efficient search of the lightest-weight edges, and we prove a very high probability of successfully detecting these edges. The resulting graph creates a full mutual reference among the reads, cascading a code-minimized transfer of every child read for optimal compression. We conducted compression experiments on the minimum spanning forest of this extremely sparse graph, and achieved a 10-30% greater file-size reduction compared to the best compression results using existing algorithms. As future work, the separation and connectivity degrees of these giant graphs can be used as economical measurements or protocols for quick quality assessment of wet-lab machines, for sufficiency control of genomic library preparation, and for accurate de novo genome assembly.
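The edge condition of a Hamming-shifting graph (small Hamming distance or small shift offset between two reads) can be sketched directly. The all-pairs scan below is exactly what the paper's minimal k-mer indexing avoids at scale; helper names and thresholds are ours:

```python
def hamming(a: str, b: str) -> int:
    """Positions at which two equal-length reads differ."""
    return sum(x != y for x, y in zip(a, b))

def shift_offset(a: str, b: str, max_shift: int):
    """Smallest offset s (up to max_shift) at which b's prefix exactly
    matches a's suffix, or None if there is no such small shift."""
    for s in range(1, max_shift + 1):
        if a[s:] == b[:len(a) - s]:
            return s
    return None

def hs_edges(reads, max_hamming=2, max_shift=3):
    """Toy Hamming-shifting graph: connect read pairs related by a small
    Hamming distance or a small shift. Quadratic scan for illustration only."""
    edges = []
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if (hamming(reads[i], reads[j]) <= max_hamming
                    or shift_offset(reads[i], reads[j], max_shift) is not None):
                edges.append((i, j))
    return edges

reads = ["ACGTACGT", "ACGAACGT", "GTACGTAA"]
# reads 0 and 1 differ by Hamming distance 1; read 2 is read 0 shifted by 2.
edges = hs_edges(reads)
```

Once such edges exist, each child read can be stored as a tiny diff against its parent, which is what makes the minimum spanning forest of this graph a compression plan.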
Collapse
Affiliation(s)
- Yuansheng Liu
- Data Science Institute, University of Technology Sydney, Sydney, Australia
| | - Jinyan Li
- Data Science Institute, University of Technology Sydney, Sydney, Australia
| |
Collapse
|
25
|
Sardaraz M, Tahir M. SCA-NGS: Secure compression algorithm for next generation sequencing data using genetic operators and block sorting. Sci Prog 2021; 104:368504211023276. [PMID: 34143692 PMCID: PMC10454964 DOI: 10.1177/00368504211023276] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Recent advancements in sequencing methods have led to a significant increase in sequencing data, which in turn creates research challenges in storage, transfer, and processing. Data compression techniques have been adopted to cope with the storage of these data, with good achievements in compression ratio and execution time. However, this fast-paced advancement has raised major concerns about the security of the data: confidentiality, integrity, and authenticity need to be ensured. This paper presents a novel lossless reference-free algorithm that focuses on data compression along with encryption to achieve security in addition to the other parameters. The proposed algorithm preprocesses the data before applying a general-purpose compression library, and a genetic algorithm is used to encrypt the data. The technique is validated with experimental results on benchmark datasets, and a comparative analysis with state-of-the-art techniques is presented. The results show that the proposed method achieves better results than existing methods.
Collapse
Affiliation(s)
- Muhammad Sardaraz
- Department of Computer Science, Faculty of Information Sciences & Technology, COMSATS University Islamabad, Attock Campus, Attock, Punjab, Pakistan
| | - Muhammad Tahir
- Department of Computer Science, Faculty of Information Sciences & Technology, COMSATS University Islamabad, Attock Campus, Attock, Punjab, Pakistan
| |
Collapse
|
26
|
Ferraro Petrillo U, Palini F, Cattaneo G, Giancarlo R. FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy. BMC Bioinformatics 2021; 22:144. [PMID: 33752596 PMCID: PMC7986029 DOI: 10.1186/s12859-021-04063-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2020] [Accepted: 03/04/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop, and their deployment there is not exactly immediate. This state of the art is problematic. RESULTS We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy a specialized FASTA/Q compressor within MapReduce-Hadoop for processing files stored on the distributed Hadoop File System, with very little knowledge of Hadoop. Practically, we provide evidence that the deployment of those specialized compressors within Hadoop, not available so far, results in better space savings, and even in better execution times over compressed data, with respect to the use of generic compressors available in Hadoop, in particular for FASTQ files. Finally, we observe that these results also hold for the Apache Spark framework, when used to process FASTA/Q files stored on the Hadoop File System. CONCLUSIONS Our methods and the corresponding software substantially contribute to achieving space and time savings for the storage and processing of FASTA/Q files in Hadoop and Spark. Since our approach is general, it is very likely that it can also be applied to FASTA/Q compression methods that will appear in the future. AVAILABILITY The software and the datasets are available at https://github.com/fpalini/fastdoopc.
Affiliation(s)
- Francesco Palini
- Dipartimento di Scienze Statistiche, Università di Roma - La Sapienza, Rome, Italy
- Giuseppe Cattaneo
- Dipartimento di Matematica ed Informatica, Università di Palermo, Palermo, Italy

27
Lan D, Tobler R, Souilmi Y, Llamas B. Genozip - A Universal Extensible Genomic Data Compressor. Bioinformatics 2021; 37:2225-2230. [PMID: 33585897 PMCID: PMC8388020 DOI: 10.1093/bioinformatics/btab102]
Abstract
We present Genozip, a universal and fully featured compression software for genomic data. Genozip is designed to be a general-purpose software and a development framework for genomic compression by providing five core capabilities - universality (support for all common genomic file formats), high compression ratios, speed, feature-richness, and extensibility. Genozip delivers high-performance compression for widely-used genomic data formats in genomics research, namely FASTQ, SAM/BAM/CRAM, VCF, GVF, FASTA, PHYLIP, and 23andMe formats. Our test results show that Genozip is fast and achieves greatly improved compression ratios, even when the files are already compressed. Further, Genozip is architected with a separation of the Genozip Framework from file-format-specific Segmenters and data-type-specific Codecs. With this, we intend for Genozip to be a general-purpose compression platform where researchers can implement compression for additional file formats, as well as new codecs for data types or fields within files, in the future. We anticipate that this will ultimately increase the visibility and adoption of these algorithms by the user community, thereby accelerating further innovation in this space. Availability: Genozip is written in C. The code is open-source and available on GitHub (https://github.com/divonlan/genozip). The package is free for non-commercial use. It is distributed as a Docker container on DockerHub and through the conda package manager. Genozip is tested on Linux, Mac, and Windows. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Divon Lan
- Australian Centre for Ancient DNA, School of Biological Sciences, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia
- Ray Tobler
- Australian Centre for Ancient DNA, School of Biological Sciences, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia; Centre of Excellence for Australian Biodiversity and Heritage (CABAH), School of Biological Sciences, University of Adelaide, Adelaide, SA 5005, Australia
- Yassine Souilmi
- Australian Centre for Ancient DNA, School of Biological Sciences, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia; National Centre for Indigenous Genomics, Australian National University, Canberra, ACT 0200, Australia
- Bastien Llamas
- Australian Centre for Ancient DNA, School of Biological Sciences, Faculty of Sciences, The University of Adelaide, Adelaide SA 5005, Australia; Centre of Excellence for Australian Biodiversity and Heritage (CABAH), School of Biological Sciences, University of Adelaide, Adelaide, SA 5005, Australia; National Centre for Indigenous Genomics, Australian National University, Canberra, ACT 0200, Australia

28
Tang T, Li J. Transformation of FASTA files into feature vectors for unsupervised compression of short reads databases. J Bioinform Comput Biol 2021; 19:2050048. [PMID: 33472569 DOI: 10.1142/s0219720020500481]
Abstract
FASTA data sets of short reads are usually generated in the tens or hundreds for a biomedical study. However, these data sets are currently compressed one by one, without consideration of the inter-similarity between them, which could otherwise be exploited to enhance the performance of de novo compression. We show that clustering these data sets into similar sub-groups for group-by-group compression can greatly improve compression performance. Our novel idea is to detect the lexicographically smallest k-mer (k-minimizer) for every read in each data set, and to use these k-mers as features and their frequencies in every data set as feature values, transforming each of these huge data sets into a characteristic feature vector. Unsupervised clustering algorithms are then applied to these vectors to find similar data sets and merge them. As a large number of common k-mers with similar feature values between two data sets implies a high proportion of overlapping reads shared between them, merging similar data sets creates immense sequence redundancy that boosts compression performance. Experiments confirm that our clustering approach can gain up to a 12% improvement over several state-of-the-art algorithms in compressing read databases consisting of 17-100 data sets (48.57-197.97 GB).
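The k-minimizer transformation described in this abstract can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the window size k and the example reads are arbitrary choices:

```python
from collections import Counter

def k_minimizer(read: str, k: int = 8) -> str:
    """Return the lexicographically smallest k-mer of a read."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))

def feature_vector(reads, k: int = 8) -> Counter:
    """Count k-minimizer frequencies across all reads in one data set."""
    return Counter(k_minimizer(r, k) for r in reads)

# Toy data set: each read contributes exactly one k-minimizer.
reads = ["ACGTACGTAGCT", "TTACGTACGTAA", "GGGTTTAAACCC"]
vec = feature_vector(reads, k=4)
```

Data sets whose vectors are close (e.g. under cosine distance) share many k-minimizers, and hence many overlapping reads, which is what makes group-by-group compression pay off.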
Affiliation(s)
- Tao Tang
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Broadway, NSW 2007, Australia
- Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Broadway, NSW 2007, Australia

29
Kowalski TM, Grabowski S. PgRC: pseudogenome-based read compressor. Bioinformatics 2020; 36:2082-2089. [PMID: 31893286 DOI: 10.1093/bioinformatics/btz919]
Abstract
MOTIVATION The amount of sequencing data from high-throughput sequencing technologies grows at a pace exceeding the one predicted by Moore's law. One of the basic requirements is to efficiently store and transmit such huge collections of data. Despite significant interest in designing FASTQ compressors, they are still imperfect in terms of compression ratio or decompression resources. RESULTS We present Pseudogenome-based Read Compressor (PgRC), an in-memory algorithm for compressing the DNA stream, based on the idea of building an approximation of the shortest common superstring over high-quality reads. Experiments show that PgRC beats its main competitors, SPRING and Minicom, in compression ratio by up to 15% and 20% on average, respectively, while being comparably fast in decompression. AVAILABILITY AND IMPLEMENTATION PgRC can be downloaded from https://github.com/kowallus/PgRC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
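The pseudogenome idea rests on approximating the shortest common superstring of the reads. A minimal greedy sketch (an illustration of the general principle only, far simpler than the actual PgRC construction) repeatedly merges the pair of reads with the largest suffix-prefix overlap:

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_superstring(reads):
    """Greedy approximation of the shortest common superstring."""
    reads = list(reads)
    while len(reads) > 1:
        # Pick the ordered pair with the maximum overlap.
        n, i, j = max(
            (overlap(a, b), i, j)
            for i, a in enumerate(reads)
            for j, b in enumerate(reads) if i != j
        )
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads[0]

s = greedy_superstring(["ACGT", "CGTA", "GTAC"])
```

Every input read remains a substring of the result, while shared overlaps are stored only once, which is the redundancy a downstream entropy coder then exploits.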
Affiliation(s)
- Tomasz M Kowalski
- Institute of Applied Computer Science, Lodz University of Technology, Lodz 90-924, Poland
- Szymon Grabowski
- Institute of Applied Computer Science, Lodz University of Technology, Lodz 90-924, Poland

30
Liu Y, Wong L, Li J. Allowing mutations in maximal matches boosts genome compression performance. Bioinformatics 2020; 36:4675-4681. [PMID: 33118018 DOI: 10.1093/bioinformatics/btaa572]
Abstract
MOTIVATION A maximal match between two genomes is a contiguous, non-extendable sub-sequence common to the two genomes. DNA bases mutate very often from the genome of one individual to another. When a mutation occurs in a maximal match, it breaks the maximal match into shorter match segments. The coding cost of using these broken segments for reference-based genome compression is much higher than that of using a maximal match which is allowed to contain mutations. RESULTS We present memRGC, a novel reference-based genome compression algorithm that leverages mutation-containing matches (MCMs) for genome encoding. memRGC detects maximal matches between two genomes using a coprime double-window k-mer sampling search scheme. It then extends these matches to cover mismatches (mutations) and their neighbouring maximal matches, forming long MCMs. Experiments reveal that memRGC boosts compression performance by an average of 27% in reference-based genome compression. memRGC is also better than the best state-of-the-art methods on all of the benchmark datasets, sometimes by 50%. Moreover, memRGC uses much less memory and fewer decompression resources, while providing comparable compression speed. These advantages are of significant benefit to genome data storage and transmission. AVAILABILITY AND IMPLEMENTATION https://github.com/yuansliu/memRGC. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
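The core idea of letting a match run through isolated mutations can be illustrated with a toy sketch (this is not the memRGC algorithm itself, which uses coprime double-window k-mer sampling to find the anchors): extend an exact match past a single-base substitution whenever the two sequences realign immediately afterwards.

```python
def mutation_containing_match(ref: str, tgt: str, i: int, j: int) -> int:
    """Toy model of an MCM: length of a match of ref[i:] vs tgt[j:]
    that may step over isolated single-base mismatches."""
    length = 0
    while i < len(ref) and j < len(tgt):
        if ref[i] == tgt[j]:
            i += 1; j += 1; length += 1
        elif (i + 1 < len(ref) and j + 1 < len(tgt)
              and ref[i + 1] == tgt[j + 1]):
            # Substitution: record it and keep extending the match.
            i += 1; j += 1; length += 1
        else:
            break
    return length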
Affiliation(s)
- Yuansheng Liu
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia
- Limsoon Wong
- School of Computing, National University of Singapore, Singapore 117417, Singapore
- Jinyan Li
- Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW 2007, Australia

31
Dufort Y Álvarez G, Seroussi G, Smircich P, Sotelo J, Ochoa I, Martín Á. ENANO: Encoder for NANOpore FASTQ files. Bioinformatics 2020; 36:4506-4507. [PMID: 32470109 DOI: 10.1093/bioinformatics/btaa551]
Abstract
MOTIVATION The amount of genomic data generated globally is seeing explosive growth, leading to increasing needs for processing, storage and transmission resources, which motivates the development of efficient compression tools for these data. Work so far has focused mainly on the compression of data generated by short-read technologies. However, nanopore sequencing technologies are rapidly gaining popularity due to the advantages offered by the large increase in the average size of the produced reads, the reduction in their cost and the portability of the sequencing technology. We present ENANO (Encoder for NANOpore), a novel lossless compression algorithm especially designed for nanopore sequencing FASTQ files. RESULTS The main focus of ENANO is on the compression of the quality scores, as they dominate the size of the compressed file. ENANO offers two modes, Maximum Compression and Fast (default), which trade off compression efficiency and speed. We tested ENANO, the current state-of-the-art compressor SPRING and the general compressor pigz on several publicly available nanopore datasets. The results show that the proposed algorithm consistently achieves the best compression performance (in both modes) on every considered nanopore dataset, with an average improvement over pigz and SPRING of >24.7% and 6.3%, respectively. In addition, in terms of encoding and decoding speeds, ENANO is 2.9× and 1.7× faster than SPRING, respectively, with memory consumption up to 0.2 GB. AVAILABILITY AND IMPLEMENTATION ENANO is freely available for download at: https://github.com/guilledufort/EnanoFASTQ. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gadiel Seroussi
- Facultad de Ingeniería, Universidad de la República, Montevideo 11300, Uruguay; Xperi Corp, San Jose, CA 95134, USA
- Pablo Smircich
- Facultad de Ciencias, Universidad de la República, Montevideo 11400, Uruguay; Departamento de Genómica, Instituto de Investigaciones Biológicas Clemente Estable, Montevideo 11600, Uruguay
- José Sotelo
- Facultad de Ciencias, Universidad de la República, Montevideo 11400, Uruguay; Departamento de Genómica, Instituto de Investigaciones Biológicas Clemente Estable, Montevideo 11600, Uruguay
- Idoia Ochoa
- TECNUN School of Engineering, University of Navarra, Donostia-San Sebastián 20018, Spain
- Álvaro Martín
- Facultad de Ingeniería, Universidad de la República, Montevideo 11300, Uruguay

32
Jespersgaard C, Syed A, Chmura P, Løngreen P. Supercomputing and Secure Cloud Infrastructures in Biology and Medicine. Annu Rev Biomed Data Sci 2020. [DOI: 10.1146/annurev-biodatasci-012920-013357]
Abstract
The increasing amounts of healthcare data stored in health registries, in combination with genomic and other types of data, have the potential to enable better decision making and pave the path for personalized medicine. However, reaping the full value of big, sensitive data for the benefit of patients requires greater access to data across organizations and institutions in various regions. This overview first introduces cloud computing and takes stock of the challenges to enhancing data availability in the healthcare system. Four models for ensuring higher data accessibility are then discussed. Finally, several cases are discussed that explore how enhanced access to data would benefit the end user.
Affiliation(s)
- Ali Syed
- Danish National Genome Center, DK-2300 Copenhagen S, Denmark
- Piotr Chmura
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, DK-2200 Copenhagen N, Denmark
- Peter Løngreen
- Danish National Genome Center, DK-2300 Copenhagen S, Denmark

33
Abstract
The amount of data produced by modern sequencing instruments that needs to be stored is huge, so it is not surprising that much work has been done in the field of specialized data compression of FASTQ files. The existing algorithms are, however, still imperfect, and the best tools produce quite large archives. We present FQSqueezer, a novel compression algorithm for sequencing data, able to process single- and paired-end reads of variable lengths. It is based on ideas from the well-known prediction by partial matching (PPM) and dynamic Markov coder (DMC) algorithms from the world of general-purpose compressors. Its compression ratios are often tens of percent better than those offered by state-of-the-art tools. The drawbacks of the proposed method are its large memory and time requirements.
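PPM-style prediction, as referenced above, conditions the probability of each symbol on the symbols immediately preceding it; the resulting probabilities drive an arithmetic coder. A minimal order-2 frequency model over the DNA alphabet (an illustration of the general idea only, far simpler than FQSqueezer's models) might look like:

```python
from collections import defaultdict, Counter

class Order2Model:
    """Toy order-2 context model: P(symbol | previous two symbols),
    with add-one smoothing over the DNA alphabet."""
    ALPHABET = "ACGT"

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, seq: str):
        """Accumulate symbol counts per two-symbol context."""
        for i in range(2, len(seq)):
            self.counts[seq[i - 2:i]][seq[i]] += 1

    def prob(self, context: str, symbol: str) -> float:
        """Smoothed conditional probability of symbol given context."""
        c = self.counts[context]
        return (c[symbol] + 1) / (sum(c.values()) + len(self.ALPHABET))

model = Order2Model()
model.update("ACGTACGTACGT")
```

A periodic sequence makes the context highly predictive: after "AC" the model assigns most of the probability mass to "G", which is exactly what lets an entropy coder spend far less than two bits per base.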
34
Al Yami S, Huang CH. LFastqC: A lossless non-reference-based FASTQ compressor. PLoS One 2019; 14:e0224806. [PMID: 31725736 PMCID: PMC6855649 DOI: 10.1371/journal.pone.0224806]
Abstract
The cost-effectiveness of next-generation sequencing (NGS) has led to the advancement of genomic research, thereby regularly generating a large amount of raw data that often requires efficient infrastructures such as data centers to manage the storage and transmission of such data. The generated NGS data are highly redundant and need to be efficiently compressed to reduce the cost of storage space and transmission bandwidth. We present a lossless, non-reference-based FASTQ compression algorithm, known as LFastqC, an improvement over the LFQC tool, to address these issues. LFastqC is compared with several state-of-the-art compressors, and the results indicate that LFastqC achieves better compression ratios for important datasets such as the LS454, PacBio, and MinION. Moreover, LFastqC has a better compression and decompression speed than LFQC, which was previously the top-performing compression algorithm for the LS454 dataset. LFastqC is freely available at https://github.uconn.edu/sya12005/LFastqC.
Affiliation(s)
- Sultan Al Yami
- Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Computer Science and Information System, Najran University, Najran, Saudi Arabia
- Chun-Hsi Huang
- Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, United States of America

35
Abstract
Recently, there has been growing interest in genome sequencing, driven by advances in sequencing technology, in terms of both efficiency and affordability. These developments have allowed many to envision whole-genome sequencing as an invaluable tool for both personalized medical care and public health. As a result, increasingly large and ubiquitous genomic data sets are being generated. This poses a significant challenge for the storage and transmission of these data. Already, it is more expensive to store genomic data for a decade than it is to obtain the data in the first place. This situation calls for efficient representations of genomic information. In this review, we emphasize the need for designing specialized compressors tailored to genomic data and describe the main solutions already proposed. We also give general guidelines for storing these data and conclude with our thoughts on the future of genomic formats and compressors.
Affiliation(s)
- Mikel Hernaez
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA
- Dmitri Pavlichin
- Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
- Tsachy Weissman
- Department of Electrical Engineering, Stanford University, Stanford, California 94305, USA
- Idoia Ochoa
- Department of Electrical and Computer Engineering, University of Illinois at Urbana–Champaign, Urbana, Illinois 61801, USA