1. Lu R, Dumonceaux T, Anzar M, Zovoilis A, Antonation K, Barker D, Corbett C, Nadon C, Robertson J, Eagle SHC, Lung O, Rudar J, Surujballi O, Laing C. MNBC: a multithreaded Minimizer-based Naïve Bayes Classifier for improved metagenomic sequence classification. Bioinformatics 2024; 40:btae601. [PMID: 39388213 PMCID: PMC11522871 DOI: 10.1093/bioinformatics/btae601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 10/03/2024] [Accepted: 10/08/2024] [Indexed: 10/15/2024]
Abstract
MOTIVATION State-of-the-art tools for classifying metagenomic sequencing reads include both rapid and accurate options, although combining speed and accuracy in a single tool remains an active area of research. The machine learning-based Naïve Bayes Classifier (NBC) approach provides a theoretical basis for accurately classifying all reads in a sample. RESULTS We developed the multithreaded Minimizer-based Naïve Bayes Classifier (MNBC) tool, which improves the NBC approach by applying minimizers and by using plurality voting to resolve closely related classification scores. Using a standard reference- and test-sequence framework with simulated variable-length reads, we benchmarked MNBC against six other state-of-the-art tools: MetaMaps, Ganon, Kraken2, KrakenUniq, CLARK, and Centrifuge. We also applied MNBC to the "marine" and "strain-madness" short-read metagenomic datasets from the Critical Assessment of Metagenome Interpretation (CAMI) II challenge, using a reference database contemporaneous with the challenge. MNBC efficiently identified reads from unknown microorganisms and achieved the highest species- and genus-level precision and recall on short reads, as well as the highest species-level precision on long reads. It also achieved the highest accuracy on the "strain-madness" dataset. AVAILABILITY AND IMPLEMENTATION MNBC is freely available at https://github.com/ComputationalPathogens/MNBC.
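As a rough illustration of the two ideas named above, minimizer sampling and plurality voting over near-tied scores, here is a minimal Python sketch. It is not MNBC's implementation: the lexicographic minimizer order, the presence/absence likelihood with smoothing constant `alpha`, the voting margin `delta`, and the class name `MinimizerNBC` are all illustrative assumptions.

```python
from __future__ import annotations

import math
import random
from collections import Counter


def minimizers(seq: str, k: int, w: int) -> set[str]:
    """Lexicographically smallest k-mer in every window of w consecutive k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return set()
    if len(kmers) < w:  # sequence shorter than one full window
        return {min(kmers)}
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}


class MinimizerNBC:
    """Toy Naive Bayes classifier over minimizer presence (hypothetical, not MNBC's code)."""

    def __init__(self, k: int = 15, w: int = 10, alpha: float = 0.01, delta: float = 1.0):
        self.k, self.w, self.alpha, self.delta = k, w, alpha, delta
        self.refs: list[tuple[str, set[str]]] = []  # (taxon label, minimizer set)

    def add_reference(self, taxon: str, genome: str) -> None:
        self.refs.append((taxon, minimizers(genome, self.k, self.w)))

    def _log_likelihood(self, query: set[str], ref: set[str]) -> float:
        # Each read minimizer is an independent feature; smoothing keeps a single
        # missing minimizer (e.g. from a sequencing error) from zeroing the score.
        hit = math.log((1 + self.alpha) / (1 + 2 * self.alpha))
        miss = math.log(self.alpha / (1 + 2 * self.alpha))
        return sum(hit if m in ref else miss for m in query)

    def classify(self, read: str) -> str | None:
        query = minimizers(read, self.k, self.w)
        if not query or not self.refs:
            return None
        scores = [(self._log_likelihood(query, ref), taxon) for taxon, ref in self.refs]
        best = max(score for score, _ in scores)
        # Plurality vote among all references scoring within delta of the best one.
        votes = Counter(taxon for score, taxon in scores if score >= best - self.delta)
        return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    genome_a = "".join(rng.choice("ACGT") for _ in range(600))
    genome_b = "".join(rng.choice("ACGT") for _ in range(600))
    clf = MinimizerNBC()
    clf.add_reference("speciesA", genome_a)
    clf.add_reference("speciesB", genome_b)
    print(clf.classify(genome_a[100:250]))  # a read drawn from genome_a
```

Because every window of a read is also a window of its source genome, the read's minimizers are a subset of that genome's minimizers, which is what makes minimizer sampling safe for classification. The voting step only matters when several references (for example, strains of one species) score almost identically.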
Affiliation(s)
- Ruipeng Lu
  - National Centre for Animal Disease, Canadian Food Inspection Agency, Lethbridge County, AB, T1J 5R7, Canada
- Tim Dumonceaux
  - Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Muhammad Anzar
  - Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
- Athanasios Zovoilis
  - Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada
- Kym Antonation
  - National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada, Winnipeg, MB, R3E 3M4, Canada
- Dillon Barker
  - National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada, Winnipeg, MB, R3E 3M4, Canada
- Cindi Corbett
  - National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada, Winnipeg, MB, R3E 3M4, Canada
- Celine Nadon
  - National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada, Winnipeg, MB, R3E 3M4, Canada
- James Robertson
  - National Microbiology Laboratory at Guelph, Public Health Agency of Canada, Guelph, ON, N1G 3W4, Canada
- Shannon H C Eagle
  - National Microbiology Laboratory at Guelph, Public Health Agency of Canada, Guelph, ON, N1G 3W4, Canada
- Oliver Lung
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, MB, R3E 3M4, Canada
- Josip Rudar
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, MB, R3E 3M4, Canada
- Om Surujballi
  - Ottawa Animal Health Laboratory, Canadian Food Inspection Agency, Ottawa, ON, K2J 4S1, Canada
- Chad Laing
  - National Centre for Animal Disease, Canadian Food Inspection Agency, Lethbridge County, AB, T1J 5R7, Canada
2. Patrick N, Markey M. Long-Read MDM4 Sequencing Reveals Aberrant Isoform Landscape in Metastatic Melanomas. Int J Mol Sci 2024; 25:9415. [PMID: 39273363 PMCID: PMC11395681 DOI: 10.3390/ijms25179415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Revised: 08/22/2024] [Accepted: 08/26/2024] [Indexed: 09/15/2024]
Abstract
MDM4 is upregulated in the majority of melanoma cases and has been described as a "key therapeutic target in cutaneous melanoma". Numerous MDM4 isoforms exist, but few studies have examined their specific expression in human tissues. Changes in MDM4 splicing during human melanomagenesis are critical to p53 activity and represent potential therapeutic targets. Moreover, studies relying on short reads lose "connectivity" information, so full-length transcripts are frequently only inferred from the presence of splice-junction reads. To address this problem, long-read nanopore sequencing was used to read transcripts along their entire length. Here, alternative and canonical MDM4 transcripts are characterized in a pilot cohort of human melanoma specimens. RT-PCR was first used to identify novel splice junctions in these specimens, and RT-qPCR then quantified the expression of the major MDM4 isoforms observed during sequencing. This study thus both identifies and quantifies the MDM4 isoforms present in melanoma tumor samples. We observed high expression levels of MDM4-S, MDM4-FL, MDM4-A, and the previously undescribed Ensembl transcript MDM4-209. A novel transcript lacking both exons 6 and 9 was also observed and named MDM4-A/S for its resemblance to both the MDM4-A and MDM4-S isoforms.
Affiliation(s)
- Michael Markey
  - Department of Biochemistry and Molecular Biology, Wright State University, 3640 Colonel Glenn Hwy, Dayton, OH 45435, USA
3. Ulrich JU, Renard BY. Fast and space-efficient taxonomic classification of long reads with hierarchical interleaved XOR filters. Genome Res 2024; 34:914-924. [PMID: 38886068 PMCID: PMC11293544 DOI: 10.1101/gr.278623.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 05/23/2024] [Indexed: 06/20/2024]
Abstract
Metagenomic long-read sequencing is gaining popularity for various applications, including pathogen detection and microbiome studies. To analyze the large datasets these studies generate, software tools need to taxonomically classify the sequenced molecules and estimate the relative abundances of organisms in the sequenced sample. Because reference genome databases are growing exponentially, current taxonomic classification methods have large computational requirements. This issue motivated us to develop a new data structure for fast and memory-efficient querying of long reads. Here, we present Taxor, a new tool for long-read metagenomic classification that uses a hierarchical interleaved XOR filter data structure for indexing and querying large reference genome sets. Taxor implements several k-mer-based approaches, such as syncmers, for pseudoalignment to classify reads, and an expectation-maximization algorithm for metagenomic profiling. Our results show that Taxor outperforms state-of-the-art tools in precision while achieving similar recall for long-read taxonomic classification. Most notably, Taxor reduces memory requirements and index size by >50% and is among the fastest tools in query time. This enables real-time metagenomic analysis with large reference databases on a small laptop in the field.
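Syncmers, one of the k-mer selection schemes mentioned above, can be sketched as follows. This minimal Python example assumes the closed-syncmer variant with lexicographic s-mer order (and s < k) and scores reads by simple containment; it does not reproduce Taxor's hierarchical interleaved XOR filter or its EM profiling, and the function names are hypothetical.

```python
def is_closed_syncmer(kmer: str, s: int) -> bool:
    """Closed syncmer: the smallest s-mer (lexicographic, first occurrence) is at the start or end."""
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    pos = smers.index(min(smers))
    return pos == 0 or pos == len(smers) - 1


def closed_syncmers(seq: str, k: int, s: int) -> set[str]:
    """Select k-mers by their own content only, so a read and its source genome agree."""
    return {
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if is_closed_syncmer(seq[i:i + k], s)
    }


def containment(read: str, reference: set[str], k: int, s: int) -> float:
    """Fraction of the read's syncmers found in a reference's syncmer set."""
    query = closed_syncmers(read, k, s)
    return len(query & reference) / len(query) if query else 0.0


if __name__ == "__main__":
    genome = "ACGTTGCATGTCGCATGATGCATGAGAGTTGTAC"
    index = closed_syncmers(genome, 8, 3)
    print(sorted(index))
    print(containment(genome[4:24], index, 8, 3))
```

Unlike window-based minimizers, whether a k-mer is a syncmer depends only on the k-mer itself, so the syncmers of any substring are always a subset of the syncmers of the full sequence.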
Affiliation(s)
- Jens-Uwe Ulrich
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
  - Phylogenomics Unit, Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, 15745 Wildau, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
4. Hauswedell H, Hetzel S, Gottlieb SG, Kretzmer H, Meissner A, Reinert K. Lambda3: homology search for protein, nucleotide, and bisulfite-converted sequences. Bioinformatics 2024; 40:btae097. [PMID: 38485699 PMCID: PMC10955267 DOI: 10.1093/bioinformatics/btae097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 12/22/2023] [Accepted: 03/13/2024] [Indexed: 03/22/2024]
Abstract
MOTIVATION Local alignment of query sequences against large databases is a core part of metagenomic studies and facilitates homology search. Following the development of NCBI Blast, many applications have aimed to provide faster, equally sensitive local alignment frameworks. Most focus on protein alignments, and only a few also support DNA-based searches. None of the established programs can search DNA sequences from bisulfite sequencing experiments, which are commonly used for DNA methylation profiling and require specific alignment strategies. RESULTS Here, we introduce Lambda3, a new version of the local alignment application Lambda. Lambda3 is the first solution that enables searching protein, nucleotide, and bisulfite-converted nucleotide query sequences. Its protein mode achieves performance comparable to that of the highly optimized protein alignment application Diamond, while its nucleotide mode consistently outperforms established local nucleotide aligners. Together, these modes make Lambda3 a universal local alignment framework that enables fast and sensitive homology searches for a wide range of use cases. AVAILABILITY AND IMPLEMENTATION Lambda3 is free and open-source software, publicly available at https://github.com/seqan/lambda/.
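To illustrate why bisulfite-converted queries need a dedicated search mode, the sketch below uses a generic three-letter-alphabet strategy (the approach popularized by bisulfite aligners such as Bismark), not necessarily Lambda3's own method: both read and reference are converted C-to-T in silico before matching. The function names are hypothetical, only the top (C-to-T) strand is searched, and real aligners use seeding and scored local alignment rather than exact matching.

```python
def bisulfite_convert(seq: str, strand: str = "CT") -> str:
    """In-silico conversion: C->T for the top strand ("CT") or G->A for the bottom strand ("GA")."""
    seq = seq.upper()
    if strand == "CT":
        return seq.replace("C", "T")
    if strand == "GA":
        return seq.replace("G", "A")
    raise ValueError("strand must be 'CT' or 'GA'")


def find_bisulfite_hits(read: str, reference: str) -> list[int]:
    """Start positions where the read matches the top strand in the reduced C->T alphabet.

    Converting both sequences lets an unmethylated C (sequenced as T) and a methylated C
    (still sequenced as C) match the same reference C.
    """
    conv_read = bisulfite_convert(read, "CT")
    conv_ref = bisulfite_convert(reference, "CT")
    hits = []
    start = conv_ref.find(conv_read)
    while start != -1:
        hits.append(start)
        start = conv_ref.find(conv_read, start + 1)
    return hits


if __name__ == "__main__":
    reference = "GGACGTTCAGCG"
    treated_read = "ATGTTTA"  # reference[2:9] with both Cs converted to T
    print(reference.find(treated_read))                  # plain search fails: -1
    print(find_bisulfite_hits(treated_read, reference))  # [2]
```

The reduced alphabet is a trade-off: it tolerates C-to-T conversions, but it also lets a genuine T in the reference match a C in the read, which lowers specificity compared with a four-letter search.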
Affiliation(s)
- Sara Hetzel
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Simon G Gottlieb
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
  - Institute for Bio- and Geosciences, Forschungszentrum Jülich GmbH, Jülich 52428, Germany
- Helene Kretzmer
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Alexander Meissner
  - Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
  - Department of Biology, Chemistry and Pharmacy, Freie Universität Berlin, Berlin 14195, Germany
- Knut Reinert
  - Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
  - Efficient Algorithms for Omics Data Group, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
5. Nowatzky Y, Benner P, Reinert K, Muth T. Mistle: bringing spectral library predictions to metaproteomics with an efficient search index. Bioinformatics 2023; 39:btad376. [PMID: 37294786 PMCID: PMC10313348 DOI: 10.1093/bioinformatics/btad376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 05/11/2023] [Accepted: 06/08/2023] [Indexed: 06/11/2023]
Abstract
MOTIVATION Deep learning has moved to the forefront of tandem mass spectrometry-driven proteomics, and realistic prediction of peptide fragmentation is more feasible than ever. Still, spectral prediction is currently used mainly to validate database search results or for confined search spaces. Fully predicted spectral libraries have not yet been efficiently adapted to the large search spaces that often occur in metaproteomics or proteogenomics. RESULTS In this study, we present a workflow that uses Prosit for spectral library prediction on two common metaproteomes, and we implement an indexing and search algorithm, Mistle, to efficiently identify experimental mass spectra within the library. The workflow thus emulates a classic protein sequence database search, including protein digestion, but builds a searchable index from spectral predictions as an intermediate step. We compare Mistle with popular search engines at both the spectral and the database search level and provide evidence that this approach is more accurate than a database search using MSFragger. Mistle outperforms other spectral library search engines in run time and is extremely memory efficient, with a 4- to 22-fold decrease in RAM usage. This makes Mistle universally applicable to large search spaces, e.g. comprehensive sequence databases covering diverse microbiomes. AVAILABILITY AND IMPLEMENTATION Mistle is freely available on GitHub at https://github.com/BAMeScience/Mistle.
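The core operation of a spectral library search can be sketched as cosine similarity between binned, normalized spectra, restricted to library entries whose precursor m/z matches the query. This is a generic illustration under assumed parameters (1 m/z-unit bins, a 0.02 precursor tolerance), not Mistle's indexing scheme; the function names and the library layout are hypothetical.

```python
from __future__ import annotations

import math

Spectrum = list[tuple[float, float]]  # (m/z, intensity) peaks


def bin_spectrum(peaks: Spectrum, bin_width: float = 1.0) -> dict[int, float]:
    """Sum intensities into fixed-width m/z bins, then L2-normalize the sparse vector."""
    bins: dict[int, float] = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        bins[b] = bins.get(b, 0.0) + intensity
    norm = math.sqrt(sum(v * v for v in bins.values()))
    return {b: v / norm for b, v in bins.items()} if norm else {}


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product of two L2-normalized sparse vectors, iterating over the smaller one."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(key, 0.0) for key, v in a.items())


def search(query: Spectrum, precursor_mz: float,
           library: dict[str, tuple[float, Spectrum]],
           tol: float = 0.02) -> tuple[str, float] | None:
    """Best-scoring library peptide among entries whose precursor m/z is within tol of the query's."""
    q = bin_spectrum(query)
    best: tuple[str, float] | None = None
    for peptide, (lib_mz, lib_peaks) in library.items():
        if abs(lib_mz - precursor_mz) > tol:
            continue  # the precursor filter prunes candidates before any spectrum is scored
        score = cosine(q, bin_spectrum(lib_peaks))
        if best is None or score > best[1]:
            best = (peptide, score)
    return best
```

In a real index, library spectra would typically be binned once and sorted by precursor m/z, so the tolerance window could be located by binary search instead of the linear scan used here.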
Affiliation(s)
- Yannek Nowatzky
  - Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
- Philipp Benner
  - Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
- Knut Reinert
  - Department of Mathematics and Computer Science, FU Berlin, Berlin 14195, Germany
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin 14195, Germany
- Thilo Muth
  - Section S.3 eScience, Federal Institute for Materials Research and Testing (BAM), Berlin 12205, Germany
6. Sequre: a high-performance framework for secure multiparty computation enables biomedical data sharing. Genome Biol 2023; 24:5. [PMID: 36631897 PMCID: PMC9832703 DOI: 10.1186/s13059-022-02841-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 12/21/2022] [Indexed: 01/12/2023]
Abstract
Secure multiparty computation (MPC) is a cryptographic tool that allows computation on sensitive biomedical data without revealing private information to the involved entities. Here, we introduce Sequre, an easy-to-use framework for developing high-performance MPC applications. Sequre offers a set of automatic compile-time optimizations that significantly improve the performance of MPC applications, and it adopts the syntax of the Python programming language to facilitate rapid application development. We demonstrate its usability and performance on various bioinformatics tasks, showing speedups of up to 3- to 4-fold over existing pipelines with 7-fold reductions in codebase size.
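The kind of arithmetic that MPC frameworks compile and optimize can be illustrated with additive secret sharing and Beaver-triple multiplication, simulated here in a single Python process. This is a textbook sketch, not Sequre's API or protocol; the modulus `P`, the three-party setting, and the simulated trusted dealer that generates the triple are assumptions.

```python
import secrets

P = 2**61 - 1  # prime modulus of the toy field (illustrative choice)


def share(x: int, n: int = 3) -> list[int]:
    """Split x into n additive shares summing to x mod P; any n-1 of them look uniformly random."""
    parts = [secrets.randbelow(P) for _ in range(n - 1)]
    parts.append((x - sum(parts)) % P)
    return parts


def reveal(shares: list[int]) -> int:
    """Reconstruct the secret by summing all shares mod P."""
    return sum(shares) % P


def add(a: list[int], b: list[int]) -> list[int]:
    """Addition is local: each party adds its own two shares, with no communication."""
    return [(x + y) % P for x, y in zip(a, b)]


def mul(a: list[int], b: list[int]) -> list[int]:
    """Multiply shared values using a Beaver triple (u, v, u*v) from a simulated trusted dealer."""
    n = len(a)
    u, v = secrets.randbelow(P), secrets.randbelow(P)
    us, vs, ws = share(u, n), share(v, n), share(u * v % P, n)
    # Parties open d = a - u and e = b - v; both are masked by uniform u, v and reveal nothing about a, b.
    d = reveal([(x - y) % P for x, y in zip(a, us)])
    e = reveal([(x - y) % P for x, y in zip(b, vs)])
    # a*b = u*v + d*v + e*u + d*e; each party computes its local part, one party adds the public d*e.
    out = [(w + d * vi + e * ui) % P for w, ui, vi in zip(ws, us, vs)]
    out[0] = (out[0] + d * e) % P
    return out


if __name__ == "__main__":
    a, b = share(6), share(7)
    print(reveal(add(a, b)), reveal(mul(a, b)))  # 13 42
```

Additions are free, while every multiplication costs a round of communication and a pre-shared triple, which is why reducing or batching multiplications is a typical target for MPC compiler optimizations.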
7. Shen W, Xiang H, Huang T, Tang H, Peng M, Cai D, Hu P, Ren H. KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping. Bioinformatics 2023; 39:btac845. [PMID: 36579886 PMCID: PMC9828150 DOI: 10.1093/bioinformatics/btac845] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 04/11/2022] [Revised: 12/17/2022] [Accepted: 12/28/2022] [Indexed: 12/30/2022]
Abstract
MOTIVATION The growing number of microbial reference genomes makes more accurate metagenomic profiling possible but also places greater demands on the indexing efficiency, database size and runtime of taxonomic profilers. Additionally, most profilers focus mainly on bacterial, archaeal and fungal populations, while less attention is paid to viral communities. RESULTS We present KMCP (K-mer-based Metagenomic Classification and Profiling), a novel k-mer-based metagenomic profiling tool that exploits genome coverage information by splitting reference genomes into chunks, and that stores k-mers in a modified and optimized Compact Bit-Sliced Signature Index for fast alignment-free sequence searching. KMCP combines k-mer similarity and genome coverage information to reduce the false positive rate of k-mer-based taxonomic classification and profiling methods. Benchmarking results on simulated and real data demonstrate that KMCP, despite a longer running time than all other methods tested, not only allows accurate taxonomic profiling of prokaryotic and viral populations but also provides more confident pathogen detection in low-depth clinical samples. AVAILABILITY AND IMPLEMENTATION The software is open-source under the MIT license and available at https://github.com/shenwei356/kmcp. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
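The idea of using genome coverage to suppress false positives can be sketched by splitting each reference into chunks and recording which chunks receive reads: a genome that is truly present tends to be hit across many chunks, whereas a spurious match tends to be confined to a few. This simplified Python example uses plain k-mer sets and a linear scan instead of KMCP's Compact Bit-Sliced Signature Index; the chunk count, k, the containment cutoff, and the function names are illustrative assumptions.

```python
import random
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_chunk_index(genomes: dict[str, str], k: int, n_chunks: int) -> dict[tuple[str, int], set[str]]:
    """Split every genome into n_chunks pieces (extended by k-1 bases so no k-mer is lost)."""
    index: dict[tuple[str, int], set[str]] = {}
    for name, seq in genomes.items():
        size = -(-len(seq) // n_chunks)  # ceiling division
        for c in range(n_chunks):
            index[(name, c)] = kmers(seq[c * size:(c + 1) * size + k - 1], k)
    return index


def chunk_coverage(reads: list[str], index: dict[tuple[str, int], set[str]], k: int,
                   min_containment: float = 0.8) -> dict[str, float]:
    """Fraction of each genome's chunks matched by at least one read."""
    chunks_per_genome: dict[str, int] = defaultdict(int)
    for name, _ in index:
        chunks_per_genome[name] += 1
    hit: dict[str, set[int]] = defaultdict(set)
    for read in reads:
        q = kmers(read, k)
        if not q:
            continue
        for (name, c), chunk_kmers in index.items():
            if len(q & chunk_kmers) / len(q) >= min_containment:
                hit[name].add(c)
    return {name: len(hit[name]) / n for name, n in chunks_per_genome.items()}


if __name__ == "__main__":
    rng = random.Random(7)
    g1 = "".join(rng.choice("ACGT") for _ in range(1000))
    g2 = "".join(rng.choice("ACGT") for _ in range(1000))
    idx = build_chunk_index({"G1": g1, "G2": g2}, k=21, n_chunks=5)
    reads = [g1[i:i + 80] for i in range(0, 1000, 100)] + [g2[0:80]]
    # G1 is covered evenly; G2 has a read in only one chunk.
    print(chunk_coverage(reads, idx, k=21))
```

A profiler could then discard genomes whose chunk coverage falls below some cutoff; the specific cutoff would need tuning and is not taken from KMCP.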
Affiliation(s)
- Wei Shen
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Hongyan Xiang
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Tianquan Huang
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Hui Tang
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Mingli Peng
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Dachuan Cai
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Peng Hu
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Hong Ren
  - Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
8. Darvish M, Seiler E, Mehringer S, Rahn R, Reinert K. Needle: a fast and space-efficient prefilter for estimating the quantification of very large collections of expression experiments. Bioinformatics 2022; 38:4100-4108. [PMID: 35801930 PMCID: PMC9438961 DOI: 10.1093/bioinformatics/btac492] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/12/2022] [Revised: 05/23/2022] [Accepted: 07/07/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION The ever-growing size of sequencing data is a major bottleneck in bioinformatics, as advances in hardware cannot keep pace with data growth. As a result, an enormous amount of data is collected but rarely reused, because it is nearly impossible to find meaningful experiments in the stream of raw data. RESULTS As a solution, we propose Needle, a fast and space-efficient index that can be built for thousands of experiments in <2 h and can estimate the quantification of a transcript in these experiments within seconds, thereby outperforming its competitors. The basic idea of the Needle index is to create multiple interleaved Bloom filters, each storing a set of representative k-mers selected according to their multiplicity in the raw data. These filters are then used to quantify the query. AVAILABILITY AND IMPLEMENTATION https://github.com/seqan/needle. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
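The multi-level idea can be sketched as follows: one membership structure per expression threshold, and an estimate equal to the highest level at which most of a transcript's k-mers are present. Plain Python sets stand in for the interleaved Bloom filters, all k-mers are used rather than representative ones, and the thresholds, the 50% presence cutoff, and the function names are assumptions, not Needle's parameters.

```python
from collections import Counter


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Multiplicity of every k-mer across all reads of one experiment."""
    return Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))


def build_levels(reads: list[str], k: int, thresholds: list[int]) -> list[set[str]]:
    """One membership set per expression threshold (ascending); sets stand in for Bloom filters."""
    counts = kmer_counts(reads, k)
    return [{km for km, c in counts.items() if c >= t} for t in thresholds]


def estimate_expression(transcript: str, levels: list[set[str]], thresholds: list[int],
                        k: int, min_fraction: float = 0.5) -> int:
    """Highest threshold at which at least min_fraction of the transcript's k-mers are stored."""
    query = {transcript[i:i + k] for i in range(len(transcript) - k + 1)}
    estimate = 0
    for t, level in zip(thresholds, levels):
        if query and len(query & level) / len(query) >= min_fraction:
            estimate = t
    return estimate


if __name__ == "__main__":
    tx = "ACGTACGGTCCATGACGTTAGC"
    reads = [tx] * 12 + ["TTTTTTTTTT"] * 2
    thresholds = [1, 5, 10, 20]
    levels = build_levels(reads, 5, thresholds)
    print(estimate_expression(tx, levels, thresholds, 5))
```

Only set membership is stored per level, never the counts themselves, which is what keeps such an index small; the price is that expression is reported at the resolution of the chosen thresholds.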
Affiliation(s)
- Enrico Seiler
  - Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, 14195 Berlin, Germany
- Svenja Mehringer
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, 14195 Berlin, Germany
- René Rahn
  - Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Knut Reinert
  - Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, 14195 Berlin, Germany
9. Santoro D, Pellegrina L, Comin M, Vandin F. SPRISS: approximating frequent k-mers by sampling reads, and applications. Bioinformatics 2022; 38:3343-3350. [PMID: 35583271 PMCID: PMC9237683 DOI: 10.1093/bioinformatics/btac180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 02/25/2022] [Accepted: 05/16/2022] [Indexed: 11/29/2022]
Abstract
MOTIVATION The extraction of k-mers is a fundamental component of many complex analyses of large next-generation sequencing datasets, including read classification in genomics and the characterization of RNA-seq datasets. Extracting all k-mers and their frequencies is extremely demanding in running time and memory, owing to the size of the data and the exponential number of k-mers to be considered. However, several applications require only frequent k-mers, i.e. k-mers appearing in a relatively high proportion of the data. RESULTS In this work, we present SPRISS, a new efficient algorithm to approximate frequent k-mers and their frequencies in next-generation sequencing data. SPRISS uses a simple yet powerful read sampling scheme that extracts a representative subset of the dataset; combined with any k-mer counting algorithm, this subset can be used to perform downstream analyses in a fraction of the time required to analyze the whole dataset, while obtaining comparable answers. Our extensive experimental evaluation demonstrates the efficiency and accuracy of SPRISS in approximating frequent k-mers and shows that it can be used in various scenarios, such as the comparison of metagenomic datasets, the identification of discriminative k-mers, and SNP (single nucleotide polymorphism) genotyping, to extract insights in a fraction of the time required to analyze the whole dataset. AVAILABILITY AND IMPLEMENTATION SPRISS [a preliminary version (Santoro et al., 2021) of this work was presented at RECOMB 2021] is available at https://github.com/VandinLab/SPRISS. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
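A simplified version of the read-sampling idea: draw reads uniformly with replacement, count k-mers in the sample with an ordinary counter, and report k-mers whose estimated relative frequency clears the threshold. SPRISS's actual sampling scheme and its theoretically derived sample sizes are not reproduced here; `sample_size`, `eps`, and the function name are illustrative assumptions.

```python
import random
from collections import Counter


def approx_frequent_kmers(reads: list[str], k: int, theta: float, sample_size: int,
                          eps: float = 0.0, seed: int = 0) -> dict[str, float]:
    """Estimate relative k-mer frequencies from reads sampled uniformly with replacement,
    keeping k-mers whose estimate is at least theta - eps."""
    if not reads:
        return {}
    rng = random.Random(seed)
    sample = [rng.choice(reads) for _ in range(sample_size)]
    # Any exact k-mer counter could be run on the sample; Counter stands in for one here.
    counts = Counter(r[i:i + k] for r in sample for i in range(len(r) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {km: c / total for km, c in counts.items() if c / total >= theta - eps}


if __name__ == "__main__":
    reads = ["AAAAAAAA"] * 90 + ["CGCGCGCG"] * 10
    print(approx_frequent_kmers(reads, 4, theta=0.5, sample_size=200))
```

The counting cost here scales with `sample_size` rather than with the number of reads in the dataset; the slack `eps` trades a few extra reported k-mers for fewer missed ones near the threshold.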
Affiliation(s)
- Diego Santoro
  - Department of Information Engineering, University of Padova, 35131 Padova, Italy
- Leonardo Pellegrina
  - Department of Information Engineering, University of Padova, 35131 Padova, Italy
- Matteo Comin
  - Department of Information Engineering, University of Padova, 35131 Padova, Italy
- Fabio Vandin
  - Department of Information Engineering, University of Padova, 35131 Padova, Italy
10.
Abstract
MOTIVATION Nanopore sequencers allow targeted sequencing of nucleotide sequences of interest by rejecting other sequences from individual pores. This feature facilitates the enrichment of low-abundance sequences by depleting overrepresented ones in silico. Existing tools for adaptive sampling either apply signal alignment, which cannot handle human-sized reference sequences, or apply read mapping in sequence space, relying on fast graphics processing unit (GPU) base callers for real-time read rejection. Nanopore long-read mapping tools are also not optimal for the shorter reads usually analyzed in adaptive sampling applications. RESULTS Here, we present ReadBouncer, a new approach for nanopore adaptive sampling that combines fast CPU and GPU base calling with read classification based on Interleaved Bloom Filters. ReadBouncer improves the potential enrichment of low-abundance sequences through its high read classification sensitivity and specificity, outperforming existing tools in the field. It robustly removes even reads belonging to large reference sequences while running on commodity hardware without GPUs, making adaptive sampling accessible to researchers working in the field. ReadBouncer also provides a user-friendly interface and installer files for end users without a bioinformatics background. AVAILABILITY AND IMPLEMENTATION The C++ source code is available at https://gitlab.com/dacs-hpi/readbouncer. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
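The accept/reject decision at the heart of adaptive sampling can be sketched with a Bloom filter over depletion-target k-mers and a hit-fraction threshold on the basecalled read prefix. This is a deliberate simplification: ReadBouncer uses Interleaved Bloom Filters and its own thresholding, whereas the single Bloom filter, the BLAKE2b-based hashing, the 30% cutoff, and the function names here are illustrative assumptions.

```python
import hashlib
import random


class BloomFilter:
    """Minimal Bloom filter; n_hashes bit positions come from salted BLAKE2b digests."""

    def __init__(self, n_bits: int, n_hashes: int):
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        for i in range(self.n_hashes):
            digest = hashlib.blake2b(item.encode(), digest_size=8, salt=i.to_bytes(16, "little")).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def should_reject(prefix: str, depletion: BloomFilter, k: int, min_fraction: float = 0.3) -> bool:
    """Eject the read if enough of its prefix k-mers hit the depletion filter."""
    kmers = [prefix[i:i + k] for i in range(len(prefix) - k + 1)]
    if not kmers:
        return False  # too short to decide: keep sequencing
    hits = sum(km in depletion for km in kmers)
    return hits / len(kmers) >= min_fraction


if __name__ == "__main__":
    rng = random.Random(3)
    host = "".join(rng.choice("ACGT") for _ in range(2000))
    target = "".join(rng.choice("ACGT") for _ in range(400))
    bf = BloomFilter(1 << 16, 3)
    for i in range(len(host) - 20):
        bf.add(host[i:i + 21])
    print(should_reject(host[500:900], bf, 21), should_reject(target, bf, 21))
```

Using a fraction of k-mer hits, rather than requiring every k-mer to match, is what lets such a classifier tolerate sequencing errors in the read; Bloom-filter false positives only ever push the decision toward rejection.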
Affiliation(s)
- Ahmad Lutfi
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Kilian Rutzen
  - Genome Sequencing Unit (MF2), Robert Koch Institute, 13353 Berlin, Germany
11. Alser M, Rotman J, Deshpande D, Taraszka K, Shi H, Baykal PI, Yang HT, Xue V, Knyazev S, Singer BD, Balliu B, Koslicki D, Skums P, Zelikovsky A, Alkan C, Mutlu O, Mangul S. Technology dictates algorithms: recent developments in read alignment. Genome Biol 2021; 22:249. [PMID: 34446078 PMCID: PMC8390189 DOI: 10.1186/s13059-021-02443-7] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 10/15/2020] [Accepted: 07/28/2021] [Indexed: 01/08/2023]
Abstract
Aligning sequencing reads onto a reference is an essential step in most genomic analysis pipelines. Computational algorithms for read alignment have evolved in step with technological advances, leading to today's diverse array of alignment methods. We provide a systematic survey of the algorithmic foundations and methodologies of 107 alignment methods for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on the speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
Affiliation(s)
- Mohammed Alser
  - Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Jeremy Rotman
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Dhrithi Deshpande
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
- Kodi Taraszka
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Huwenbo Shi
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Pelin Icer Baykal
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Harry Taegyun Yang
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
  - Bioinformatics Interdepartmental Ph.D. Program, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Victor Xue
  - Department of Computer Science, University of California Los Angeles, Los Angeles, CA, 90095, USA
- Sergey Knyazev
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Benjamin D Singer
  - Division of Pulmonary and Critical Care Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
  - Department of Biochemistry & Molecular Genetics, Northwestern University Feinberg School of Medicine, Chicago, USA
  - Simpson Querrey Institute for Epigenetics, Northwestern University Feinberg School of Medicine, Chicago, IL, 60611, USA
- Brunilda Balliu
  - Department of Computational Medicine, University of California Los Angeles, Los Angeles, CA, 90095, USA
- David Koslicki
  - Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16801, USA
  - Biology Department, Pennsylvania State University, University Park, PA, 16801, USA
  - The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA, 16801, USA
- Pavel Skums
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
- Alex Zelikovsky
  - Department of Computer Science, Georgia State University, Atlanta, GA, 30302, USA
  - The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, 119991, Russia
- Can Alkan
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Bilkent-Hacettepe Health Sciences and Technologies Program, Ankara, Turkey
- Onur Mutlu
  - Computer Science Department, ETH Zürich, 8092, Zürich, Switzerland
  - Computer Engineering Department, Bilkent University, 06800 Bilkent, Ankara, Turkey
  - Information Technology and Electrical Engineering Department, ETH Zürich, Zürich, 8092, Switzerland
- Serghei Mangul
  - Department of Clinical Pharmacy, School of Pharmacy, University of Southern California, Los Angeles, CA, 90089, USA
12. Weging S, Gogol-Döring A, Grosse I. Taxonomic analysis of metagenomic data with kASA. Nucleic Acids Res 2021; 49:e68. [PMID: 33784400 PMCID: PMC8266618 DOI: 10.1093/nar/gkab200] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/15/2021] [Accepted: 03/10/2021] [Indexed: 11/14/2022]
Abstract
The taxonomic analysis of sequencing data has become important in many areas of the life sciences. However, currently available tools for this purpose either consume large amounts of RAM or yield insufficient quality and robustness. Here, we present kASA, a k-mer-based tool capable of identifying and profiling metagenomic DNA or protein sequences with high computational efficiency and a user-definable memory footprint. We ensure both high sensitivity and precision by using an amino acid-like encoding of k-mers together with a range of multiple k’s. Custom algorithms and data structures optimized for external memory storage enable a full-scale taxonomic analysis without compromise on laptops, desktops, and high-performance computing clusters (HPCCs).
Affiliation(s)
- Silvio Weging
  - Institute of Computer Science, Martin-Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, Halle, Germany
- Andreas Gogol-Döring
  - Department of Mathematics, Natural Sciences and Computer Science, TH Mittelhessen University of Applied Sciences, Wiesenstraße 14, Gießen, Germany
- Ivo Grosse
  - Institute of Computer Science, Martin-Luther University Halle-Wittenberg, Von-Seckendorff-Platz 1, Halle, Germany
13
Seiler E, Mehringer S, Darvish M, Turc E, Reinert K. Raptor: A fast and space-efficient pre-filter for querying very large collections of nucleotide sequences. iScience 2021; 24:102782. [PMID: 34337360 PMCID: PMC8313605 DOI: 10.1016/j.isci.2021.102782] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/01/2021] [Revised: 06/07/2021] [Accepted: 06/21/2021] [Indexed: 12/20/2022]
Abstract
We present Raptor, a system for approximately searching many queries such as next-generation sequencing reads or transcripts in large collections of nucleotide sequences. Raptor uses winnowing minimizers to define a set of representative k-mers, an extension of the interleaved Bloom filters (IBFs) as a set membership data structure and probabilistic thresholding for minimizers. Our approach allows compression and partitioning of the IBF to enable the effective use of secondary memory. We test and show the performance and limitations of the new features using simulated and real datasets. Our data structure can be used to accelerate various core bioinformatics applications. We show this by re-implementing the distributed read mapping tool DREAM-Yara.
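As a rough illustration of the window-minimizer idea named in the Raptor abstract, the sketch below picks, for every window of w consecutive k-mers, the k-mer with the smallest scrambled code. The 2-bit encoding, the XOR-with-a-seed ordering and the parameter values are assumptions made for illustration; this is not Raptor's implementation.

```python
from typing import List

# 2-bit encoding; this sketch assumes the sequence contains only A, C, G, T
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_codes(seq: str, k: int) -> List[int]:
    """2-bit encode every k-mer of seq (forward strand only)."""
    mask = (1 << (2 * k)) - 1
    codes: List[int] = []
    value = 0
    for i, base in enumerate(seq):
        # shift in the next base and drop the oldest one
        value = ((value << 2) | ENCODE[base]) & mask
        if i >= k - 1:
            codes.append(value)
    return codes


def minimizers(seq: str, k: int, w: int, seed: int = 0x5BD1E995) -> List[int]:
    """Start positions of the window minimizers of seq.

    A window spans w consecutive k-mers; its minimizer is the k-mer with the
    smallest scrambled code (ties go to the leftmost one). XOR with a seed is
    a hypothetical stand-in for a real hash that avoids lexicographic bias.
    """
    order = [code ^ seed for code in kmer_codes(seq, k)]
    picked: List[int] = []
    for start in range(len(order) - w + 1):
        # min() returns the first minimal position, i.e. the leftmost tie
        best = min(range(start, start + w), key=lambda p: order[p])
        # adjacent windows often share a minimizer; record it only once
        if not picked or picked[-1] != best:
            picked.append(best)
    return picked


read = "ACGTTGCATGCAACGTAGCTAGCTTAGC"
print(minimizers(read, k=5, w=4))
```

With w = 1 every k-mer is its own minimizer; larger windows keep fewer k-mers while still guaranteeing at least one selected k-mer per window, which is what makes minimizer-based indexes smaller.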
Affiliation(s)
- Enrico Seiler
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Svenja Mehringer
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Mitra Darvish
- Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
14
Rahman A, Chikhi R, Medvedev P. Disk compression of k-mer sets. Algorithms Mol Biol 2021; 16:10. [PMID: 34154632 PMCID: PMC8218509 DOI: 10.1186/s13015-021-00192-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/05/2021] [Accepted: 06/08/2021] [Indexed: 12/23/2022]
Abstract
K-mer based methods have become prevalent in many areas of bioinformatics. In applications such as database search, they often work with large multi-terabyte-sized datasets. Storing such large datasets is a detriment to tool developers, tool users, and reproducibility efforts. General purpose compressors like gzip, or those designed for read data, are sub-optimal because they do not take into account the specific redundancy pattern in k-mer sets. In our earlier work (Rahman and Medvedev, RECOMB 2020), we presented an algorithm UST-Compress that uses a spectrum-preserving string set representation to compress a set of k-mers to disk. In this paper, we present two improved methods for disk compression of k-mer sets, called ESS-Compress and ESS-Tip-Compress. They use a more relaxed notion of string set representation to further remove redundancy from the representation of UST-Compress. We explore their behavior both theoretically and on real data. We show that they improve the compression sizes achieved by UST-Compress by up to 27 percent, across a breadth of datasets. We also derive lower bounds on how well this type of compression strategy can hope to do.
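The abstract builds on spectrum-preserving string sets: a collection of strings whose k-mers are exactly the input k-mer set. The greedy sketch below only illustrates that underlying representation; it is not UST-Compress, ESS-Compress or ESS-Tip-Compress. It assumes forward-strand k-mers (reverse complements are ignored) and glues k-mers that overlap by k-1 characters.

```python
from typing import List, Set


def kmers_of(seq: str, k: int) -> Set[str]:
    """Set of all forward-strand k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_spss(kmers: Set[str], k: int) -> List[str]:
    """Glue k-mers that overlap by k-1 characters into longer strings.

    Every input k-mer is used exactly once, so the k-mers of the output
    strings are exactly the input set (a spectrum-preserving string set).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    unused = set(kmers)
    strings: List[str] = []
    for first in sorted(kmers):  # sorted only to make the output deterministic
        if first not in unused:
            continue
        unused.remove(first)
        s = first
        grew = True
        while grew:  # extend to the right with any unused successor k-mer
            grew = False
            for base in "ACGT":
                nxt = s[len(s) - (k - 1):] + base
                if nxt in unused:
                    unused.remove(nxt)
                    s += base
                    grew = True
                    break
        grew = True
        while grew:  # extend to the left with any unused predecessor k-mer
            grew = False
            for base in "ACGT":
                prv = base + s[:k - 1]
                if prv in unused:
                    unused.remove(prv)
                    s = base + s
                    grew = True
                    break
        strings.append(s)
    return strings


kset = kmers_of("ACGTTGCATGCAACGTAGCTAGCTTAGC", 4)
spss = greedy_spss(kset, 4)
print(len(kset) * 4, sum(len(s) for s in spss))
```

Storing the k-mers one by one costs k characters each, whereas the string set costs (number of k-mers) + (k-1) x (number of strings) characters, so fewer, longer strings mean a smaller representation.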
Affiliation(s)
- Rayan Chikhi
- Department of Computational Biology, C3BI USR 3756 CNRS, Institut Pasteur, Paris, France
15
Piro VC, Dadi TH, Seiler E, Reinert K, Renard BY. ganon: precise metagenomics classification against large and up-to-date sets of reference sequences. Bioinformatics 2021; 36:i12-i20. [PMID: 32657362 PMCID: PMC7355301 DOI: 10.1093/bioinformatics/btaa458] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 12/17/2022]
Abstract
MOTIVATION The exponential growth of assembled genome sequences greatly benefits metagenomics studies. However, currently available methods struggle to manage the increasing amount of sequences and their frequent updates. Indexing the current RefSeq can take days and hundreds of GB of memory on large servers. Few methods address these issues thus far, and even though many can theoretically handle large amounts of references, time/memory requirements are prohibitive in practice. As a result, many studies that require sequence classification use often outdated and almost never truly up-to-date indices. RESULTS Motivated by those limitations, we created ganon, a k-mer-based read classification tool that uses Interleaved Bloom Filters in conjunction with a taxonomic clustering and a k-mer counting/filtering scheme. Ganon provides an efficient method for indexing references, keeping them updated. It requires <55 min to index the complete RefSeq of bacteria, archaea, fungi and viruses. The tool can further keep these indices up-to-date in a fraction of the time necessary to create them. Ganon makes it possible to query against very large reference sets and therefore it classifies significantly more reads and identifies more species than similar methods. When classifying a high-complexity CAMI challenge dataset against complete genomes from RefSeq, ganon shows strongly increased precision with equal or better sensitivity compared with state-of-the-art tools. With the same dataset against the complete RefSeq, ganon improved the F1-score by 65% at the genus level. It supports taxonomy- and assembly-level classification, multiple indices and hierarchical classification. AVAILABILITY AND IMPLEMENTATION The software is open-source and available at: https://gitlab.com/rki_bioinformatics/ganon. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
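The ganon abstract relies on Interleaved Bloom Filters (IBFs). As a hedged toy model, an IBF holds one Bloom filter per bin (here, assumed to be one bin per reference) with identical size and hash functions, stored so that each bit position packs that bit for every bin; one lookup per hash function then answers membership for all bins at once. The BLAKE2b double hashing and the `candidate_bins` threshold rule are hypothetical stand-ins, not ganon's code or its k-mer counting/filtering scheme.

```python
import hashlib
from typing import Iterable, List


class InterleavedBloomFilter:
    """Toy IBF: rows[p] packs bit p of every bin (bit b belongs to bin b)."""

    def __init__(self, bins: int, size: int, num_hashes: int = 3) -> None:
        self.bins = bins
        self.size = size
        self.num_hashes = num_hashes
        self.rows = [0] * size

    def _positions(self, kmer: str) -> List[int]:
        # double hashing: derive num_hashes positions from two 64-bit hashes
        digest = hashlib.blake2b(kmer.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def insert(self, bin_id: int, kmer: str) -> None:
        for p in self._positions(kmer):
            self.rows[p] |= 1 << bin_id

    def query(self, kmer: str) -> int:
        """Bit vector of bins that may contain kmer (false positives possible)."""
        result = (1 << self.bins) - 1
        for p in self._positions(kmer):
            result &= self.rows[p]
        return result

    def count_hits(self, kmers: Iterable[str]) -> List[int]:
        """Per-bin number of query k-mers reported as present."""
        counts = [0] * self.bins
        for kmer in kmers:
            bits = self.query(kmer)
            for b in range(self.bins):
                counts[b] += (bits >> b) & 1
        return counts


def kmer_list(seq: str, k: int) -> List[str]:
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def candidate_bins(counts: List[int], total: int, min_fraction: float = 0.8) -> List[int]:
    """Hypothetical filter: keep bins hit by at least min_fraction of the k-mers."""
    return [b for b, c in enumerate(counts) if c >= min_fraction * total]
```

Because a Bloom filter never produces false negatives, a read drawn from the reference in bin 0 always hits bin 0 with all of its k-mers; other bins can only gain hits through false positives, which the threshold is meant to suppress.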
Affiliation(s)
- Vitor C Piro
- Bioinformatics Unit (MF1), Robert Koch Institute, Berlin 13353, Germany
- CAPES Foundation, Ministry of Education of Brazil, Brasília 70040-020, Brazil
- Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
- Temesgen H Dadi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Enrico Seiler
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin 14195, Germany
- Bernhard Y Renard
- Bioinformatics Unit (MF1), Robert Koch Institute, Berlin 13353, Germany
- Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam 14482, Germany
16
Marchet C, Boucher C, Puglisi SJ, Medvedev P, Salson M, Chikhi R. Data structures based on k-mers for querying large collections of sequencing data sets. Genome Res 2021; 31:1-12. [PMID: 33328168 PMCID: PMC7849385 DOI: 10.1101/gr.260604.119] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 12/29/2019] [Accepted: 09/14/2020] [Indexed: 12/19/2022]
Abstract
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached petabyte scale, repositories do not allow one to perform online sequence searches, yet, such a feature would be highly useful to investigators. Toward this goal, in the last few years several computational approaches have been introduced to index and query large collections of data sets. Here, we propose an accessible survey of these approaches, which are generally based on representing data sets as sets of k-mers. We review their properties, introduce a classification, and present their general intuition. We summarize their performance and highlight their current strengths and limitations.
Affiliation(s)
- Camille Marchet
- Université de Lille, CNRS, CRIStAL UMR 9189, F-59000 Lille, France
- Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, Gainesville, Florida 32611, USA
- Simon J Puglisi
- Department of Computer Science, University of Helsinki, FI-00014, Helsinki, Finland
- Paul Medvedev
- Department of Computer Science, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Mikaël Salson
- Université de Lille, CNRS, CRIStAL UMR 9189, F-59000 Lille, France
- Rayan Chikhi
- Institut Pasteur & CNRS, C3BI USR 3756, F-75015 Paris, France
17
Mismatch-tolerant, alignment-free sequence classification using multiple spaced seeds and multiindex Bloom filters. Proc Natl Acad Sci U S A 2020; 117:16961-16968. [PMID: 32641514 PMCID: PMC7382288 DOI: 10.1073/pnas.1903436117] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Alignment-free classification tools have enabled high-throughput processing of sequencing data in many bioinformatics analysis pipelines primarily due to their computational efficiency. Originally k-mer based, such tools often lack sensitivity when faced with sequencing errors and polymorphisms. In response, some tools have been augmented with spaced seeds, which are capable of tolerating mismatches. However, spaced seeds have seen little practical use in classification because they bring increased computational and memory costs compared to methods that use k-mers. These limitations have also caused the design and length of practical spaced seeds to be constrained, since storing spaced seeds can be costly. To address these challenges, we have designed a probabilistic data structure called a multiindex Bloom Filter (miBF), which can store multiple spaced seed sequences with a low memory cost that remains static regardless of seed length or seed design. We formalize how to minimize the false-positive rate of miBFs when classifying sequences from multiple targets or references. Available within BioBloom Tools, we illustrate the utility of miBF in two use cases: read-binning for targeted assembly, and taxonomic read assignment. In our benchmarks, an analysis pipeline based on miBF shows higher sensitivity and specificity for read-binning than sequence alignment-based methods, also executing in less time. Similarly, for taxonomic classification, miBF enables higher sensitivity than a conventional spaced seed-based approach, while using half the memory and an order of magnitude less computational time.
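The mismatch tolerance of spaced seeds described above comes from "don't care" positions in the seed pattern. The sketch below extracts spaced seeds with a hypothetical pattern such as "11011" (1 = compared base, 0 = ignored base) and counts seeds shared with a reference; it does not model the multiindex Bloom filter used to store them.

```python
from typing import List, Set


def spaced_kmers(seq: str, pattern: str) -> List[str]:
    """Spaced seeds of seq: at each offset keep the bases under '1' in pattern.

    Positions under '0' are ignored, so a mismatch there leaves the
    extracted seed unchanged.
    """
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return ["".join(seq[start + i] for i in care)
            for start in range(len(seq) - span + 1)]


def shared_seeds(reference: str, read: str, pattern: str) -> int:
    """Number of read seeds that also occur among the reference seeds."""
    ref_seeds: Set[str] = set(spaced_kmers(reference, pattern))
    return sum(seed in ref_seeds for seed in spaced_kmers(read, pattern))


# a substitution at the ignored middle position does not change the seed
print(spaced_kmers("ACGTA", "11011"), spaced_kmers("ACCTA", "11011"))
```

With the contiguous pattern "11111" the same substitution would produce a different seed, which is the sensitivity loss that spaced seeds are meant to reduce.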
18
Elworth RAL, Wang Q, Kota PK, Barberan CJ, Coleman B, Balaji A, Gupta G, Baraniuk RG, Shrivastava A, Treangen T. To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics. Nucleic Acids Res 2020; 48:5217-5234. [PMID: 32338745 PMCID: PMC7261164 DOI: 10.1093/nar/gkaa265] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/18/2019] [Revised: 03/20/2020] [Accepted: 04/04/2020] [Indexed: 02/01/2023]
Abstract
As computational biologists continue to be inundated by ever increasing amounts of metagenomic data, the need for data analysis approaches that keep up with the pace of sequence archives has remained a challenge. In recent years, the accelerated pace of genomic data availability has been accompanied by the application of a wide array of highly efficient approaches from other fields to the field of metagenomics. For instance, sketching algorithms such as MinHash have seen a rapid and widespread adoption. These techniques handle increasingly large datasets with minimal sacrifices in quality for tasks such as sequence similarity calculations. Here, we briefly review the fundamentals of the most impactful probabilistic and signal processing algorithms. We also highlight more recent advances to augment previous reviews in these areas that have taken a broader approach. We then explore the application of these techniques to metagenomics, discuss their pros and cons, and speculate on their future directions.
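The review above names MinHash as a widely adopted sketching algorithm. A minimal sketch, assuming BLAKE2b keyed by an integer seed serves as a family of independent hash functions (our choice, not taken from the review), shows how keeping only the minimum hash per function lets two k-mer sets be compared through their small signatures.

```python
import hashlib
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer: str, seed: int) -> int:
    """64-bit hash of kmer; each seed acts as a separate hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(kmers: Set[str], num_hashes: int = 128) -> List[int]:
    """Keep only the minimum hash value of the set under each hash function."""
    return [min(_hash(km, seed) for km in kmers) for seed in range(num_hashes)]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of hash functions whose minima agree (estimates intersection/union)."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)


A = kmer_set("ACGTTGCATGCAACGTAGCTAGCTTAGCATCGATCGGATC", 7)
B = kmer_set("ACGTAGCTAGCTTAGCATCGATCGGATCGGGTTTAAACCC", 7)
print(estimate_jaccard(minhash_signature(A), minhash_signature(B)), exact_jaccard(A, B))
```

The two sets' minima under a random hash agree with probability equal to their Jaccard similarity, so the estimate's error shrinks roughly with the square root of the number of hash functions, while each signature stays a fixed size regardless of genome length.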
Affiliation(s)
- Qi Wang
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
- Pavan K Kota
- Department of Bioengineering, Houston, TX 77005, USA
- C J Barberan
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Benjamin Coleman
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Advait Balaji
- Department of Computer Science, Houston, TX 77005, USA
- Gaurav Gupta
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Richard G Baraniuk
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Anshumali Shrivastava
- Department of Computer Science, Houston, TX 77005, USA
- Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA
- Todd J Treangen
- Department of Computer Science, Houston, TX 77005, USA
- Systems, Synthetic, and Physical Biology (SSPB) Graduate Program, Houston, TX 77005, USA
19
Seiler E, Trappe K, Renard BY. Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation. PLoS Comput Biol 2019; 15:e1007208. [PMID: 31335917 PMCID: PMC6677323 DOI: 10.1371/journal.pcbi.1007208] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/11/2019] [Revised: 08/02/2019] [Accepted: 06/24/2019] [Indexed: 12/22/2022]
Abstract
Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, especially bacteria are able to take a shortcut via HGT that enables them to pass on genes from one individual to another, even across species boundaries. The tool Daisy offers the first HGT detection approach based on read mapping that provides complementary evidence compared to existing methods. However, Daisy relies on the acceptor and donor organism involved in the HGT being known. We introduce DaisyGPS, a mapping-based pipeline that is able to identify acceptor and donor reference candidates of an HGT event based on sequencing reads. Acceptor and donor identification is akin to species identification in metagenomic samples based on sequencing reads, a problem addressed by metagenomic profiling tools. However, acceptor and donor references have certain properties such that these methods cannot be directly applied. DaisyGPS uses MicrobeGPS, a metagenomic profiling tool tailored towards estimating the genomic distance between organisms in the sample and the reference database. We enhance the underlying scoring system of MicrobeGPS to account for the sequence patterns in terms of mapping coverage of an acceptor and donor involved in an HGT event, and report a ranked list of reference candidates. These candidates can then be further evaluated by tools like Daisy to establish HGT regions. We successfully validated our approach on both simulated and real data, and show its benefits in an investigation of an outbreak involving Methicillin-resistant Staphylococcus aureus data.
Affiliation(s)
- Enrico Seiler
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, and Algorithmic Bioinformatics, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
- Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Bernhard Y. Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
20
Gamaarachchi H, Parameswaran S, Smith MA. Featherweight long read alignment using partitioned reference indexes. Sci Rep 2019; 9:4318. [PMID: 30867495 PMCID: PMC6416333 DOI: 10.1038/s41598-019-40739-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/20/2018] [Accepted: 02/22/2019] [Indexed: 02/06/2023]
Abstract
The advent of Nanopore sequencing has realised portable genomic research and applications. However, state of the art long read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimisation and reference genome partitioning, but highlight the associated limitations and caveats of these approaches. We then demonstrate how these issues can be overcome through an appropriate merging technique. We incorporated multi-index merging into the Minimap2 aligner and demonstrate that long read alignment to the human genome can be performed on a system with 2 GB RAM with negligible impact on accuracy.
Affiliation(s)
- Hasindu Gamaarachchi
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria St, Darlinghurst, NSW, Australia
- School of Computer Science and Engineering, UNSW Sydney, Kensington, NSW, Australia
- Sri Parameswaran
- School of Computer Science and Engineering, UNSW Sydney, Kensington, NSW, Australia
- Martin A Smith
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, 370 Victoria St, Darlinghurst, NSW, Australia
- St-Vincent's Clinical School, Faculty of Medicine, UNSW Sydney, Darlinghurst, NSW, Australia