Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rachtman E, Bafna V, Mirarab S. CONSULT: accurate contamination removal using locality-sensitive hashing. NAR Genom Bioinform 2021;3:lqab071. [PMID: 34377979 PMCID: PMC8340999 DOI: 10.1093/nargab/lqab071] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/10/2021] [Revised: 06/30/2021] [Accepted: 07/19/2021] [Indexed: 12/27/2022] Open

Cited by Other Article(s)

Şapcı AOB, Mirarab S. Memory-bound k-mer selection for large and evolutionarily diverse reference libraries. Genome Res 2024;34:1455-1467. [PMID: 39209553 PMCID: PMC11529837 DOI: 10.1101/gr.279339.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 08/06/2024] [Indexed: 09/04/2024]

Edwards SV, Cloutier A, Cockburn G, Driver R, Grayson P, Katoh K, Baldwin MW, Sackton TB, Baker AJ. A nuclear genome assembly of an extinct flightless bird, the little bush moa. Sci Adv 2024;10:eadj6823. [PMID: 38781323 PMCID: PMC11809649 DOI: 10.1126/sciadv.adj6823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 04/17/2024] [Indexed: 05/25/2024]

Şapcı AOB, Rachtman E, Mirarab S. CONSULT-II: accurate taxonomic identification and profiling using locality-sensitive hashing. Bioinformatics 2024;40:btae150. [PMID: 38492564 PMCID: PMC10985673 DOI: 10.1093/bioinformatics/btae150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 02/17/2024] [Accepted: 03/14/2024] [Indexed: 03/18/2024] Open

Chorlton SD. Ten common issues with reference sequence databases and how to mitigate them. Front Bioinform 2024;4:1278228. [PMID: 38560517 PMCID: PMC10978663 DOI: 10.3389/fbinf.2024.1278228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 03/05/2024] [Indexed: 04/04/2024] Open

Bálint B, Merényi Z, Hegedüs B, Grigoriev IV, Hou Z, Földi C, Nagy LG. ContScout: sensitive detection and removal of contamination from annotated genomes. Nat Commun 2024;15:936. [PMID: 38296951 PMCID: PMC10831095 DOI: 10.1038/s41467-024-45024-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/06/2023] [Accepted: 01/08/2024] [Indexed: 02/02/2024] Open

Alvarez RV, Landsman D. GTax: improving de novo transcriptome assembly by removing foreign RNA contamination. Genome Biol 2024;25:12. [PMID: 38191464 PMCID: PMC10773103 DOI: 10.1186/s13059-023-03141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 12/08/2023] [Indexed: 01/10/2024] Open

Mirarab S, Bafna V. Analyses of Nuclear Reads Obtained Using Genome Skimming. Methods Mol Biol 2024;2744:247-265. [PMID: 38683324 DOI: 10.1007/978-1-0716-3581-0_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]

Zheng H, Marçais G, Kingsford C. Creating and Using Minimizer Sketches in Computational Genomics. J Comput Biol 2023;30:1251-1276. [PMID: 37646787 PMCID: PMC11082048 DOI: 10.1089/cmb.2023.0094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023] Open

Rumbavicius I, Rounge TB, Rognes T. HoCoRT: host contamination removal tool. BMC Bioinformatics 2023;24:371. [PMID: 37784008 PMCID: PMC10544359 DOI: 10.1186/s12859-023-05492-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/25/2023] [Accepted: 09/21/2023] [Indexed: 10/04/2023] Open

Abstract

BACKGROUND

Shotgun metagenome sequencing data obtained from a host environment will usually be contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or ensure privacy in the case of a human host. The tools that we identified, as designed specifically to perform host contamination sequence removal, were either outdated, not maintained, or complicated to use. Consequently, we have developed HoCoRT, a fast and user-friendly tool that implements several methods for optimised host sequence removal. We have evaluated the speed and accuracy of these methods.

RESULTS

HoCoRT is an open-source command-line tool for host contamination removal. It is designed to be easy to install and use, offering a one-step option for genome indexing. HoCoRT employs a variety of well-known mapping, classification, and alignment methods to classify reads. The user can select the underlying classification method and its parameters, allowing adaptation to different scenarios. Based on our investigation of various methods and parameters using synthetic human gut and oral microbiomes, and on assessment of publicly available data, we provide recommendations for typical datasets with short and long reads.

CONCLUSIONS

To decontaminate a human gut microbiome with short reads using HoCoRT, we found the optimal combination of speed and accuracy with BioBloom, Bowtie2 in end-to-end mode, and HISAT2. Kraken2 consistently demonstrated the highest speed, albeit with a trade-off in accuracy. The same applies to an oral microbiome, but here Bowtie2 was notably slower than the other tools. For long reads, the detection of human host reads is more difficult. In this case, a combination of Kraken2 and Minimap2 achieved the highest accuracy and detected 59% of human reads. In comparison to the dedicated DeconSeq tool, HoCoRT using Bowtie2 in end-to-end mode proved considerably faster and slightly more accurate. HoCoRT is available as a Bioconda package, and the source code can be accessed at https://github.com/ignasrum/hocort along with the documentation. It is released under the MIT licence and is compatible with Linux and macOS (except for the BioBloom module).

Rachtman E, Sarmashghi S, Bafna V, Mirarab S. Quantifying the uncertainty of assembly-free genome-wide distance estimates and phylogenetic relationships using subsampling. Cell Syst 2022;13:817-829.e3. [PMID: 36265468 PMCID: PMC9589918 DOI: 10.1016/j.cels.2022.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Revised: 03/14/2022] [Accepted: 06/28/2022] [Indexed: 01/26/2023]

Jin P, Dai J, Guo Y, Wang X, Lu J, Zhu Y, Yu F. Genomic Analysis of Mycobacterium abscessus Complex Isolates from Patients with Pulmonary Infection in China. Microbiol Spectr 2022;10:e0011822. [PMID: 35863029 PMCID: PMC9430165 DOI: 10.1128/spectrum.00118-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2022] [Accepted: 06/19/2022] [Indexed: 11/20/2022] Open

Abstract

Members of the Mycobacterium abscessus complex (MABC) are multidrug-resistant nontuberculous mycobacteria and increasingly cause opportunistic pulmonary infections. However, the genetic typing of MABC isolates remains largely unclear in China. Genomic analyses were conducted for 69 MABC clinical isolates obtained from patients with lower respiratory tract infections in Shanghai Pulmonary Hospital between 2014 and 2016. The draft genomes of the 69 clinical strains were assembled, with a total length of 4.5 to 5.6 Mb, a percent GC content (GC%) ranging from 63.9 to 68.1%, and 4,492 to 5,404 genes per genome. Susceptibility test shows that most isolates are resistant to many antimicrobials, including clarithromycin, but susceptible to tigecycline. Analyses revealed the presence of genes conferring resistance to antibiotics, including macrolides, aminoglycosides, rifampicin, and tetracyclines. Furthermore, 80 to 114 virulence genes were identified per genome, including those related to the invasion of macrophages, iron incorporation, and avoidance of immune clearance. Mobile genetic elements, including insertion sequences, transposons, and genomic islands, were discovered in the genomes. Phylogenetic analyses of all MABC isolates with another 41 complete MABC genomes identified three clades; 46 isolates were clustered in clade I, corresponding to M. abscessus subsp. abscessus, and 25 strains belonged to existing clonal complexes. Overall, this is the first comparative genomic analysis of MABC clinical isolates in China. These results show significant intraspecies variations in genetic determinants encoding antimicrobial resistance, virulence, and mobile elements and controversial subspecies classification using current marker gene combinations. This information will be useful in understanding the evolution, antimicrobial resistance, and pathogenesis of MABC strains and facilitating future vaccine development and drug design. 
IMPORTANCE Over the past decade, infections by Mycobacterium abscessus complex (MABC) isolates have been increasingly reported worldwide. MABC strains often show a high incidence in cystic fibrosis (CF) patients, whereas in Asia, these strains are frequently recovered from non-CF patients with significant genomic diversity. The present work involves analyses of the antimicrobial resistance, virulence, and phylogeny of 69 selected MABC isolates from non-CF pulmonary patients in Shanghai Pulmonary Hospital by whole-genome sequencing; it represents the first comprehensive investigation of MABC strains in China at the genomic level. These findings highlight the diversity of this group of nontuberculous mycobacteria and provide a mechanistic understanding of evolution and pathogenesis, which is valuable for the development of novel and effective antimicrobial therapies for deadly MABC infections in China.

Cornet L, Baurain D. Contamination detection in genomic data: more is not enough. Genome Biol 2022;23:60. [PMID: 35189924 PMCID: PMC8862208 DOI: 10.1186/s13059-022-02619-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 10/14/2021] [Accepted: 01/18/2022] [Indexed: 12/20/2022] Open

Sarmashghi S, Balaban M, Rachtman E, Touri B, Mirarab S, Bafna V. Estimating repeat spectra and genome length from low-coverage genome skims with RESPECT. PLoS Comput Biol 2021;17:e1009449. [PMID: 34780468 PMCID: PMC8629397 DOI: 10.1371/journal.pcbi.1009449] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/09/2021] [Revised: 11/29/2021] [Accepted: 09/13/2021] [Indexed: 01/26/2023] Open

Abstract

The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome-skims) could be transformative for genomic ecology, and results using k-mers have shown the advantage of this approach in identification and phylogenetic placement of eukaryotic species. Here, we revisit the basic question of estimating genomic parameters such as genome length, coverage, and repeat structure, focusing specifically on estimating the k-mer repeat spectrum. We show using a mix of theoretical and empirical analysis that there are fundamental limitations to estimating the k-mer spectra due to ill-conditioned systems, and that has implications for other genomic parameters. We get around this problem using a novel constrained optimization approach (Spline Linear Programming), where the constraints are learned empirically. On reads simulated at 1X coverage from 66 genomes, our method, REPeat SPECTra Estimation (RESPECT), had 2.2% error in length estimation compared to 27% error previously achieved. In shotgun sequenced read samples with contaminants, RESPECT length estimates had median error 4%, in contrast to other methods that had median error 80%. Together, the results suggest that low-pass genomic sequencing can yield reliable estimates of the length and repeat content of the genome. The RESPECT software will be publicly available at https://github.com/shahab-sarmashghi/RESPECT.git.

The cost of sequencing the genome is dropping at a much faster rate compared to assembling and finishing the genome. The use of lightly sampled genomes (genome skims) could be transformative for genomic ecology. Analyzing genome skims, mostly based on statistics of small oligomers, remains challenging, but recent results have shown the advantage of this approach for the identification and phylogenetic placement of eukaryotic species. In this paper, we present a method, RESPECT, to estimate genomic properties such as genome length and repetitiveness from low-coverage genome skims. We trained RESPECT using assembled genomes and tested it on low-coverage simulated and real reads. Benchmarking results reveal that RESPECT has excellent accuracy in estimating the genome length compared to other methods, and can provide critical information regarding the repeat structure of the genome.
