Reference Citation Analysis

For: Yim WC, Cushman JC. Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments. PeerJ 2017;5:e3486. [PMID: 28652936 PMCID: PMC5483034 DOI: 10.7717/peerj.3486] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 03/16/2017] [Accepted: 05/31/2017] [Indexed: 12/02/2022] [Open Access]

Cited by Other Article(s)

Cheng T, Chin PJ, Cha K, Petrick N, Mikailov M. Profiling the BLAST bioinformatics application for load balancing on high-performance computing clusters. BMC Bioinformatics 2022;23:544. [PMID: 36526957 PMCID: PMC9758941 DOI: 10.1186/s12859-022-05029-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 10/31/2022] [Indexed: 12/23/2022] [Open Access]

Abstract

BACKGROUND

The Basic Local Alignment Search Tool (BLAST) is a suite of commonly used algorithms for identifying matches between biological sequences. The user supplies a database file and a query file of sequences, and BLAST finds matching sequences between the two. Databases and queries typically contain millions of sequences, which makes BLAST computationally challenging but also well suited to parallelization on high-performance computing (HPC) clusters. The efficacy of parallelization depends on how the data are partitioned, and optimal partitioning relies on an accurate performance model. In previous studies, a BLAST job was sped up 27-fold by partitioning the database and query among thousands of processor nodes. However, the optimality of the partitioning method was not studied. BLAST performance models proposed in the literature usually have problem size and hardware configuration as the only variables, whereas the execution time of a BLAST job is a function of database size, query size, and hardware capability. In this work, the nucleotide BLAST application BLASTN was profiled using three methods: shell-level profiling with the Unix "time" command, code-level profiling with the built-in "profiler" module, and system-level profiling with the Unix "gprof" program. Runtimes were measured for six node types, using six different database files and 15 query files, on a heterogeneous HPC cluster with 500+ nodes. The empirical measurements were fitted with quadratic functions to develop performance models, which were then used to guide data parallelization for BLASTN jobs.
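The quadratic fitting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `fit_quadratic`, the choice of a single size variable, and the sample numbers are all assumptions made for illustration. It fits runtime ≈ a·x² + b·x + c by ordinary least squares using only the Python standard library.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a*x^2 + b*x + c via the normal equations.

    Hypothetical helper: xs could be input sizes, ys measured runtimes.
    """
    # Power sums of x and mixed sums of x^k * y build the normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 matrix, unknowns ordered (a, b, c).
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    n = 3
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = acc / m[r][r]
    return tuple(coeffs)


# Made-up measurements that follow runtime = 2*x^2 + 3*x + 1 exactly.
sizes = [1, 2, 3, 4, 5]
runtimes = [6, 15, 28, 45, 66]
a, b, c = fit_quadratic(sizes, runtimes)
```

With real measurements the fit would not be exact, and a separate model per node type would presumably be needed, since the abstract reports measurements for six node types.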

RESULTS

Profiling results showed that BLASTN contains more than 34,500 different functions, but a single function, RunMTBySplitDB, takes 99.12% of the total runtime. Among its 53 child functions, five core functions were found to account for 92.12% of the overall BLASTN runtime. Based on the performance models, static load balancing algorithms can be applied to the BLASTN input data to minimize the runtime of the longest job on an HPC cluster. Four test cases were run on homogeneous and heterogeneous clusters. Experimental results showed that the runtime can be reduced by 81% on a homogeneous cluster and by 20% on a heterogeneous cluster by redistributing the workload.
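The idea of static load balancing to minimize the runtime of the longest job can be sketched as follows. The abstract does not say which algorithm the authors used, so this example substitutes a standard greedy heuristic (longest processing time first) on identical nodes; the function name `lpt_assign` and the predicted runtimes are hypothetical.

```python
import heapq


def lpt_assign(task_times, n_nodes):
    """Greedy longest-processing-time-first assignment on identical nodes.

    task_times: predicted runtime of each job (e.g. from a performance model).
    Returns (assignment, makespan): per-node lists of task indices and the
    total load of the busiest node.
    """
    # Min-heap of (current load, node id): the least-loaded node pops first.
    heap = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_nodes)]
    # Place the longest tasks first, each on the currently lightest node.
    for idx in sorted(range(len(task_times)), key=lambda i: -task_times[i]):
        load, node = heapq.heappop(heap)
        assignment[node].append(idx)
        heapq.heappush(heap, (load + task_times[idx], node))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Made-up predicted runtimes for six jobs, balanced across two nodes.
predicted = [7, 5, 4, 3, 3, 2]
assignment, makespan = lpt_assign(predicted, 2)
```

On a heterogeneous cluster each node would need its own runtime prediction per job, which this sketch does not model.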

DISCUSSION

Optimal data partitioning can improve BLASTN's overall runtime 5.4-fold in comparison with dividing the database and query into the same number of fragments. The proposed methodology can be applied to other applications in the BLAST+ suite, or to any other application whose source code is available.

Yim WC, Swain ML, Ma D, An H, Bird KA, Curdie DD, Wang S, Ham HD, Luzuriaga-Neira A, Kirkwood JS, Hur M, Solomon JKQ, Harper JF, Kosma DK, Alvarez-Ponce D, Cushman JC, Edger PP, Mason AS, Pires JC, Tang H, Zhang X. The final piece of the Triangle of U: Evolution of the tetraploid Brassica carinata genome. Plant Cell 2022;34:4143-4172. [PMID: 35961044 PMCID: PMC9614464 DOI: 10.1093/plcell/koac249] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 01/03/2022] [Accepted: 06/24/2022] [Indexed: 05/05/2023]

Affiliation(s)

Won Cheol Yim Author for correspondence:
Mia L Swain Author for correspondence:
Dongna Ma Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization, Fujian Agriculture and Forestry University, Fuzhou, China
Hong An Division of Biological Sciences, University of Missouri, Columbia, Missouri 65201, USA
Kevin A Bird Department of Horticulture, Michigan State University, East Lansing, Michigan 48824, USA
David D Curdie Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
Samuel Wang Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
Hyun Don Ham Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
Agusto Luzuriaga-Neira Department of Biology, University of Nevada, Reno, Nevada 89557, USA
Jay S Kirkwood Metabolomics Core Facility, Institute for Integrative Genome Biology, University of California, Riverside, California 92521, USA
Manhoi Hur Metabolomics Core Facility, Institute for Integrative Genome Biology, University of California, Riverside, California 92521, USA
Juan K Q Solomon Department of Agriculture, Veterinary & Rangeland Sciences, University of Nevada, Reno, Nevada 89557, USA
Jeffrey F Harper Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
Dylan K Kosma Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
David Alvarez-Ponce Department of Biology, University of Nevada, Reno, Nevada 89557, USA
John C Cushman Department of Biochemistry and Molecular Biology, University of Nevada, Reno, Nevada 89557, USA
Patrick P Edger Department of Horticulture, Michigan State University, East Lansing, Michigan 48824, USA
Annaliese S Mason Plant Breeding Department, INRES, The University of Bonn, Bonn 53115, Germany
J Chris Pires Division of Biological Sciences, Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211, USA
Haibao Tang Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization, Fujian Agriculture and Forestry University, Fuzhou, China
Xingtan Zhang Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Ministry of Education for Genetics, Breeding and Multiple Utilization of Crops, Key Laboratory of National Forestry and Grassland Administration for Orchid Conservation and Utilization, Fujian Agriculture and Forestry University, Fuzhou, China

Guerrero-Araya E, Muñoz M, Rodríguez C, Paredes-Sabja D. FastMLST: A Multi-core Tool for Multilocus Sequence Typing of Draft Genome Assemblies. Bioinform Biol Insights 2021;15:11779322211059238. [PMID: 34866905 PMCID: PMC8637782 DOI: 10.1177/11779322211059238] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/04/2021] [Accepted: 10/19/2021] [Indexed: 11/21/2022] [Open Access]

Franco‐Sierra ND, Díaz‐Nieto JF. Rapid mitochondrial genome sequencing based on Oxford Nanopore Sequencing and a proxy for vertebrate species identification. Ecol Evol 2020;10:3544-3560. [PMID: 32274008 PMCID: PMC7141017 DOI: 10.1002/ece3.6151] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 10/23/2019] [Revised: 02/09/2020] [Accepted: 02/12/2020] [Indexed: 02/06/2023] [Open Access]

Shirshikov FV, Pekov YA, Miroshnikov KA. MorphoCatcher: a multiple-alignment based web tool for target selection and designing taxon-specific primers in the loop-mediated isothermal amplification method. PeerJ 2019;7:e6801. [PMID: 31086739 PMCID: PMC6487805 DOI: 10.7717/peerj.6801] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/04/2018] [Accepted: 03/18/2019] [Indexed: 11/20/2022] [Open Access]

Abstract

BACKGROUND

The advantages of loop-mediated isothermal amplification in molecular diagnostics make the method a promising technology for nucleic acid detection in agriculture and medicine. A bioinformatics tool that provides rapid screening and selection of target nucleotide sequences, followed by taxon-specific primer design toward polymorphic orthologous genes and not only unique or conserved common regions of the genome, would contribute to the development of more specific and sensitive diagnostic assays. However, given the features of the original primer selection software, PrimerExplorer (Eiken Chemical Co. LTD, Tokyo, Japan), taxon-specific primer design using multiple sequence alignments of orthologs, or even of viral genomes with conserved architecture, remains complicated.

FINDINGS

Here, MorphoCatcher is introduced as a fast and simple web plugin for PrimerExplorer with a clear interface. It enables a multiple-alignment-based search for taxon-specific mutations, visual screening and selection of target sequences, and easy-to-start specific primer design using the PrimerExplorer software. Together, MorphoCatcher and PrimerExplorer make it possible to process multiple alignments of orthologs for informative sliding-window plot analysis, which is used to identify sequence regions with a high density of taxon-specific mutations and to cover them with the primer ends for better amplification specificity.
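The sliding-window analysis described above can be illustrated with a small sketch. It is not MorphoCatcher's implementation: the definition of a taxon-specific position used here (all target sequences share one base that no other sequence carries at that column) is an assumption, and the function names and toy alignment are hypothetical.

```python
def specific_positions(target, others):
    """Flag alignment columns that look taxon-specific.

    Assumed definition: every target sequence has the same non-gap base
    at the column, and no non-target sequence carries that base there.
    """
    flags = []
    for i in range(len(target[0])):
        target_bases = {seq[i] for seq in target}
        other_bases = {seq[i] for seq in others}
        flags.append(
            len(target_bases) == 1
            and "-" not in target_bases
            and target_bases.isdisjoint(other_bases)
        )
    return flags


def window_density(flags, window):
    """Count flagged columns in each window of the given width."""
    running = sum(flags[:window])
    counts = [running]
    for i in range(window, len(flags)):
        # Slide by one column: add the entering flag, drop the leaving one.
        running += flags[i] - flags[i - window]
        counts.append(running)
    return counts


# Toy alignment: two target-taxon sequences and two non-target sequences.
target = ["ACGTTA", "ACGTTA"]
others = ["ACCTAA", "ACGTCA"]
flags = specific_positions(target, others)
density = window_density(flags, 3)
```

Windows with the highest counts would be the candidate regions for placing primer ends, under the assumed definition.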

CONCLUSIONS

We hope that this new bioinformatics tool for target selection and taxon-specific primer design, MorphoCatcher, will increase the popularity of the loop-mediated isothermal amplification method in the molecular diagnostics community. MorphoCatcher is a simple web plugin for the PrimerExplorer software and is freely available, for non-commercial and academic users only, at http://morphocatcher.ru.

Jung J, Yi G. A performance analysis of genome search by matching whole targeted reads on different environments. Soft comput 2018. [DOI: 10.1007/s00500-018-3573-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]