Reference Citation Analysis

For: Rajasekaran S, Jin X, Spouge JL. The efficient computation of position-specific match scores with the fast Fourier transform. J Comput Biol 2002;9:23-33. [PMID: 11911793 DOI: 10.1089/10665270252833172] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]

Cited by Other Article(s)

Zhan Q, Fu Y, Jiang Q, Liu B, Peng J, Wang Y. SpliVert: A Protein Multiple Sequence Alignment Refinement Method Based on Splitting-Splicing Vertically. Protein Pept Lett 2020;27:295-302. [PMID: 31385760 DOI: 10.2174/0929866526666190806143959] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/06/2019] [Revised: 04/26/2019] [Accepted: 06/14/2019] [Indexed: 11/22/2022]

Abstract

BACKGROUND

Multiple Sequence Alignment (MSA) is a fundamental task in bioinformatics and is required for many biological analysis tasks. The more accurate the alignments are, the more credible the downstream analyses. Most protein MSA algorithms refine an alignment by dividing it horizontally into two groups and then realigning the two groups. However, this strategy does not consider that different regions of the sequences have different levels of conservation; this property may lead to incorrect residue-residue or residue-gap pairs, which cannot be corrected by this strategy.

OBJECTIVE

In this article, we aim to develop a novel refinement method based on splitting-splicing vertically.

METHODS

Here, we present a novel refinement method based on splitting-splicing vertically, called SpliVert. For an alignment, we split it vertically into 3 parts, remove the gap characters in the middle part, realign the middle part alone, and splice the realigned middle part with the other two initial pieces to obtain a refined alignment. In the realignment step of our method, the aligner focuses only on a certain part, ignoring the disturbance of the other parts, which could help fix the incorrect pairs.
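The split, realign, and splice steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the column indices `start` and `end`, and the trivial `pad_realign` stand-in (which only right-pads with gaps) are all hypothetical. In practice the `realign` argument would wrap a real MSA tool.

```python
from typing import Callable, List, Tuple

Alignment = List[str]  # one gapped string per sequence, all of equal length


def split_vertically(alignment: Alignment, start: int, end: int) -> Tuple[Alignment, Alignment, Alignment]:
    """Cut every row at the same two columns, giving left, middle and right blocks."""
    left = [row[:start] for row in alignment]
    middle = [row[start:end] for row in alignment]
    right = [row[end:] for row in alignment]
    return left, middle, right


def pad_realign(seqs: List[str]) -> Alignment:
    """Hypothetical stand-in for an aligner: right-pad with gaps to a common width."""
    width = max((len(s) for s in seqs), default=0)
    return [s.ljust(width, "-") for s in seqs]


def refine(alignment: Alignment, start: int, end: int,
           realign: Callable[[List[str]], Alignment] = pad_realign) -> Alignment:
    """Realign only the middle block and splice it back between the untouched flanks."""
    left, middle, right = split_vertically(alignment, start, end)
    ungapped = [m.replace("-", "") for m in middle]  # remove gap characters in the middle
    realigned = realign(ungapped)                    # realign the middle part alone
    return [l + m + r for l, m, r in zip(left, realigned, right)]


if __name__ == "__main__":
    for row in refine(["AC-GT-A", "ACG-TTA", "A-CGT-A"], 2, 5):
        print(row)
```

Because the flanks are copied unchanged and the middle block only has its gaps rearranged, every sequence keeps its residues in the original order; only the column assignment inside the middle block can change.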

RESULTS

We tested our refinement strategy for 2 leading MSA tools on 3 standard benchmarks, according to the commonly used average SP (and TC) scores. The results show that, given appropriate proportions for splitting the initial alignment, the average scores remain comparable or increase slightly after applying our method. We also compared the alignments refined by our method with alignments directly refined by the original alignment tools. The results suggest that using our SpliVert method to refine alignments can also outperform direct use of the original alignment tools.

CONCLUSION

The results reveal that splitting vertically and realigning part of the alignment is a good strategy for the refinement of protein multiple sequence alignments.

Gao L, Bao W, Zhang H, Yuan CA, Huang DS. Fast sequence analysis based on diamond sampling. PLoS One 2018;13:e0198922. [PMID: 29953448 PMCID: PMC6023231 DOI: 10.1371/journal.pone.0198922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/27/2018] [Accepted: 05/29/2018] [Indexed: 12/02/2022]

Afshar PT, Wong WH. COSINE: non-seeding method for mapping long noisy sequences. Nucleic Acids Res 2017;45:e132. [PMID: 28586438 PMCID: PMC5737678 DOI: 10.1093/nar/gkx511] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/28/2016] [Revised: 05/16/2017] [Accepted: 06/04/2017] [Indexed: 11/20/2022]

Qiao W, Takayanagi K, Niu Q, Shofie M, Li YY. Long-term stability of thermophilic co-digestion submerged anaerobic membrane reactor encountering high organic loading rate, persistent propionate and detectable hydrogen in biogas. Bioresour Technol 2013;149:92-102. [PMID: 24090872 DOI: 10.1016/j.biortech.2013.09.023] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 07/23/2013] [Revised: 09/02/2013] [Accepted: 09/04/2013] [Indexed: 06/02/2023]

Kaznadzey A, Alexandrova N, Novichkov V, Kaznadzey D. PSimScan: algorithm and utility for fast protein similarity search. PLoS One 2013;8:e58505. [PMID: 23505522 PMCID: PMC3591303 DOI: 10.1371/journal.pone.0058505] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/27/2012] [Accepted: 02/07/2013] [Indexed: 01/19/2023]

Shu Jianjun, Li Yajing. Hypercomplex cross-correlation of DNA sequences. J Biol Syst 2011. [DOI: 10.1142/s0218339010003470] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]

Pizzi C, Rastas P, Ukkonen E. Finding significant matches of position weight matrices in linear time. IEEE/ACM Trans Comput Biol Bioinform 2011;8:69-79. [PMID: 21071798 DOI: 10.1109/tcbb.2009.35] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 05/30/2023]

Ye J, Su LH, Chen CL, Hu S, Wang J, Yu J, Chiu CH. Analysis of pSC138, the multidrug resistance plasmid of Salmonella enterica serotype Choleraesuis SC-B67. Plasmid 2010;65:132-40. [PMID: 21111756 DOI: 10.1016/j.plasmid.2010.11.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 10/03/2009] [Revised: 10/19/2010] [Accepted: 11/21/2010] [Indexed: 11/16/2022]

Abstract

Salmonella enterica serotype Choleraesuis (S. Choleraesuis) usually causes systemic infections in man and needs antimicrobial treatment. Multidrug resistance (MDR) in S. Choleraesuis is thus a great concern in the treatment of systemic non-typhoid salmonellosis. A large plasmid, pSC138, was identified in 2002 from a S. Choleraesuis strain SC-B67 that was resistant to all antimicrobial agents commonly used to treat salmonellosis, including ciprofloxacin and ceftriaxone. Complete DNA sequence of the plasmid had been determined previously (Chiu et al., 2005). In the present study, the sequence of pSC138 was reannotated in detail and compared with several newly sequenced plasmids. Some transposable elements and drug resistance genes were further delineated. Plasmid pSC138 was 138,742 bp in length and consisted of 177 open reading frames (ORFs). While 134 of the ORFs displayed significant identity levels to other plasmid and prokaryotic sequences, the remaining 43 ORFs have not been previously reported. Mobile elements, including two integrons, seven insertion sequences and eight transposons, and a truncated prophage together encompass at least 66,781 bp (48.1%) of the plasmid genome. The sequence of pSC138 consists of three major regions: a large composite transposable region Tn6088 with a Tn21-like backbone inserted by a variety of integrons or transposable elements; a transfer/maintenance region that contains a conserved ISEcp1-mediated transposon-like element Tn6092, carrying an AmpC gene, bla(CMY-2), that confers the ceftriaxone resistance; and a Rep_3 type of replication region. Another seven bacteremic strains of S. Choleraesuis that expressed the same MDR phenotype were identified during 2003-2008. The same Rep_3 type replicase and the bla(CMY-2)-containing, ISEcp1-mediated transposon-like element were found in the MDR isolates, suggesting a successful preservation and dissemination of the MDR plasmid. 
Comparison of pSC138 with other recently published plasmids revealed a high identity level between partial sequences of pSC138 and plasmids of the same or different incompatibility groups. The large MDR region found in pSC138 may provide a niche for the future evolution of the plasmid by acquisition of relevant resistance genes through the panoply of mobile elements and illegitimate recombination events.

Beckstette M, Homann R, Giegerich R, Kurtz S. Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinformatics 2006;7:389. [PMID: 16930469 PMCID: PMC1635428 DOI: 10.1186/1471-2105-7-389] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.7] [Received: 04/20/2006] [Accepted: 08/24/2006] [Indexed: 11/10/2022]

Abstract

Background

In biological sequence analysis, position specific scoring matrices (PSSMs) are widely used to represent sequence motifs in nucleotide as well as amino acid sequences. Searching with PSSMs in complete genomes or large sequence databases is a common, but computationally expensive task.
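To make the cost concrete, the sketch below shows the straightforward baseline scan that index-based methods aim to speed up: score every window of length m against the matrix, which takes time proportional to n times m for a sequence of length n. It is an illustration only, not the ESAsearch algorithm. The names `scan_pssm`, `pssm`, and `threshold`, and the representation of the matrix as one score dictionary per position, are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed representation: one dict per motif position, mapping a symbol to its score.
PSSM = List[Dict[str, int]]


def scan_pssm(sequence: str, pssm: PSSM, threshold: int) -> List[Tuple[int, int]]:
    """Return (start, score) for every window whose summed score reaches the threshold."""
    m = len(pssm)
    hits = []
    for start in range(len(sequence) - m + 1):
        # Sum the position-specific score of each symbol in the current window.
        score = sum(pssm[j][sequence[start + j]] for j in range(m))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy two-column nucleotide matrix that favours the motif "AC".
    toy = [
        {"A": 2, "C": -1, "G": -1, "T": -1},
        {"A": -1, "C": 2, "G": -1, "T": -1},
    ]
    print(scan_pssm("ACGAC", toy, threshold=4))
```

The inner sum is recomputed from scratch for every window, which is exactly the redundancy that preprocessing the database into an index is meant to avoid.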

Results

We present a new non-heuristic algorithm, called ESAsearch, to efficiently find matches of PSSMs in large databases. Our approach preprocesses the search space, e.g., a complete genome or a set of protein sequences, and builds an enhanced suffix array that is stored on file. This allows the searching of a database with a PSSM in sublinear expected time. Since ESAsearch benefits from small alphabets, we present a variant operating on sequences recoded according to a reduced alphabet. We also address the problem of non-comparable PSSM-scores by developing a method which allows the efficient computation of a matrix similarity threshold for a PSSM, given an E-value or a p-value. Our method is based on dynamic programming and, in contrast to other methods, it employs lazy evaluation of the dynamic programming matrix. We evaluated algorithm ESAsearch with nucleotide PSSMs and with amino acid PSSMs. Compared to the best previous methods, ESAsearch shows speedups of a factor between 17 and 275 for nucleotide PSSMs, and speedups up to factor 1.8 for amino acid PSSMs. Comparisons with the most widely used programs even show speedups by a factor of at least 3.8. Alphabet reduction yields an additional speedup factor of 2 on amino acid sequences compared to results achieved with the 20 symbol standard alphabet. The lazy evaluation method is also much faster than previous methods, with speedups of a factor between 3 and 330.

Conclusion

Our analysis of ESAsearch reveals sublinear runtime in the expected case, and linear runtime in the worst case for sequences not shorter than $|\mathcal{A}|^m + m - 1$, where $m$ is the length of the PSSM and $\mathcal{A}$ is a finite alphabet. In practice, ESAsearch shows superior performance over the most widely used programs, especially for DNA sequences. The new algorithm for accurate on-the-fly calculations of thresholds has the potential to replace formerly used approximation approaches. Beyond the algorithmic contributions, we provide a robust, well documented, and easy to use software package, implementing the ideas and algorithms presented in this manuscript.

Freschi V, Bogliolo A. Using sequence compression to speedup probabilistic profile matching. Bioinformatics 2005;21:2225-9. [PMID: 15713733 DOI: 10.1093/bioinformatics/bti323] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]

Katoh K, Misawa K, Kuma KI, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 2002;30:3059-66. [PMID: 12136088 PMCID: PMC135756 DOI: 10.1093/nar/gkf436] [Citation(s) in RCA: 9373] [Impact Index Per Article: 426.0] [Indexed: 11/14/2022]