Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Notredame C. Recent progress in multiple sequence alignment: a survey. Pharmacogenomics 2002;3:131-44. [PMID: 11966409 DOI: 10.1517/14622416.3.1.131] [Citation(s) in RCA: 215] [Impact Index Per Article: 9.8] [Indexed: 11/05/2022]


Cited by Other Article(s)

João M, Sena AC, Rebello VEF. On closing the inopportune gap with consistency transformation and iterative refinement. PLoS One 2023;18:e0287483. [PMID: 37440507 PMCID: PMC10343097 DOI: 10.1371/journal.pone.0287483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2023] [Accepted: 06/06/2023] [Indexed: 07/15/2023]

Daviet B, Fernandez R, Cabrera-Bosquet L, Pradal C, Fournier C. PhenoTrack3D: an automatic high-throughput phenotyping pipeline to track maize organs over time. Plant Methods 2022;18:130. [PMID: 36482291 PMCID: PMC9730636 DOI: 10.1186/s13007-022-00961-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/20/2022] [Accepted: 11/22/2022] [Indexed: 06/17/2023]

Abstract

BACKGROUND

High-throughput phenotyping platforms allow the study of the form and function of a large number of genotypes subjected to different growing conditions (GxE). A number of image acquisition and processing pipelines have been developed to automate this process, for micro-plots in the field and for individual plants in controlled conditions. Capturing shoot development requires extracting from images both the evolution of the 3D plant architecture as a whole, and a temporal tracking of the growth of its organs.

RESULTS

We propose PhenoTrack3D, a new pipeline to extract a 3D + t reconstruction of maize. It allows the study of plant architecture and individual organ development over time during the entire growth cycle. The method tracks the development of each organ from a time-series of plants whose organs have already been segmented in 3D using existing methods, such as Phenomenal [Artzet et al. in BioRxiv 1:805739, 2019], which was chosen in this study. First, a novel stem detection method based on deep learning is used to precisely locate the point of separation between ligulated and growing leaves. Second, a new multiple sequence alignment algorithm has been developed to perform the temporal tracking of ligulated leaves, which have a consistent geometry over time and an unambiguous topological position. Finally, growing leaves are back-tracked with a distance-based approach. This pipeline is validated on a challenging dataset of 60 maize hybrids imaged daily from emergence to maturity in the PhenoArch platform (ca. 250,000 images). The stem tip was precisely detected over time (RMSE < 2.1 cm). 97.7% and 85.3% of ligulated and growing leaves respectively were assigned to the correct rank after tracking, on 30 plants × 43 dates. The pipeline made it possible to extract various development and architecture traits at organ level, with good correlation to manual observations overall, on random subsets of 10-355 plants.

CONCLUSIONS

We developed a novel phenotyping method based on sequence alignment and deep learning. It makes it possible to characterise the development of maize architecture at organ level, automatically and at high throughput. It has been validated on hundreds of plants during the entire development cycle, showing its applicability to GxE analyses of large maize datasets.


Zhang Y, Zhang Q, Zhou J, Zou Q. A survey on the algorithm and development of multiple sequence alignment. Brief Bioinform 2022;23:6546258. [PMID: 35272347 DOI: 10.1093/bib/bbac069] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/09/2021] [Revised: 01/30/2022] [Accepted: 02/09/2022] [Indexed: 12/21/2022]

Slipknot or Crystallographic Error: A Computational Analysis of the Plasmodium falciparum DHFR Structural Folds. Int J Mol Sci 2022;23:ijms23031514. [PMID: 35163439 PMCID: PMC8835989 DOI: 10.3390/ijms23031514] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/24/2021] [Revised: 01/21/2022] [Accepted: 01/25/2022] [Indexed: 01/12/2023]

Biological sequence analysis. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00003-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Li Y. Sequence Alignment with Q-Learning Based on the Actor-Critic Model. ACM T ASIAN LOW-RESO 2021. [DOI: 10.1145/3433540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]

Hu T, Li J, Zhou H, Li C, Holmes EC, Shi W. Bioinformatics resources for SARS-CoV-2 discovery and surveillance. Brief Bioinform 2021;22:631-641. [PMID: 33416890 PMCID: PMC7929396 DOI: 10.1093/bib/bbaa386] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/10/2020] [Revised: 11/10/2020] [Accepted: 11/27/2020] [Indexed: 12/22/2022]

Paul L, Mudogo CN, Mtei KM, Machunda RL, Ntie-Kang F. A computer-based approach for developing linamarase inhibitory agents. Physical Sciences Reviews 2020. [DOI: 10.1515/psr-2019-0098] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]

A bi-objective function optimization approach for multiple sequence alignment using genetic algorithm. Soft Comput 2020. [DOI: 10.1007/s00500-020-04917-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/04/2023]

NestMSA: a new multiple sequence alignment algorithm. Journal of Supercomputing 2020. [DOI: 10.1007/s11227-020-03206-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]

Caetano DS, Beaulieu JM. Comparative Analyses of Phenotypic Sequences Using Phylogenetic Trees. Am Nat 2020;195:E38-E50. [PMID: 32017626 DOI: 10.1086/706912] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/03/2022]

Daoud M. The extension of the largest generalized-eigenvalue based distance metric D_ij(γ_1) in arbitrary feature spaces to classify composite data points. Genomics Inform 2019;17:e39. [PMID: 31896239 PMCID: PMC6944050 DOI: 10.5808/gi.2019.17.4.e39] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/31/2019] [Revised: 10/14/2019] [Accepted: 10/14/2019] [Indexed: 11/20/2022]

ERES: an extended regular expression signature for polymorphic worm detection. Journal of Computer Virology and Hacking Techniques 2019. [DOI: 10.1007/s11416-019-00330-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/26/2022]

Wang Y, Wu H, Cai Y. A benchmark study of sequence alignment methods for protein clustering. BMC Bioinformatics 2018;19:529. [PMID: 30598070 PMCID: PMC6311937 DOI: 10.1186/s12859-018-2524-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]

Rubio-Largo Á, Vanneschi L, Castelli M, Vega-Rodríguez MA. Multiobjective characteristic-based framework for very-large multiple sequence alignment. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2017.06.022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/19/2022]

Rubio-Largo Á, Castelli M, Vanneschi L, Vega-Rodríguez MA. A Parallel Multiobjective Metaheuristic for Multiple Sequence Alignment. J Comput Biol 2018;25:1009-1022. [PMID: 29671616 DOI: 10.1089/cmb.2018.0031] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]

Ali MO, El-Adl MA, Ibrahim HMM, Elseedy YY, Rizk MA, El-Khodery SA. Molecular characterization of the vitamin D receptor (VDR) gene in Holstein cows. Res Vet Sci 2018;118:146-150. [PMID: 29433008 DOI: 10.1016/j.rvsc.2018.02.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/19/2016] [Revised: 01/31/2018] [Accepted: 02/03/2018] [Indexed: 11/28/2022]

Disease Sequences High-Accuracy Alignment Based on the Precision Medicine. BioMed Research International 2018;2018:1718046. [PMID: 29682519 PMCID: PMC5842723 DOI: 10.1155/2018/1718046] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/22/2017] [Accepted: 01/18/2018] [Indexed: 11/18/2022]

Song L, Wu S, Tsang A. Phylogenetic Analysis of Protein Family. Methods Mol Biol 2018;1775:267-275. [PMID: 29876824 DOI: 10.1007/978-1-4939-7804-5_21] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]

Rubio-Largo A, Vanneschi L, Castelli M, Vega-Rodriguez MA. A Characteristic-Based Framework for Multiple Sequence Aligners. IEEE Transactions on Cybernetics 2018;48:41-51. [PMID: 27831898 DOI: 10.1109/tcyb.2016.2621129] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/06/2023]

Chowdhury B, Garai G. A review on multiple sequence alignment from the perspective of genetic algorithm. Genomics 2017;109:419-431. [PMID: 28669847 DOI: 10.1016/j.ygeno.2017.06.007] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 04/26/2017] [Revised: 05/27/2017] [Accepted: 06/27/2017] [Indexed: 01/04/2023]

Guo D, Yuan E, Hu X, Wu X. Co-occurrence pattern mining based on a biological approximation scoring matrix. Pattern Anal Appl 2017. [DOI: 10.1007/s10044-017-0609-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]

Bernard G, Ragan MA, Chan CX. Recapitulating phylogenies using k-mers: from trees to networks. F1000Res 2016;5:2789. [PMID: 28105314 PMCID: PMC5224691 DOI: 10.12688/f1000research.10225.2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Accepted: 12/20/2016] [Indexed: 02/04/2023]

Abstract

Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel’s idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
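As a rough illustration of the shared k-mer idea in the abstract above, the following minimal Python sketch counts the distinct k-mers two sequences have in common. The function names and the toy sequences are hypothetical, and the abstract does not state how the authors turn shared k-mer counts into network edges, so this shows only the counting step.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct length-k substrings (k-mers) of seq."""
    # If seq is shorter than k, the range is empty and so is the set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(a: str, b: str, k: int) -> int:
    """Number of distinct k-mers that occur in both sequences."""
    return len(kmers(a, k) & kmers(b, k))


def shared_kmer_matrix(seqs: list[str], k: int) -> list[list[int]]:
    """Pairwise shared k-mer counts; entry [i][j] compares seqs[i] and seqs[j]."""
    sets = [kmers(s, k) for s in seqs]
    return [[len(x & y) for y in sets] for x in sets]


if __name__ == "__main__":
    toy = ["ACGTAC", "CGTACG", "TTTTTT"]  # hypothetical toy sequences
    for row in shared_kmer_matrix(toy, 3):
        print(row)
```

A pairwise matrix like this could serve as the weighted adjacency of a relatedness network, with a threshold on the count deciding which edges are drawn; that thresholding step is an assumption here, not something the abstract specifies.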


Rubio-Largo Á, Vega-Rodríguez MA, González-Álvarez DL. Hybrid multiobjective artificial bee colony for multiple sequence alignment. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.12.034] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 10/22/2022]

Potha N, Maragoudakis M, Lyras D. A biology-inspired, data mining framework for extracting patterns in sexual cyberbullying data. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2015.12.021] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022]

Zhu H, He Z, Jia Y. A Novel Approach to Multiple Sequence Alignment Using Multiobjective Evolutionary Algorithm Based on Decomposition. IEEE J Biomed Health Inform 2016;20:717-27. [DOI: 10.1109/jbhi.2015.2403397] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 11/08/2022]

Al-Shatnawi M, Ahmad MO, Swamy MNS. MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions. BMC Bioinformatics 2015;16:393. [PMID: 26597571 PMCID: PMC4657235 DOI: 10.1186/s12859-015-0826-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/09/2015] [Accepted: 11/14/2015] [Indexed: 11/16/2022]

Abstract

Background

The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. In spite of considerable research and efforts that have been recently deployed for improving the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment between multiple protein sequences is still a challenging problem.

Results

We propose a novel and efficient algorithm called MSAIndelFR for multiple sequence alignment, using the information on the predicted locations of IndelFRs and the computed average log-loss values obtained from IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that the introduction of a new variable gap penalty function based on the predicted locations of the IndelFRs and the computed average log-loss values into the proposed algorithm substantially improves the protein alignment accuracy. This is illustrated by evaluating the performance of the algorithm in aligning sequences belonging to the protein folds for which the IndelFR predictors already exist and by using the reference alignments of the four popular benchmarks, BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65).

Conclusions

We have proposed a novel and efficient algorithm, the MSAIndelFR algorithm, for multiple protein sequence alignment incorporating a new variable gap penalty function. It is shown that the performance of the proposed algorithm is superior to that of the most widely used alignment algorithms, Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign, in terms of both the sum-of-pairs and total column metrics.
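The sum-of-pairs and total column metrics mentioned above can be sketched as follows. This is a minimal Python illustration that assumes the common benchmark-style definitions (fraction of reference residue pairs, or of whole reference columns, reproduced by the test alignment); the function names are hypothetical, and the exact scoring rules of each benchmark, such as restricting scoring to core blocks, are not taken from the abstract.

```python
from itertools import combinations


def columns(alignment: list[str]) -> list[tuple]:
    """Turn each column into a tuple of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    cols = []
    for c in range(len(alignment[0])):
        col = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                col.append(None)
            else:
                col.append(counters[s])
                counters[s] += 1
        cols.append(tuple(col))
    return cols


def aligned_pairs(alignment: list[str]) -> set[tuple]:
    """All residue pairs (seq s, index i, seq t, index j) sharing a column."""
    pairs = set()
    for col in columns(alignment):
        for (s, i), (t, j) in combinations(enumerate(col), 2):
            if i is not None and j is not None:
                pairs.add((s, i, t, j))
    return pairs


def sp_score(test: list[str], ref: list[str]) -> float:
    """Fraction of reference residue pairs also aligned in the test alignment."""
    ref_pairs = aligned_pairs(ref)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def tc_score(test: list[str], ref: list[str]) -> float:
    """Fraction of reference columns (with >= 2 residues) reproduced exactly."""
    ref_cols = [c for c in columns(ref) if sum(x is not None for x in c) >= 2]
    if not ref_cols:
        return 0.0
    test_cols = set(columns(test))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols)


if __name__ == "__main__":
    ref = ["AC-T", "ACGT"]   # hypothetical reference alignment
    test = ["ACT-", "ACGT"]  # hypothetical test alignment
    print(sp_score(test, ref), tc_score(test, ref))
```

Both scores lie between 0 and 1, and a test alignment identical to the reference scores 1.0 on both.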

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0826-3) contains supplementary material, which is available to authorized users.


Andreakis N, Høj L, Kearns P, Hall MR, Ericson G, Cobb RE, Gordon BR, Evans-Illidge E. Diversity of Marine-Derived Fungal Cultures Exposed by DNA Barcodes: The Algorithm Matters. PLoS One 2015;10:e0136130. [PMID: 26308620 PMCID: PMC4550264 DOI: 10.1371/journal.pone.0136130] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/10/2014] [Accepted: 07/29/2015] [Indexed: 01/11/2023]

Abstract

Marine fungi are an understudied group of eukaryotic microorganisms characterized by unresolved genealogies and unstable classification. Whereas DNA barcoding via the nuclear ribosomal internal transcribed spacer (ITS) provides a robust and rapid tool for fungal species delineation, accurate classification of fungi is often arduous given the large number of partial or unknown barcodes and misidentified isolates deposited in public databases. This situation is perpetuated by a paucity of cultivable fungal strains available for phylogenetic research linked to these data sets. We analyze ITS barcodes produced from a subsample (290) of 1781 cultured isolates of marine-derived fungi in the Bioresources Library located at the Australian Institute of Marine Science (AIMS). Our analysis revealed high levels of under-explored fungal diversity. The majority of isolates were ascomycetes including representatives of the subclasses Eurotiomycetidae, Hypocreomycetidae, Sordariomycetidae, Pleosporomycetidae, Dothideomycetidae, Xylariomycetidae and Saccharomycetidae. The phylum Basidiomycota was represented by isolates affiliated with the genera Tritirachium and Tilletiopsis. BLAST searches revealed 26 unknown OTUs and 50 isolates corresponding to previously uncultured, unidentified fungal clones. This study makes a significant addition to the availability of barcoded, culturable marine-derived fungi for detailed future genomic and physiological studies. We also demonstrate the influence of commonly used alignment algorithms and genetic distance measures on the accuracy and comparability of estimating Operational Taxonomic Units (OTUs) by the automatic barcode gap finder (ABGD) method. Large scale biodiversity screening programs that combine datasets using algorithmic OTU delineation pipelines need to ensure compatible algorithms have been used because the algorithm matters.


Garai G, Chowdhury B. A cascaded pairwise biomolecular sequence alignment technique using evolutionary algorithm. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2014.11.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]

Protein sectors: statistical coupling analysis versus conservation. PLoS Comput Biol 2015;11:e1004091. [PMID: 25723535 PMCID: PMC4344308 DOI: 10.1371/journal.pcbi.1004091] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 09/22/2014] [Accepted: 12/15/2014] [Indexed: 11/19/2022]

Al-Shatnawi M, Ahmad MO, Swamy MNS. Prediction of Indel flanking regions in protein sequences using a variable-order Markov model. Bioinformatics 2015;31:40-7. [PMID: 25178462 DOI: 10.1093/bioinformatics/btu556] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]

Suplatov D, Voevodin V, Švedas V. Robust enzyme design: bioinformatic tools for improved protein stability. Biotechnol J 2014;10:344-55. [PMID: 25524647 DOI: 10.1002/biot.201400150] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 06/30/2014] [Revised: 09/30/2014] [Accepted: 11/04/2014] [Indexed: 01/22/2023]

Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014;53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]

Lyras DP, Metzler D. ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach. BMC Bioinformatics 2014;15:265. [PMID: 25099134 PMCID: PMC4133627 DOI: 10.1186/1471-2105-15-265] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/24/2014] [Accepted: 07/29/2014] [Indexed: 11/16/2022]

Abstract

Background

Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although this problem may be efficiently solved when only two sequences are considered, the exact inference of the optimal alignment quickly becomes computationally intractable for the multiple sequence alignment case. To cope with the high computational expense, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. These methods, however, cannot change the alignment of an already aligned group of sequences in view of new data, which compromises the quality of the resulting alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve the quality of the alignments produced by popular aligners. We call ReformAlign a meta-aligner as it requires an initial alignment, for which a variety of alignment programs can be used. The main idea behind ReformAlign is quite straightforward: at first, an existing alignment is used to construct a standard profile which summarizes the initial alignment, and then all sequences are individually re-aligned against the formed profile. From each sequence-profile comparison, the alignment of each sequence against the profile is recorded and the final alignment is indirectly inferred by merging all the individual sub-alignments into a unified set. The employment of ReformAlign may often result in alignments which are significantly more accurate than the starting alignments.
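The profile-then-realign idea described above can be sketched roughly as follows. This is a toy Python illustration and not the ReformAlign implementation: the frequency-based match score, the linear gap penalty of -0.5, the lowercase marking of inserted residues, and the function names are all assumptions made for the example, and the final step of merging the individual sub-alignments is omitted.

```python
from collections import Counter


def build_profile(alignment: list[str]) -> list[dict[str, float]]:
    """Per-column relative frequency of each symbol (gap symbol included)."""
    n = len(alignment)
    return [
        {sym: cnt / n for sym, cnt in Counter(col).items()}
        for col in zip(*alignment)
    ]


def align_to_profile(seq: str, profile: list[dict[str, float]],
                     gap: float = -0.5) -> str:
    """Globally align seq to the profile columns (Needleman-Wunsch style).

    Returns seq with '-' where a profile column is skipped; residues placed
    in no profile column (insertions) are written in lowercase.
    """
    n, m = len(seq), len(profile)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, "ins"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, "del"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Match score: how often this residue occurs in the column.
            options = [
                (score[i - 1][j - 1] + profile[j - 1].get(seq[i - 1], 0.0), "match"),
                (score[i - 1][j] + gap, "ins"),
                (score[i][j - 1] + gap, "del"),
            ]
            score[i][j], move[i][j] = max(options)
    # Trace back from the bottom-right corner to recover the alignment.
    i, j, out = n, m, []
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            out.append(seq[i - 1])
            i, j = i - 1, j - 1
        elif step == "ins":
            out.append(seq[i - 1].lower())
            i -= 1
        else:
            out.append("-")
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    initial = ["ACGT", "AC-T", "ACGT"]  # hypothetical starting alignment
    profile = build_profile(initial)
    for row in initial:
        print(align_to_profile(row.replace("-", ""), profile))
```

In this toy case each sequence is recovered in its original alignment; the interesting situation, per the abstract, is when re-alignment against the profile differs from the starting alignment.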

Results

We evaluated the effect of ReformAlign on the generated alignments from ten leading alignment methods using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner approach may often lead to statistically significantly more accurate alignments. Furthermore, we show that ReformAlign results in more substantial improvement in cases where the starting alignment is of relatively inferior quality or when the input sequences are harder to align.

Conclusions

The proposed profile-based meta-alignment approach seems to be a promising and computationally efficient method that can be combined with practically all popular alignment methods and may lead to significant improvements in the generated alignments.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-265) contains supplementary material, which is available to authorized users.


Kaya M, Sarhan A, Alhajj R. Multiple sequence alignment with affine gap by using multi-objective genetic algorithm. Computer Methods and Programs in Biomedicine 2014;114:38-49. [PMID: 24534604 DOI: 10.1016/j.cmpb.2014.01.013] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/20/2013] [Revised: 11/29/2013] [Accepted: 01/12/2014] [Indexed: 06/03/2023]

Yoon BJ. Sequence alignment by passing messages. BMC Genomics 2014;15 Suppl 1:S14. [PMID: 24564436 PMCID: PMC4046711 DOI: 10.1186/1471-2164-15-s1-s14] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/27/2022]

Roshan U. Multiple sequence alignment using Probcons and Probalign. Methods Mol Biol 2014;1079:147-153. [PMID: 24170400 DOI: 10.1007/978-1-62703-646-7_9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 06/02/2023]

Réblová M, Réblová K. RNA secondary structure, an important bioinformatics tool to enhance multiple sequence alignment: a case study (Sordariomycetes, Fungi). Mycol Prog 2012. [DOI: 10.1007/s11557-012-0836-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 10/28/2022]

Plyusnin I, Holm L. Comprehensive comparison of graph based multiple protein sequence alignment strategies. BMC Bioinformatics 2012;13:64. [PMID: 22540977 PMCID: PMC3375188 DOI: 10.1186/1471-2105-13-64] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/24/2011] [Accepted: 04/29/2012] [Indexed: 12/03/2022]

Jagadeesh Chandra Bose R, van der Aalst WM. Process diagnostics using trace alignment: Opportunities, issues, and challenges. Inform Syst 2012. [DOI: 10.1016/j.is.2011.08.003] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 10/17/2022]

Wang CK, Broder U, Weeratunga SK, Gasser RB, Loukas A, Hofmann A. SBAL: a practical tool to generate and edit structure-based amino acid sequence alignments. Bioinformatics 2012;28:1026-7. [PMID: 22332239 DOI: 10.1093/bioinformatics/bts035] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]

Law E, von Ahn L. Human Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning 2011. [DOI: 10.2200/s00371ed1v01y201107aim013] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.9] [Indexed: 11/03/2022]

Magrane M. UniProt Knowledgebase: a hub of integrated protein data. Database: The Journal of Biological Databases and Curation 2011;2011:bar009. [PMID: 21447597 PMCID: PMC3070428 DOI: 10.1093/database/bar009] [Citation(s) in RCA: 1057] [Impact Index Per Article: 81.3] [Indexed: 12/21/2022]

Zhang Z, Wang Y, Wang L, Gao P. The combined effects of amino acid substitutions and indels on the evolution of structure within protein families. PLoS One 2010;5:e14316. [PMID: 21179197 PMCID: PMC3001449 DOI: 10.1371/journal.pone.0014316] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 06/22/2010] [Accepted: 11/16/2010] [Indexed: 01/02/2023]

Dubbs AJ, Seiler BA, Magnasco MO. A Fast ℒp Spike Alignment Metric. Neural Comput 2010;22:2785-808. [DOI: 10.1162/neco_a_00026] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/04/2022]

Sahraeian SME, Yoon BJ. PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences. Nucleic Acids Res 2010;38:4917-28. [PMID: 20413579 PMCID: PMC2926610 DOI: 10.1093/nar/gkq255] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 12/23/2009] [Revised: 03/25/2010] [Accepted: 03/26/2010] [Indexed: 11/13/2022]

Di Lena P, Margara L. Optimal global alignment of signals by maximization of Pearson correlation. Inform Process Lett 2010. [DOI: 10.1016/j.ipl.2010.05.024] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 10/19/2022]

Notredame C. Computing multiple sequence/structure alignments with the T-coffee package. Curr Protoc Bioinformatics 2010;Chapter 3:3.8.1-3.8.25. [PMID: 20205190 DOI: 10.1002/0471250953.bi0308s29] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 01/21/2023]

Kück P, Meusemann K, Dambach J, Thormann B, von Reumont BM, Wägele JW, Misof B. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Front Zool 2010;7:10. [PMID: 20356385 PMCID: PMC2867768 DOI: 10.1186/1742-9994-7-10] [Citation(s) in RCA: 150] [Impact Index Per Article: 10.7] [Received: 12/09/2009] [Accepted: 03/31/2010] [Indexed: 12/16/2022]

Abstract

Background

Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold, the choice of which is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective.

Results

ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict.

Conclusions

Alignment masking improves the signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood-based models of sequence evolution, which opens the possibility of further improvements.


Yoo PD, Zhou BB, Zomaya AY. A modular kernel approach for integrative analysis of protein domain boundaries. BMC Genomics 2009;10 Suppl 3:S21. [PMID: 19958485 PMCID: PMC2788374 DOI: 10.1186/1471-2164-10-s3-s21] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022]