1
|
Rehman HA, Zafar K, Khan A, Imtiaz A. Multiple sequence alignment using enhanced bird swarm align algorithm. Journal of Intelligent & Fuzzy Systems 2021. [DOI: 10.3233/jifs-210055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Abstract
Discovering structural, functional and evolutionary information in biological sequences has been considered a core research area in bioinformatics. Multiple Sequence Alignment (MSA) aligns all sequences in a given query set to ease the annotation of new sequences. Traditional methods for finding the optimal alignment are computationally expensive to run in real time. This research presents an enhanced version of the Bird Swarm Algorithm (BSA), based on bio-inspired optimization. The Enhanced Bird Swarm Align Algorithm (EBSAA) is proposed for the multiple sequence alignment problem to determine the optimal alignment among different sequences. Twenty-one datasets have been used to compare the performance of EBSAA with the Genetic Algorithm (GA) and the Particle Swarm Align Algorithm (PSAA). The proposed technique yields better alignments than GA and PSAA in most cases.
Affiliation(s)
- Hafiz Asadul Rehman
- Department of Computer Science, National University of Computer and Emerging Science Lahore, Pakistan
- Kashif Zafar
- Department of Computer Science, National University of Computer and Emerging Science Lahore, Pakistan
- Ayesha Khan
- University of Management & Technology, Lahore, Pakistan
|
2
|
Naznooshsadat E, Elham P, Ali SZ. FAME: fast and memory efficient multiple sequences alignment tool through compatible chain of roots. Bioinformatics 2020; 36:3662-3668. [PMID: 32170927 DOI: 10.1093/bioinformatics/btaa175] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/25/2019] [Revised: 02/10/2020] [Accepted: 03/12/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION Multiple sequence alignment (MSA) is an important and challenging problem in computational biology. Most existing methods can produce only short multiple alignments in an acceptable time. When researchers need genome-size multiple alignments, the process requires huge amounts of memory and time. A method that can align genome-size sequences rapidly and precisely is therefore valuable, especially for the analysis of very long alignments. Herein, we propose an efficient method, called FAME, which vertically divides sequences at the places where they have common areas and arranges them in consecutive order. The common areas are then shifted and placed under each other, and the subsequences between them are aligned using any existing MSA tool. RESULTS The results demonstrate that combining FAME with MSA methods and deploying minimizers can run on a personal computer and finely align long sequences with a much higher sum-of-pairs (SP) score than the standalone MSA tools. As we select genomic datasets of greater length, the SP score of the combined methods gradually improves. The calculated computational complexity of the methods supports the results: combining FAME and the MSA tools leads to at least four times faster execution on the datasets. AVAILABILITY AND IMPLEMENTATION The source code and all datasets and run parameters are freely accessible at http://github.com/naznoosh/msa. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Etminan Naznooshsadat
- Department of Computer Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
- Parvinnia Elham
- Department of Computer Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
- Sharifi-Zarchi Ali
- Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
|
3
|
Chang JM, Floden EW, Herrero J, Gascuel O, Di Tommaso P, Notredame C. Incorporating alignment uncertainty into Felsenstein's phylogenetic bootstrap to improve its reliability. Bioinformatics 2019; 37:1506-1514. [PMID: 30726875 PMCID: PMC8275982 DOI: 10.1093/bioinformatics/btz082] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/10/2018] [Revised: 12/12/2018] [Accepted: 02/05/2019] [Indexed: 12/30/2022]
Abstract
Motivation Most evolutionary analyses are based on pre-estimated multiple sequence alignment. Wong et al. established the existence of an uncertainty induced by multiple sequence alignment when reconstructing phylogenies. They were able to show that in many cases different aligners produce different phylogenies, with no simple objective criterion sufficient to distinguish among these alternatives. Results We demonstrate that incorporating MSA induced uncertainty into bootstrap sampling can significantly increase correlation between clade correctness and its corresponding bootstrap value. Our procedure involves concatenating several alternative multiple sequence alignments of the same sequences, produced using different commonly used aligners. We then draw bootstrap replicates while favoring columns of the more unique aligner among the concatenated aligners. We named this concatenation and bootstrapping method, Weighted Partial Super Bootstrap (wpSBOOT). We show on three simulated datasets of 16, 32 and 64 tips that our method improves the predictive power of bootstrap values. We also used as a benchmark an empirical collection of 853 1-to-1 orthologous genes from seven yeast species and found wpSBOOT to significantly improve discrimination capacity between topologically correct and incorrect trees. Bootstrap values of wpSBOOT are comparable to similar readouts estimated using a single method. However, for reduced trees by 50% and 95% bootstrap thresholds, wpSBOOT comes out the lowest Type I error (less FP). Availability The automated generation of replicates has been implemented in the T-Coffee package, which is available as open source freeware available from www.tcoffee.org. Supplementary information Supplementary data are available at Bioinformatics online.
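The concatenate-then-resample idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the T-Coffee implementation: the function names, the per-aligner weights, and the choice to draw as many columns as the concatenated alignment contains are all assumptions, and the abstract does not say how the "uniqueness" of an aligner is computed.

```python
import random

def concatenate(alignments):
    """Join alternative alignments of the same sequences side by side.

    Each alignment is a list of rows (one gapped string per sequence, in the
    same sequence order). Returns the concatenated rows and, for every
    column, the index of the alignment it came from.
    """
    n_seqs = len(alignments[0])
    if any(len(aln) != n_seqs for aln in alignments):
        raise ValueError("every alignment must contain the same sequences")
    rows = ["".join(aln[i] for aln in alignments) for i in range(n_seqs)]
    origin = [k for k, aln in enumerate(alignments) for _ in range(len(aln[0]))]
    return rows, origin

def weighted_replicate(rows, origin, aligner_weights, rng):
    """Draw one bootstrap replicate by sampling columns with replacement.

    aligner_weights[k] is a hypothetical weight for alignment k; a larger
    value makes its columns more likely to be drawn. The replicate length
    (equal to the concatenated length) is an assumption of this sketch.
    """
    weights = [aligner_weights[k] for k in origin]
    picks = rng.choices(range(len(origin)), weights=weights, k=len(origin))
    return ["".join(row[j] for j in picks) for row in rows]

# Two toy alignments of the same two sequences, as two aligners might produce.
aln_a = ["AC-GT", "ACAGT"]
aln_b = ["ACG-T", "ACAGT"]
rows, origin = concatenate([aln_a, aln_b])
replicate = weighted_replicate(rows, origin, [1.0, 2.0], random.Random(0))
print(rows, origin, replicate)
```

Each replicate would then be passed to a tree-building program, and clade support counted across replicates as in an ordinary bootstrap.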
Affiliation(s)
- Jia-Ming Chang
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Evan W Floden
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Javier Herrero
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
- Olivier Gascuel
- Unité Bioinformatique Evolutive, Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI)-USR 3756 CNRS and Institut Pasteur, Paris, France
- Paolo Di Tommaso
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Cedric Notredame
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
|
4
|
Chowdhury B, Garai G. A review on multiple sequence alignment from the perspective of genetic algorithm. Genomics 2017; 109:419-431. [PMID: 28669847 DOI: 10.1016/j.ygeno.2017.06.007] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 04/26/2017] [Revised: 05/27/2017] [Accepted: 06/27/2017] [Indexed: 01/04/2023]
Abstract
Sequence alignment is an active research area in the field of bioinformatics. It is also a crucial task, as it guides many other tasks such as phylogenetic analysis and function and/or structure prediction of biological macromolecules like DNA, RNA, and protein. Proteins are the building blocks of every living organism. Although the protein alignment problem has been studied for several decades, every available method produces different alignment results for a single alignment problem. Multiple sequence alignment is characterized as a problem of very high computational complexity. Many stochastic methods are therefore considered for improving alignment accuracy. Among them, many researchers frequently use the genetic algorithm. In this study, we present the different types of methods applied to alignment and the recent trends in multiobjective genetic algorithms for solving multiple sequence alignment. Many recent studies have demonstrated considerable progress in improving alignment accuracy.
Affiliation(s)
- Biswanath Chowdhury
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, WB, 700009, India.
- Gautam Garai
- Computational Sciences Division, Saha Institute of Nuclear Physics, Kolkata, WB 700064, India.
|
5
|
Zheng W, Li K, Li K, So HC. A Modified Multiple Alignment Fast Fourier Transform with Higher Efficiency. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:634-645. [PMID: 26890922 DOI: 10.1109/tcbb.2016.2530064] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/05/2023]
Abstract
Multiple sequence alignment (MSA) is the most common task in bioinformatics. Multiple alignment fast Fourier transform (MAFFT) is the fastest MSA program among those whose alignment accuracy is comparable with that of the most accurate MSA programs. In this paper, we modify the correlation computation scheme of MAFFT for further efficiency improvement in three aspects. First, novel complex number based amino acid and nucleotide expressions are utilized in the modified correlation. Second, linear convolution with a limitation is proposed for computing the correlation of amino acid and nucleotide sequences. Third, we devise a fast Fourier transform (FFT) algorithm for computing linear convolution. The FFT algorithm is based on conjugate pair split-radix FFT and does not require the permutation of order, and it is new as only real parts of the final outputs are required. Simulation results show that the speed of the modified scheme is 107.58 to 365.74 percent faster than that of the original MAFFT for one execution of the function Falign() of MAFFT, indicating its faster realization.
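To make the idea of FFT-based sequence correlation concrete, here is a small sketch. It is not the scheme from the paper: the complex encoding of nucleotides in `CODE` is a made-up assumption, the transform is a plain recursive radix-2 FFT rather than a conjugate-pair split-radix FFT, and no limit is placed on the convolution range. It only shows how zero-padding turns a circular FFT product into a linear correlation whose real part peaks at the best-matching offset.

```python
import cmath

# Hypothetical unit-circle encoding; identical letters multiply to 1 + 0j.
CODE = {"A": 1 + 0j, "C": 0 + 1j, "G": -1 + 0j, "T": 0 - 1j}

def fft(values, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def correlation_by_lag(seq1, seq2):
    """Real part of c[k] = sum_j x[j + k] * conj(y[j]) for every overlapping lag k."""
    x = [CODE[ch] for ch in seq1]
    y = [CODE[ch] for ch in seq2]
    size = 1
    while size < len(x) + len(y) - 1:  # pad so circular wrap-around cannot alias
        size *= 2
    x += [0j] * (size - len(x))
    y += [0j] * (size - len(y))
    spectrum = [a * b.conjugate() for a, b in zip(fft(x), fft(y))]
    circular = [v / size for v in fft(spectrum, invert=True)]
    lags = range(-(len(seq2) - 1), len(seq1))
    # Negative lags live at the end of the circular result (index k % size).
    return {k: round(circular[k % size].real, 6) for k in lags}

scores = correlation_by_lag("GGACGT", "ACGT")
best_lag = max(scores, key=scores.get)
print(best_lag, scores[best_lag])
```

In this toy case the highest-scoring lag is the offset at which the second sequence lines up with its copy inside the first.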
|
6
|
Nguyen KD, Pan Y. A knowledge-based multiple-sequence alignment algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:884-896. [PMID: 24334383 DOI: 10.1109/tcbb.2013.102] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
A common and cost-effective mechanism to identify the functionalities, structures, or relationships between species is multiple-sequence alignment, in which DNA/RNA/protein sequences are arranged and aligned so that similarities between sequences are clustered together. Correctly identifying and aligning these biological sequence similarities helps with tasks ranging from unwinding the mystery of species evolution to drug design. We present our knowledge-based multiple sequence alignment (KB-MSA) technique that utilizes the existing knowledge databases such as SWISSPROT, GENBANK, or HOMSTRAD to provide a more realistic and reliable sequence alignment. We also provide a modified version of this algorithm (CB-MSA) that utilizes the sequence consistency information when sequence knowledge databases are not available. Our benchmark tests on BAliBASE, PREFAB, HOMSTRAD, and SABMARK references show accuracy improvements up to 10 percent on twilight data sets against many leading alignment tools such as ISPALIGN, PADT, CLUSTALW, MAFFT, PROBCONS, and T-COFFEE.
Affiliation(s)
- Yi Pan
- Georgia State University, Atlanta
|
7
|
Engle EK, Fay JC. Divergence of the yeast transcription factor FZF1 affects sulfite resistance. PLoS Genet 2012; 8:e1002763. [PMID: 22719269 PMCID: PMC3375221 DOI: 10.1371/journal.pgen.1002763] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 01/18/2012] [Accepted: 04/26/2012] [Indexed: 01/06/2023]
Abstract
Changes in gene expression are commonly observed during evolution. However, the phenotypic consequences of expression divergence are frequently unknown and difficult to measure. Transcriptional regulators provide a mechanism by which phenotypic divergence can occur through multiple, coordinated changes in gene expression during development or in response to environmental changes. Yet, some changes in transcriptional regulators may be constrained by their pleiotropic effects on gene expression. Here, we use a genome-wide screen for promoters that are likely to have diverged in function and identify a yeast transcription factor, FZF1, that has evolved substantial differences in its ability to confer resistance to sulfites. Chimeric alleles from four Saccharomyces species show that divergence in FZF1 activity is due to changes in both its coding and upstream noncoding sequence. Between the two closest species, noncoding changes affect the expression of FZF1, whereas coding changes affect the expression of SSU1, a sulfite efflux pump activated by FZF1. Both coding and noncoding changes also affect the expression of many other genes. Our results show how divergence in the coding and promoter region of a transcription factor alters the response to an environmental stress. Changes in gene regulation are thought to play an important role in evolution. While variation in gene expression between species is common, it is hard to identify the phenotypic consequences of this variation since many changes in gene expression may have subtle or no phenotypic effects. In this study, we investigate changes in sulfite resistance and gene expression caused by the transcription factor, FZF1, that has evolved rapidly during the divergence of related yeast species. We find that divergence in the ability of FZF1 to confer sulfite resistance is mediated by changes in its expression as well as changes in its protein structure, both of which cause changes in the expression of other genes. 
Our results show how the combination of multiple changes within a transcription factor can produce substantial changes in phenotype and the expression of many genes.
Affiliation(s)
- Elizabeth K. Engle
- Molecular Genetics and Genomics Program, Washington University, St. Louis, Missouri, United States of America
- Justin C. Fay
- Department of Genetics and Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
8
|
Nguyen KD, Pan Y. An improved scoring method for protein residue conservation and multiple sequence alignment. IEEE Trans Nanobioscience 2012; 10:275-85. [PMID: 22271798 DOI: 10.1109/tnb.2011.2179553] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]
Abstract
One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). Optimally aligning multiple sequences is an intractable problem; however, it is a critical tool for biologists to identify the relationships between species and also possibly predict the structure and functionality of biological sequences. The most fundamental step of assembling MSA results is identifying the best location to place the sequence residues, and the accuracy of the sequence assembly depends heavily on the reliability of the scoring function used. With an appropriate scoring function, an MSA program can boost its accuracy of multiple sequence alignment up to 25%. In this study, we present a new, fast, and biologically reliable scoring method, hierarchical expected matching probability (HEP), to use in protein multiple sequence alignment. The new scoring method eliminates the burden of the gap cost selection process, and it has consistently proven to be more biologically reliable than all other tested scoring methods through all tests on four different theoretical and experimental benchmarks: Valdar's theoretical conservation benchmark, RT-OSM, BAliBASE3.0, and PREFAB4.0. An implementation of our new scoring method into progressive multiple sequence alignment, resembling the alignment algorithm in PIMA, ClustalW, and T-COFFEE, has shown an accuracy improvement up to 7% on BAliBASE3.0 and up to 5% on PREFAB4.0 benchmarks.
Affiliation(s)
- Ken D Nguyen
- Department of Information Technology, Clayton State University, Morrow, GA 30260, USA.
|
9
|
Simmons MP, Müller KF, Norton AP. Alignment of, and phylogenetic inference from, random sequences: the susceptibility of alternative alignment methods to creating artifactual resolution and support. Mol Phylogenet Evol 2010; 57:1004-16. [PMID: 20849963 DOI: 10.1016/j.ympev.2010.09.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 09/23/2009] [Revised: 04/05/2010] [Accepted: 09/06/2010] [Indexed: 10/19/2022]
Abstract
We used random sequences to determine which alignment methods are most susceptible to aligning sequences so as to create artifactual resolution and branch support in phylogenetic trees derived from those alignments. We compared four alignment methods (progressive pairwise alignment, simultaneous multiple alignment of sequence fragments, local pairwise alignment, and direct optimization) to determine which methods are most susceptible to creating false positives in phylogenetic trees. Implied alignments created using direct optimization provided more artifactual support than progressive pairwise alignment methods, which in turn generally provided more artifactual support than simultaneous and local alignment methods. Artifactual support derived from base pairs was generally reinforced by the incorporation of gap characters for progressive pairwise alignment, local pairwise alignment, and implied alignments. The amount of artifactual resolution and support was generally greater for simulated nucleotide sequences than for simulated amino acid sequences. In the context of direct optimization, the differences between static and dynamic approaches to calculating support were extreme, ranging from maximal to nearly minimal support. When applied to highly divergent sequences, it is important that dynamic, rather than static, characters be used whenever calculating branch support using direct optimization. In contrast to the tree-based approaches to alignment, simultaneous alignment of sequences using the similarity criterion generally does not create alignments that are biased in favor of any particular tree topology.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523-1878, USA.
|
10
|
Simmons MP, Müller KF, Webb CT. The deterministic effects of alignment bias in phylogenetic inference. Cladistics 2010; 27:402-416. [DOI: 10.1111/j.1096-0031.2010.00333.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]
|
11
|
Simmons MP, Müller KF, Webb CT. The relative sensitivity of different alignment methods and character codings in sensitivity analysis. Cladistics 2008; 24:1039-1050. [DOI: 10.1111/j.1096-0031.2008.00230.x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
|
12
|
Bisson WH, Westera G, Schubiger PA, Scapozza L. Homology modeling and dynamics of the extracellular domain of rat and human neuronal nicotinic acetylcholine receptor subtypes α4β2 and α7. J Mol Model 2008; 14:891-9. [DOI: 10.1007/s00894-008-0340-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 03/28/2008] [Accepted: 06/13/2008] [Indexed: 12/14/2022]
|
13
|
Simmons MP, Richardson D, Reddy ASN. Incorporation of gap characters and lineage-specific regions into phylogenetic analyses of gene families from divergent clades: an example from the kinesin superfamily across eukaryotes. Cladistics 2008. [DOI: 10.1111/j.1096-0031.2007.00183.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
|
14
|
Simossis V, Kleinjung J, Heringa J. An overview of multiple sequence alignment. Current Protocols in Bioinformatics 2008; Chapter 3:3.7.1-3.7.26. [PMID: 18428699 DOI: 10.1002/0471250953.bi0307s03] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
Abstract
Multiple sequence alignment is perhaps the most commonly applied bioinformatics technique. It often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. In this unit, an overview of multiple sequence alignment techniques is presented, covering a history of nearly 30 years from the early pioneering methods to the current state-of-the-art techniques. Methodological and biological issues and end-user considerations, as well as alignment evaluation issues, are discussed.
Affiliation(s)
- Victor Simossis
- Integrative Bioinformatics Institute (IBIVU), Free University, Amsterdam, The Netherlands
|
15
|
Abstract
The statistical methods applied to the analysis of genomic data do not account for uncertainty in the sequence alignment. Indeed, the alignment is treated as an observation, and all of the subsequent inferences depend on the alignment being correct. This may not have been too problematic for many phylogenetic studies, in which the gene is carefully chosen for, among other things, ease of alignment. However, in a comparative genomics study, the same statistical methods are applied repeatedly on thousands of genes, many of which will be difficult to align. Using genomic data from seven yeast species, we show that uncertainty in the alignment can lead to several problems, including different alignment methods resulting in different conclusions.
Affiliation(s)
- Karen M Wong
- Section of Ecology, Behavior and Evolution, University of California, San Diego, La Jolla, CA 92093, USA
|
16
|
Detection of Stachybotrys chartarum using rRNA, tri5, and beta-tubulin primers and determining their relative copy number by real-time PCR. Mycol Res 2008; 112:845-51. [PMID: 18499423 DOI: 10.1016/j.mycres.2008.01.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 08/29/2007] [Revised: 12/19/2007] [Accepted: 01/10/2008] [Indexed: 11/18/2022]
Abstract
Highly conserved regions are attractive targets for detection and quantitation by PCR, but designing species-specific primer sets can be difficult. Ultimately, almost all primer sets are designed based upon literature searches in public domain databases, such as the National Center for Biotechnology Information (NCBI). Prudence suggests that the researcher needs to evaluate as many sequences as available for designing species-specific PCR primers. In this report, we aligned 11, 9, and 16 DNA sequences entered for Stachybotrys spp. rRNA, tri5, and beta-tubulin regions, respectively. Although we were able to align and determine consensus primer sets for the 9 tri5 and the 16 beta-tubulin sequences, there was no consensus sequence that could be derived from alignment of the 11 rRNA sequences. However, by judicious clustering of the sequences that aligned well, we were able to design three sets of primers for the rRNA region of S. chartarum. The two primer sets for tri5 and beta-tubulin produced satisfactory PCR results for all four strains of S. chartarum used in this study whereas only one rRNA primer set of three produced similar satisfactory results. Ultimately, we were able to show that rRNA copy number is approximately 2-log greater than for tri5 and beta-tubulin in the four strains of S. chartarum tested.
|
17
|
Lee ZJ, Su SF, Chuang CC, Liu KH. Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Appl Soft Comput 2008. [DOI: 10.1016/j.asoc.2006.10.012] [Citation(s) in RCA: 136] [Impact Index Per Article: 8.5] [Indexed: 11/27/2022]
|
18
|
Abstract
Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.
Affiliation(s)
- Chuong B Do
- Computer Science Department, Stanford University, Stanford, CA, USA
|
19
|
Krüger D, Gargas A. Secondary structure of ITS2 rRNA provides taxonomic characters for systematic studies: a case in Lycoperdaceae (Basidiomycota). Mycol Res 2007; 112:316-30. [PMID: 18342242 DOI: 10.1016/j.mycres.2007.10.019] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 04/06/2006] [Revised: 09/24/2007] [Accepted: 10/30/2007] [Indexed: 11/26/2022]
Abstract
The secondary structure of the ITS2 rDNA transcript (pre-rRNA) could provide information for identifying homologous nucleotide characters useful for cladistic inference of relationships. Such structure data could become taxonomic characters. This work compares the effect of several modern nucleotide alignment strategies, including those making use of structure data, on phylogenetic inference. From both the phylogenetic analyses and comparative secondary structure, implications for taxonomy and evolution of puffball fungi are discussed. Lycoperdaceae remain insufficiently resolved with present taxon and data sampling. Neither alignment allows statistically robust phylogenetic hypotheses under any current optimality criterion. The secondary structure data at this time are best used as accessory taxonomic characters as their phylogenetic resolving power and confidence in validity is limited compared with underlying nucleotide characters. We introduce a preliminary nomenclature convention to describe secondary structure for defining consensus features. These consensus structures are illustrated for the clades /Calvatia, /Handkea-Echinatum, /Vascellum, /Morganella, and /Plumbea-Paludosa (Bovista).
Affiliation(s)
- Dirk Krüger
- The Botany Department, University of Wisconsin, 430 Lincoln Drive, Madison, WI 53706, USA.
|
20
|
Progressive multiple sequence alignments from triplets. BMC Bioinformatics 2007; 8:254. [PMID: 17631683 PMCID: PMC1948021 DOI: 10.1186/1471-2105-8-254] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 11/05/2006] [Accepted: 07/15/2007] [Indexed: 11/27/2022]
Abstract
Background The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research Here we present a modified variant of progressive sequence alignments that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by step-wise replacing triples by pairs instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context dependent (mis)match scores.
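The step in which a three-way alignment is split into partial alignments and all-gap columns disappear can be illustrated with a short sketch. The function name `project` and the row selections below are hypothetical; how aln3nn actually chooses its two partial alignments is not described in the abstract.

```python
def project(alignment, keep):
    """Restrict an alignment to the rows in `keep` and drop all-gap columns."""
    sub = [alignment[i] for i in keep]
    # Keep a column only if at least one of the retained rows has a residue.
    columns = [col for col in zip(*sub) if any(ch != "-" for ch in col)]
    if not columns:
        return ["" for _ in sub]
    return ["".join(chars) for chars in zip(*columns)]

three_way = ["AC-GT-", "A-TGTA", "ACTG--"]
pair = project(three_way, [0, 2])   # rows 0 and 2; the last column was all gaps
single = project(three_way, [1])    # row 1 alone; its gap column is removed
print(pair, single)
```

Because gaps that were only needed to accommodate the dropped row vanish in the projection, a gap introduced at one step is not necessarily carried forward.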
|
21
|
Kjer KM, Gillespie JJ, Ober KA. Opinions on multiple sequence alignment, and an empirical comparison of repeatability and accuracy between POY and structural alignment. Syst Biol 2007; 56:133-46. [PMID: 17366144 DOI: 10.1080/10635150601156305] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.7] [Indexed: 10/23/2022]
Affiliation(s)
- Karl M Kjer
- Department of Ecology, Evolution and Natural Resources, Rutgers University, New Brunswick, New Jersey 08901, USA.
|
22
|
Abstract
MOTIVATION We consider the problem of multiple alignment of protein sequences with the goal of achieving a large SP (Sum-of-Pairs) score. RESULTS We introduce a new graph-based method. We name our method QOMA (Quasi-Optimal Multiple Alignment). QOMA starts with an initial alignment. It represents this alignment using a K-partite graph. It then improves the SP score of the initial alignment through local optimizations within a window that moves greedily on the alignment. QOMA uses two parameters to permit flexibility in time/accuracy trade off: (1) The size of the window for local optimization. (2) The sparsity of the K-partite graph. Unlike traditional progressive methods, QOMA is independent of the order of sequences. The experimental results on BAliBASE benchmarks show that QOMA produces higher SP score than the existing tools including ClustalW, Probcons, Muscle, T-Coffee and DCA. The difference is more significant for distant proteins. AVAILABILITY The software is available from the authors upon request.
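For readers unfamiliar with the sum-of-pairs (SP) objective named in this abstract, the sketch below scores an alignment column by column over all pairs of rows. The match, mismatch, and gap values are illustrative assumptions, not the substitution matrix or gap model used by QOMA, and the function name is hypothetical.

```python
from itertools import combinations

def sp_score(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length gapped strings.

    Scoring values are hypothetical; a gap paired with a gap scores 0.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs contribute nothing
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total

example = ["AC-GT", "ACAGT", "A--GT"]
print(sp_score(example))
```

A window-based optimizer in the spirit of the abstract would recompute this score only over the columns inside the current window and keep a local rearrangement when the score improves.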
Affiliation(s)
- Xu Zhang
- Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611, USA
|
23
|
XML schemas for common bioinformatic data types and their application in workflow systems. BMC Bioinformatics 2006; 7:490. [PMID: 17087823 PMCID: PMC2001303 DOI: 10.1186/1471-2105-7-490] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 06/21/2006] [Accepted: 11/06/2006] [Indexed: 11/30/2022] Open
Abstract
Background Today, there is a growing need in bioinformatics to combine available software tools into chains, thus building complex applications from existing single-task tools. To create such workflows, the tools involved have to be able to work with each other's data – therefore, a common set of well-defined data formats is needed. Unfortunately, current bioinformatic tools use a great variety of heterogeneous formats. Results Acknowledging the need for common formats, the Helmholtz Open BioInformatics Technology network (HOBIT) identified several basic data types used in bioinformatics and developed appropriate format descriptions, formally defined by XML schemas, and incorporated them in a Java library (BioDOM). These schemas currently cover sequence, sequence alignment, RNA secondary structure and RNA secondary structure alignment formats in a form that is independent of any specific program, thus enabling seamless interoperation of different tools. All XML formats are available at , the BioDOM library can be obtained at . Conclusion The HOBIT XML schemas and the BioDOM library simplify adding XML support to newly created and existing bioinformatic tools, enabling these tools to interoperate seamlessly in workflow scenarios.
|
24
|
Krüger D, Petersen RH, Hughes KW. Molecular phylogenies and mating study data in Polyporus with special emphasis on group “Melanopus” (Basidiomycota). Mycol Prog 2006. [DOI: 10.1007/s11557-006-0512-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022]
|
25
|
Tan X, Lefrançois L. Novel IL-15 isoforms generated by alternative splicing are expressed in the intestinal epithelium. Genes Immun 2006; 7:407-16. [PMID: 16791279 DOI: 10.1038/sj.gene.6364314] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
Abstract
Previous studies have identified three mRNA isoforms encoding interleukin-15 (IL-15) that are produced through differential splicing and encode the same mature IL-15 protein with two different signal peptides. Our analysis of mouse intestinal epithelial cells revealed two new IL-15 mRNA isoforms generated by different alternative splicing events. In one form (IL-15DeltaE6), exon 6 is absent, and in the second form the first 48 nt of exon 7 are absent (IL-15DeltaE7) through usage of an alternative 5' splicing site within exon 7. These mRNA isoforms encoded in-frame IL-15 protein variants lacking either 15 aa (IL-15DeltaE6) or 16 aa (IL-15DeltaE7), both utilizing the normal long signal peptide. Significant structural changes were predicted for these new IL-15 isoforms. RNase protection assays revealed the highest expression of isoform mRNA in the intestinal epithelium, and functional analysis of recombinant IL-15 isoform proteins suggested possible regulatory functions.
Affiliation(s)
- X Tan
- Department of Immunology, University of Connecticut Health Center, Farmington, CT 06030-1319, USA
|
26
|
Abstract
Repeating fragments in biological sequences are often essential for structure and function. Over the years, many methods have been developed to recognize repeats or to multiply align protein sequences. However, the integration of these two methodologies has been largely unexplored to date. Here, we present a new method capable of globally aligning multiple input sequences under the constraints of a given repeat analysis. The method supports different stringency modes to adapt to various levels of detail and reliability of the repeat information available.
Affiliation(s)
- Michael Sammeth
- Centre for Integrative Bioinformatics (IBIVU), Vrije Universiteit, Amsterdam, The Netherlands.
|
27
|
Carpy AJM, Marchand-Geneste N. Structural e-bioinformatics and drug design. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2006; 17:1-10. [PMID: 16513548 DOI: 10.1080/10659360600560966] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/06/2023]
Abstract
Nowadays the in silico scenario for drug design is totally dependent on structural biology and structural bioinformatics. A myriad of free bioinformatics applications and services have been posted on the web. This mini-review mentions web sites that are useful in structure-based drug design. The information is given in a logical manner, following the drug design process i.e. characterization of a protein target, modelling the protein using sequence homology, optimization of the protein structure and finally docking of small ligands into the active site.
Affiliation(s)
- A J M Carpy
- Laboratoire de Physico- & Toxico-Chimie des Systèmes Naturels UMR 5472 CNRS, Université de Bordeaux 1 351, Cours de la Libération, 33405 Talence cedex, France.
|
28
|
Abstract
Motivation: Recently, the concept of the constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, run in 𝒪(γn²) memory, where γ is the number of the constraints and n is the maximum of the lengths of sequences. As a result, such a high memory requirement limits the overall programs to align short sequences only. Results: We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. This new algorithm consumes only 𝒪(αn) space, where α is the sum of the lengths of constraints and usually α ≪ n in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints. Availability: http://genome.life.nctu.edu.tw/MUSICME Contact: cllu@mail.nctu.edu.tw
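The memory saving described here builds on a well-known observation about pairwise dynamic programming: each row of the score matrix depends only on the row before it. The sketch below shows that building block for the ordinary, unconstrained global alignment score with a linear gap penalty. It is not the constrained algorithm of the abstract, and the scoring values and function names are assumptions for illustration. Divide-and-conquer methods in the style of Hirschberg call such a routine on forward and reversed halves to recover the alignment itself in linear space.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-1):
    """Last row of the global alignment score matrix, using O(len(b)) memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr  # only two rows are ever alive at once
    return prev

def global_score(a, b, **scoring):
    """Optimal global alignment score of a and b."""
    return nw_last_row(a, b, **scoring)[-1]
```

Only two rows of length len(b) + 1 exist at any time, so memory is linear in one sequence length while running time stays quadratic.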
Affiliation(s)
- Chin Lung Lu
- Department of Biological Science and Technology, National Chiao Tung University Hsinchu 300, Taiwan, Republic of China.
|
29
|
Schmollinger M, Nieselt K, Kaufmann M, Morgenstern B. DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors. BMC Bioinformatics 2004; 5:128. [PMID: 15357879 PMCID: PMC520757 DOI: 10.1186/1471-2105-5-128] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Received: 03/23/2004] [Accepted: 09/09/2004] [Indexed: 11/30/2022] Open
Abstract
Background Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
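Strategy (a) relies on the independence of the pairwise comparisons, which makes them straightforward to distribute. The sketch below illustrates the pattern only and is not DIALIGN code: `align_pair` is a hypothetical stand-in that counts identical positions, and a thread pool is used for brevity. For CPU-bound alignment in CPython, a process pool would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def align_pair(a, b):
    """Hypothetical stand-in for a real pairwise alignment: count identical positions."""
    return sum(x == y for x, y in zip(a, b))

def all_pairs_parallel(seqs, workers=4):
    """Score every unordered sequence pair, distributing the pairs over a worker pool."""
    pairs = list(combinations(range(len(seqs)), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so they line up with `pairs`.
        scores = list(pool.map(lambda ij: align_pair(seqs[ij[0]], seqs[ij[1]]), pairs))
    return dict(zip(pairs, scores))
```

Since no pair depends on another, the result is identical to a sequential loop regardless of the number of workers, which matches the abstract's remark that distribution has no effect on the output alignments.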
Affiliation(s)
- Martin Schmollinger
- Wilhelm-Schickard-Institut für Informatik, Sand 14, 72076 Tübingen, Germany.
|
30
|
Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Müller WEG, Wetter T, Suhai S. Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Res 2004; 14:1147-59. [PMID: 15140833 PMCID: PMC419793 DOI: 10.1101/gr.1917404] [Citation(s) in RCA: 798] [Impact Index Per Article: 39.9] [Received: 08/27/2003] [Accepted: 01/28/2004] [Indexed: 11/24/2022]
Abstract
We present an EST sequence assembler that specializes in reconstruction of pristine mRNA transcripts, while at the same time detecting and classifying single nucleotide polymorphisms (SNPs) occurring in different variations thereof. The assembler uses iterative multipass strategies centered on high-confidence regions within sequences and has a fallback strategy for using low-confidence regions when needed. It features special functions to assemble high numbers of highly similar sequences without prior masking, an automatic editor that edits and analyzes alignments by inspecting the underlying traces, and detection and classification of sequence properties like SNPs with a high specificity and a sensitivity down to one mutation per sequence. In addition, it includes possibilities to use incorrectly preprocessed sequences, routines to make use of additional sequencing information such as base-error probabilities, template insert sizes, strain information, etc., and functions to detect and resolve possible misassemblies. The assembler is routinely used for such various tasks as mutation detection in different cell types, similarity analysis of transcripts between organisms, and pristine assembly of sequences from various sources for oligo design in clinical microarray experiments.
Affiliation(s)
- Bastien Chevreux
- Department of Molecular Biophysics, German Cancer Research Centre Heidelberg, 69120 Heidelberg, Germany.
|
31
|
Keightley PD, Johnson T. MCALIGN: stochastic alignment of noncoding DNA sequences based on an evolutionary model of sequence evolution. Genome Res 2004; 14:442-50. [PMID: 14993209 PMCID: PMC353231 DOI: 10.1101/gr.1571904] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.2] [Indexed: 11/25/2022]
Abstract
A method is described for performing global alignment of noncoding DNA sequences based on an evolutionary model parameterized by the frequency distribution of lengths of insertion/deletion events (indels) and their rate relative to nucleotide substitutions. A stochastic hill-climbing algorithm is used to search for the most probable alignment between a pair of sequences or three sequences of known phylogenetic relationship. The performance of the procedure, parameterized according to the empirical distribution of indel lengths in noncoding DNA of Drosophila species, is investigated by simulation. We show that there is excellent agreement between true and estimated alignments over a wide range of sequence divergences, and that the method outperforms other available alignment methods.
Affiliation(s)
- Peter D Keightley
- University of Edinburgh, School of Biological Sciences, Ashworth Laboratories, Edinburgh EH9 3JT, UK. Peter.Keightley_at_ed.ac.uk
|
32
|
Pollard DA, Bergman CM, Stoye J, Celniker SE, Eisen MB. Benchmarking tools for the alignment of functional noncoding DNA. BMC Bioinformatics 2004; 5:6. [PMID: 14736341 PMCID: PMC344529 DOI: 10.1186/1471-2105-5-6] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.4] [Received: 11/05/2003] [Accepted: 01/21/2004] [Indexed: 11/10/2022] Open
Abstract
Background Numerous tools have been developed to align genomic sequences. However, their relative performance in specific applications remains poorly characterized. Alignments of protein-coding sequences typically have been benchmarked against "correct" alignments inferred from structural data. For noncoding sequences, where such independent validation is lacking, simulation provides an effective means to generate "correct" alignments with which to benchmark alignment tools. Results Using rates of noncoding sequence evolution estimated from the genus Drosophila, we simulated alignments over a range of divergence times under varying models incorporating point substitution, insertion/deletion events, and short blocks of constrained sequences such as those found in cis-regulatory regions. We then compared "correct" alignments generated by a modified version of the ROSE simulation platform to alignments of the simulated derived sequences produced by eight pairwise alignment tools (Avid, BlastZ, Chaos, ClustalW, DiAlign, Lagan, Needle, and WABA) to determine the off-the-shelf performance of each tool. As expected, the ability to align noncoding sequences accurately decreases with increasing divergence for all tools, and declines faster in the presence of insertion/deletion evolution. Global alignment tools (Avid, ClustalW, Lagan, and Needle) typically have higher sensitivity over entire noncoding sequences as well as in constrained sequences. Local tools (BlastZ, Chaos, and WABA) have lower overall sensitivity as a consequence of incomplete coverage, but have high specificity to detect constrained sequences as well as high sensitivity within the subset of sequences they align. Tools such as DiAlign, which generate both local and global outputs, produce alignments of constrained sequences with both high sensitivity and specificity for divergence distances in the range of 1.25–3.0 substitutions per site. 
Conclusion For species with genomic properties similar to Drosophila, we conclude that a single pair of optimally diverged species analyzed with a high performance alignment tool can yield accurate and specific alignments of functionally constrained noncoding sequences. Further algorithm development, optimization of alignment parameters, and benchmarking studies will be necessary to extract the maximal biological information from alignments of functional noncoding DNA.
Affiliation(s)
- Daniel A Pollard
- Biophysics Graduate Group, University of California, Berkeley, CA 94720, USA
- Casey M Bergman
- Department of Genome Science, Life Science Division, Lawrence Orlando Berkeley National Laboratory, Berkeley, CA 94720, USA
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Genetics, University of Cambridge, Cambridge, UK CB2 3EH
- Jens Stoye
- Technische Fakultät, Universität Bielefeld, 33594 Bielefeld, Germany
- Susan E Celniker
- Department of Genome Science, Life Science Division, Lawrence Orlando Berkeley National Laboratory, Berkeley, CA 94720, USA
- Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Michael B Eisen
- Department of Genome Science, Life Science Division, Lawrence Orlando Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
|
33
|
McHardy AC, Tauch A, Rückert C, Pühler A, Kalinowski J. Genome-based analysis of biosynthetic aminotransferase genes of Corynebacterium glutamicum. J Biotechnol 2003; 104:229-40. [PMID: 12948641 DOI: 10.1016/s0168-1656(03)00161-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 10/27/2022]
Abstract
Due to broad and overlapping substrate specificities, aminotransferases remain the last uncharacterized enzymes from most amino acid biosynthetic pathways in Corynebacterium glutamicum. We report here a complete description of all aminotransferases participating in the biosynthesis of the branched-chain amino acids and phenylalanine in C. glutamicum. We used methods of profile analysis on the newly available genome sequence to systematically search for and characterize members of the four known aminotransferase classes. This led to the discovery of sixteen new, potential aminotransferase encoding genes in the C. glutamicum genome, eleven of which were subsequently characterized experimentally with respect to their participation in different amino acid biosynthetic pathways. Disruption by insertion mutagenesis of ilvE, encoding a branched-chain amino acid aminotransferase, confirmed its function in leucine and isoleucine biosynthesis. Two double mutants lacking both ilvE and genes classified as class I aminotransferases exhibited additional auxotrophic requirements for valine and phenylalanine, respectively. In C. glutamicum the branched-chain amino acid aminotransferase thus participates in four amino acid biosynthetic pathways, for which in case of valine and phenylalanine biosynthesis two additional enzymes with overlapping substrate specificity exist. The novel protein with aminotransferase activity in valine biosynthesis belongs to the very recently described MocR subfamily of GntR-type helix-turn-helix transcriptional regulators, is located upstream of a potential operon of a newly described pyridoxine biosynthetic pathway and when disrupted, gives rise to a pyridoxine auxotrophy. The theoretical and experimental data we present should further provide a solid platform for ongoing research and understanding of the network of aminotransferases which participate in amino acid biosynthesis in C. glutamicum.
Affiliation(s)
- Alice C McHardy
- Institut für Genomforschung, Universität Bielefeld, Universitätsstrasse 25, D-33515 Bielefeld, Germany
|
34
|
Ihlen PG, Ekman S. Outline of phylogeny and character evolution in Rhizocarpon (Rhizocarpaceae, lichenized Ascomycota) based on nuclear ITS and mitochondrial SSU ribosomal DNA sequences. Biol J Linn Soc Lond 2002. [DOI: 10.1046/j.1095-8312.2002.00127.x] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]
|
35
|
Abstract
A renewed interest in the multiple sequence alignment problem has given rise to several new algorithms. In contrast to traditional progressive methods, computationally expensive score optimization strategies are now predominantly employed. We systematically tested four methods (Poa, Dialign, T-Coffee and ClustalW) for the speed and quality of their alignments. As test sequences we used structurally derived alignments from BAliBASE and synthetic alignments generated by Rose. The tests included alignments of variable numbers of domains embedded in random spacer sequences. Overall, Dialign was the most accurate in cases with low sequence identity, while T-Coffee won in cases with high sequence identity. The fast Poa algorithm was almost as accurate, while ClustalW could compete only in strictly global cases with high sequence similarity.
Affiliation(s)
- Timo Lassmann
- Center for Genomics and Bioinformatics, Karolinska Institutet, SE-17177, Stockholm, Sweden
|
36
|
Tsutsumi‐Ishii Y, Nagaoka I. NF‐κB‐mediated transcriptional regulation of human β‐defensin‐2 gene following lipopolysaccharide stimulation. J Leukoc Biol 2002. [DOI: 10.1189/jlb.71.1.154] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022] Open
Affiliation(s)
- Yuko Tsutsumi‐Ishii
- Department of Biochemistry, Juntendo University, School of Medicine, Tokyo, Japan
- Isao Nagaoka
- Department of Biochemistry, Juntendo University, School of Medicine, Tokyo, Japan
|
37
|
Attwood TK, Croning MDR, Gaulton A. Deriving structural and functional insights from a ligand-based hierarchical classification of G protein-coupled receptors. Protein Eng Des Sel 2002; 15:7-12. [PMID: 11842232 DOI: 10.1093/protein/15.1.7] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Open
Abstract
G protein-coupled receptors (GPCRs) constitute the largest known family of cell-surface receptors. With hundreds of members populating the rhodopsin-like GPCR superfamily and many more awaiting discovery in the human genome, they are of interest to the pharmaceutical industry because of the opportunities they afford for yielding potentially lucrative drug targets. Typical sequence analysis strategies for identifying novel GPCRs tend to involve similarity searches using standard primary database search tools. This will reveal the most similar sequence, generally without offering any insight into its family or superfamily relationships. Conversely, searches of most 'pattern' or family databases are likely to identify the superfamily, but not the closest matching subtype. Here we describe a diagnostic resource that allows identification of GPCRs in a hierarchical fashion, based principally upon their ligand preference. This resource forms part of the PRINTS database, which now houses approximately 250 GPCR-specific fingerprints (http://www.bioinf.man.ac.uk/dbbrowser/gpcrPRINTS/). This collection of fingerprints is able to provide more sensitive diagnostic opportunities than have been realized by related approaches and is currently the only diagnostic tool for assigning GPCR subtypes. Mapping such fingerprints on to three-dimensional GPCR models offers powerful insights into the structural and functional determinants of subtype specificity.
Affiliation(s)
- T K Attwood
- School of Biological Sciences, University of Manchester, 2.19 Stopford Building, Oxford Road, Manchester, UK.
|
38
|
Lecompte O, Thompson JD, Plewniak F, Thierry J, Poch O. Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene 2001; 270:17-30. [PMID: 11403999 DOI: 10.1016/s0378-1119(01)00461-9] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Multiple alignment, since its introduction in the early seventies, has become a cornerstone of modern molecular biology. It has traditionally been used to deduce structure / function by homology, to detect conserved motifs and in phylogenetic studies. There has recently been some renewed interest in the development of multiple alignment techniques, with current opinion moving away from a single all-encompassing algorithm to iterative and / or co-operative strategies. The exploitation of multiple alignments in genome annotation projects represents a qualitative leap in the functional analysis process, opening the way to the study of the co-evolution of validated sets of proteins and to reliable phylogenomic analysis. However, the alignment of the highly complex proteins detected by today's advanced database search methods is a daunting task. In addition, with the explosion of the sequence databases and with the establishment of numerous specialized biological databases, multiple alignment programs must evolve if they are to successfully rise to the new challenges of the post-genomic era. The way forward is clearly an integrated system bringing together sequence data, knowledge-based systems and prediction methods with their inherent unreliability. The incorporation of such heterogeneous, often non-consistent, data will require major changes to the fundamental alignment algorithms used to date. Such an integrated multiple alignment system will provide an ideal workbench for the validation, propagation and presentation of this information in a format that is concise, clear and intuitive.
Affiliation(s)
- O Lecompte
- Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS/INSERM/ULP), BP 163, 67404 Cedex, Illkirch, France
|
39
|
Speeding Up the DIALIGN Multiple Alignment Program by Using the ‘Greedy Alignment of BIOlogical Sequences LIBrary’ (GABIOS-LIB). COMPUTATIONAL BIOLOGY 2001. [DOI: 10.1007/3-540-45727-5_1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/26/2023]
|
40
|
Notredame C, Higgins DG, Heringa J. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol 2000; 302:205-17. [PMID: 10964570 DOI: 10.1006/jmbi.2000.4042] [Citation(s) in RCA: 4842] [Impact Index Per Article: 201.8] [Indexed: 11/22/2022]
Abstract
We describe a new method (T-Coffee) for multiple sequence alignment that provides a dramatic improvement in accuracy with a modest sacrifice in speed as compared to the most commonly used alternatives. The method is broadly based on the popular progressive approach to multiple alignment but avoids the most serious pitfalls caused by the greedy nature of this algorithm. With T-Coffee we pre-process a data set of all pair-wise alignments between the sequences. This provides us with a library of alignment information that can be used to guide the progressive alignment. Intermediate alignments are then based not only on the sequences to be aligned next but also on how all of the sequences align with each other. This alignment information can be derived from heterogeneous sources such as a mixture of alignment programs and/or structure superposition. Here, we illustrate the power of the approach by using a combination of local and global pair-wise alignments to generate the library. The resulting alignments are significantly more reliable, as determined by comparison with a set of 141 test cases, than any of the popular alternatives that we tried. The improvement, especially clear with the more difficult test cases, is always visible, regardless of the phylogenetic spread of the sequences in the tests.
Affiliation(s)
- C Notredame
- National Institute for Medical Research, The Ridgeway, London, NW7 1AA, UK.
|
41
|
van Tuinen M, Sibley CG, Hedges SB. The early history of modern birds inferred from DNA sequences of nuclear and mitochondrial ribosomal genes. Mol Biol Evol 2000; 17:451-7. [PMID: 10723745 DOI: 10.1093/oxfordjournals.molbev.a026324] [Citation(s) in RCA: 187] [Impact Index Per Article: 7.8] [Indexed: 11/13/2022] Open
Abstract
The traditional view of avian evolution places ratites and tinamous at the base of the phylogenetic tree of modern birds (Neornithes). In contrast, most recent molecular studies suggest that neognathous perching birds (Passeriformes) compose the oldest lineage of modern birds. Here, we report significant molecular support for the traditional view of neognath monophyly based on sequence analyses of nuclear and mitochondrial DNA (4.4 kb) from every modern avian order. Phylogenetic analyses further show that the ducks and gallinaceous birds are each other's closest relatives and together form the basal lineage of neognathous birds. To investigate why other molecular studies sampling fewer orders have reached different conclusions regarding neognath monophyly, we performed jackknife analyses on our mitochondrial data. Those analyses indicated taxon-sampling effects when basal galloanserine birds were included in combination with sparse taxon sampling. Our phylogenetic results suggest that the earliest neornithines were heavy-bodied, ground-dwelling, nonmarine birds. This inference, coupled with a fossil bias toward marine environments, provides a possible explanation for the large gap in the early fossil record of birds.
Affiliation(s)
- M van Tuinen
- Department of Biology, Pennsylvania State University, University Park 16802, USA
|
42
|
Abstract
Elucidation of interrelationships among sequence, structure, function, and evolution (FESS relationships) of a family of genes or gene products is a central theme of modern molecular biology. Multiple sequence alignment has proven to be a powerful tool for many fields of study such as phylogenetic reconstruction, illumination of functionally important regions, and prediction of higher order structures of proteins and RNAs. However, it is far from trivial to automatically construct a multiple alignment from a set of related sequences. A variety of methods for solving this computationally difficult problem are reviewed. Several important applications of multiple alignment for elucidation of the FESS relationships are also discussed. For a long period, progressive methods have been the only practical means to solve a multiple alignment problem of appreciable size. This situation is now changing with the development of new techniques including several classes of iterative methods. Today's progress in multiple sequence alignment methods has been made by the multidisciplinary endeavors of mathematicians, computer scientists, and biologists in various fields, including biophysicists in particular. The ideas also originate from various backgrounds: pure algorithmics, statistics, thermodynamics, and others. The outcomes are now enjoyed by researchers in many fields of biological sciences. In the near future, generalized multiple alignment may play a central role in studies of FESS relationships. The organized mixture of knowledge from multiple fields will ferment to develop fruitful results which would be hard to obtain within each area. I hope this review provides a useful information resource for future development of theory and practice in this rapidly expanding area of bioinformatics.
Affiliation(s)
- O Gotoh
- Saitama Cancer Center Research Institute, Japan
|