1
Applications of Bio-molecular Databases in Bioinformatics. Medical Imaging in Clinical Applications 2016. [DOI: 10.1007/978-3-319-33793-7_15]
2
Abstract
BioCloneDB is a user-friendly database with a web interface that helps molecular genetics laboratories manage a local repository of sequence information linked to DNA clones. The tool is designed for high-throughput sequencing and gene expression projects and links the two types of information. Its distinctive feature is the automation of batch sequence annotation following BLAST® searches, supported by easy-to-use web interfaces. Furthermore, any set of sequences can be annotated against any sequence database. This removes the need to run and analyse individual web BLAST® searches, or to learn how to produce batch searches and analyse them on a UNIX® operating system. BioCloneDB is open-source software that can be installed on Linux or UNIX® operating systems. To test the application, we used 1400 expressed sequence tags obtained from the filamentous fungus Neurospora crassa. The results were analysed and compared with published results; they differ significantly owing to the accumulation of data in the nr database (ftp://ftp.ncbi.nih.gov/blast/db/).
AVAILABILITY: BioCloneDB is available for academic use, along with documentation, screenshots, the database schema and readme files, at http://bioclonedb.agri.huji.ac.il/
CONTACT: Oded Yarden (Oded.Yarden@huji.ac.il)
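The abstract does not describe BioCloneDB's annotation code. As a minimal sketch of the batch step it automates, the Python below keeps the best-scoring BLAST hit for each query from tabular output. The sketch rests on several assumptions:

- The input is BLAST+ `-outfmt 6` output with its default twelve columns.
- The function name `best_hits` and the 1e-5 E-value cutoff are hypothetical.
- The subject IDs (`hypA` and so on) are placeholders.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-5):
    """Return {query id: hit dict}, keeping the highest bit score per query.

    Hits with an E-value above max_evalue are ignored, so a query whose hits
    are all weaker than the cutoff is left unannotated.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Hypothetical hits for two ESTs; est002 has only a weak hit.
example = (
    "est001\thypA\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n"
    "est001\thypB\t40.0\t150\t90\t2\t10\t160\t1\t150\t1e-10\t60\n"
    "est002\thypC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25\n"
)
hits = best_hits(example)
for query, hit in sorted(hits.items()):
    print(query, hit["sseqid"], hit["evalue"])  # est001 hypA 1e-100
```

Ranking by bit score rather than E-value is a design choice here. Bit scores do not depend on database size, so they stay comparable as a database such as nr grows.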
3
C. elegans: an invaluable model organism for the proteomics studies of the cholesterol-mediated signaling pathway. Expert Rev Proteomics 2006; 3:439-53. [PMID: 16901202 DOI: 10.1586/14789450.3.4.439]
Abstract
With the availability of its complete genome sequence and its unique biological features relevant to human disease, Caenorhabditis elegans has become an invaluable model organism for proteomics studies, leading to the elucidation of nematode gene function. A journey from the genome to the proteome of C. elegans may begin with the preparation of expressed proteins, which enables large-scale analysis of all proteins expressed under specific physiological conditions. Although various techniques have been used for proteomic analysis of C. elegans, systematic high-throughput analysis that accommodates studies of post-translational modification and quantitative analysis has yet to be achieved. Given that no integrated C. elegans protein expression database is available, it is time to launch a global C. elegans proteome project through which transcriptome, protein-protein interaction and functional annotation datasets can be integrated. The cholesterol-mediated signaling pathway would be an excellent initial target for a pilot phase of such a project since, as in other organisms, it is one of the key pathways controlling cell growth and development in C. elegans. As this field broadens toward functional proteomics, there is high demand for versatile proteome informatics tools that can manage many different types of data in an integrated manner.
4
Gene functionality's influence on the second codon: A large-scale survey of second codon composition in three domains. Genomics 2010; 96:92-101. [DOI: 10.1016/j.ygeno.2010.04.001] [Received: 06/09/2009] [Revised: 02/03/2010] [Accepted: 04/07/2010]
5
Prediction of protein function improving sequence remote alignment search by a fuzzy logic algorithm. Protein J 2008; 27:130-9. [PMID: 18066655 DOI: 10.1007/s10930-007-9116-x]
Abstract
The functional annotation of new protein sequences represents a major challenge for genomic science. The best way to suggest the function of a protein from its sequence is to find a related protein for which biological information is available. Current alignment algorithms display a list of protein sequence stretches presenting significant similarity to different protein targets, ordered by their respective mathematical scores. However, statistical and biological significance do not always coincide; therefore, rearranging the program output according to biological characteristics rather than mathematical scores alone would help functional annotation. A new method is described that predicts the putative function of a protein by integrating the results of the PSI-BLAST program with a fuzzy logic algorithm. Several protein sequence characteristics were tested for their ability to rearrange a PSI-BLAST profile so that it better reflects biological function. Four of them (amino acid content, matched segment length, and hydropathy and flexibility profiles) contributed positively to the accurate prediction of protein function from sequence once they were integrated by a fuzzy logic algorithm into a program, BYPASS.
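The abstract names the four characteristics used by BYPASS but not how each is scored or combined. The sketch below only illustrates the general idea of fuzzy re-ranking:

1. A linear ramp maps each characteristic to a membership degree in [0, 1].
2. The degrees are averaged.
3. The PSI-BLAST hits are re-sorted by that average.

The `Hit` fields, the ramp breakpoints and the averaging rule are all assumptions, not BYPASS's actual design.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One PSI-BLAST hit with hypothetical, precomputed comparison features."""
    target: str
    evalue: float
    composition_diff: float   # distance between amino acid compositions (0 = identical)
    length_ratio: float       # matched segment length / query length (0..1)
    hydropathy_corr: float    # correlation of hydropathy profiles (-1..1)
    flexibility_corr: float   # correlation of flexibility profiles (-1..1)


def ramp(x, low, high):
    """Linear membership: 0 at or below `low`, 1 at or above `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def biological_score(hit):
    memberships = [
        1.0 - ramp(hit.composition_diff, 0.05, 0.30),  # "similar composition"
        ramp(hit.length_ratio, 0.3, 0.9),              # "long matched segment"
        ramp(hit.hydropathy_corr, 0.0, 0.8),           # "similar hydropathy profile"
        ramp(hit.flexibility_corr, 0.0, 0.8),          # "similar flexibility profile"
    ]
    # The mean is a soft aggregate; a fuzzy AND (minimum) would be stricter.
    return sum(memberships) / len(memberships)


def rerank(hits):
    # Highest fuzzy score first; ties fall back to the original E-value order.
    return sorted(hits, key=lambda h: (-biological_score(h), h.evalue))


hits = [
    Hit("A", evalue=1e-30, composition_diff=0.25, length_ratio=0.35,
        hydropathy_corr=0.1, flexibility_corr=0.1),
    Hit("B", evalue=1e-12, composition_diff=0.06, length_ratio=0.85,
        hydropathy_corr=0.7, flexibility_corr=0.6),
]
ranked = rerank(hits)
print([h.target for h in ranked])  # ['B', 'A']
```

In this toy case hit A has the better E-value, but hit B is more similar on all four characteristics, so B moves to the top.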
6
Annotation, comparison and databases for hundreds of bacterial genomes. Res Microbiol 2007; 158:724-36. [DOI: 10.1016/j.resmic.2007.09.009] [Received: 07/29/2007] [Revised: 09/21/2007] [Accepted: 09/26/2007]
7
Abstract
Background: One main research challenge in the post-genomic era is to understand the relationship between protein sequences and their biological functions. In recent years, several automated annotation systems have been developed for the functional assignment of uncharacterized proteins. The underlying assumption of these systems is that similar sequences imply similar biological functions. However, it has been noted that matching sequences do not always imply similar functions.
Results: In this paper, we examine the correlation between protein sequences and protein functions for the yeast proteome in the context of the Gene Ontology. A novel measure is introduced to define the overall similarity between two protein sequences. The effects of both the level and the size of a Gene Ontology group on the degree of similarity were studied. The similarity distributions at different levels of the Gene Ontology trees are presented. To evaluate the theoretical prediction power of similar sequences, we computed the posterior probability of correct predictions.
Conclusion: The results indicate that protein pairs with similar biological functions tend to have higher sequence similarity, although the similarity distribution in each functional group is heterogeneous and varies from group to group. We conclude that sequence similarity can serve as a key measure in protein function prediction; however, the resulting annotations must be verified through other means. A method that combines a broader range of measures is more likely to provide more accurate predictions. Our study indicates that the posterior probability of a correct prediction could serve as one of the key measures.
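The abstract does not give the estimator used for the posterior probability of a correct prediction. One simple empirical reading, sketched below as an assumption, is this: among protein pairs at or above a similarity threshold, take the fraction that share a Gene Ontology group. That fraction estimates P(same function | similarity ≥ t). The toy pairs are invented for illustration.

```python
def posterior_correct(pairs, threshold):
    """Empirical P(same GO group | similarity >= threshold).

    `pairs` is an iterable of (similarity, same_function) tuples.
    Returns None when no pair reaches the threshold.
    """
    above = [same for sim, same in pairs if sim >= threshold]
    if not above:
        return None
    return sum(above) / len(above)  # True counts as 1, False as 0


# Toy data: (similarity score, do the two proteins share a GO group?)
pairs = [(0.9, True), (0.8, True), (0.75, False),
         (0.4, False), (0.3, True), (0.2, False)]
for t in (0.2, 0.5, 0.85):
    print(t, posterior_correct(pairs, t))
# 0.2 0.5
# 0.5 0.6666666666666666
# 0.85 1.0
```

Raising the threshold trades coverage for reliability. Fewer pairs qualify, but a larger share of them are functionally matched.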
8
Abstract
Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, together with a web interface for carrying out genome annotation projects. The system allows annotation of a genome to begin at an early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes, enhanced by a gene-coding re-annotation process that uses accurate gene models; (ii) integration of results obtained with a wide range of bioinformatics methods, including exploration of gene context by searching for conserved synteny and reconstruction of metabolic pathways; and (iii) an advanced web interface that allows multiple users to refine the automatic assignment of gene product functions. MaGe is also linked to numerous well-known biological databases and systems. The system has been thoroughly tested during the annotation of complete bacterial genomes (Acinetobacter baylyi ADP1, Pseudoalteromonas haloplanktis, Frankia alni) and is currently used in several new microbial genome annotation projects. In addition, MaGe allows annotation curation and exploration of already published genomes from various genera (e.g. Yersinia, Bacillus and Neisseria). MaGe can be accessed at .
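The abstract does not give MaGe's synteny-search rules. The sketch below shows one simple notion of conserved gene context, assumed for illustration. Two genes that are adjacent in one genome count as conserved if their orthologs lie within `max_gap` genes of each other in a second genome. The sketch also assumes the following:

- Orthology is taken as a given mapping.
- Chromosomes are treated as linear.
- All identifiers are hypothetical.

```python
def conserved_adjacencies(genome_a, genome_b, orthologs, max_gap=1):
    """Return adjacent gene pairs in genome_a whose orthologs lie close together in genome_b.

    genome_a, genome_b: gene identifiers in chromosomal order.
    orthologs: dict mapping a genome_a gene to its genome_b ortholog.
    max_gap: largest allowed distance (in genes) between the two orthologs in genome_b.
    """
    position_b = {gene: i for i, gene in enumerate(genome_b)}
    conserved = []
    for left, right in zip(genome_a, genome_a[1:]):
        if left not in orthologs or right not in orthologs:
            continue
        pos_left = position_b.get(orthologs[left])
        pos_right = position_b.get(orthologs[right])
        # abs() ignores orientation, so inverted neighbours still count.
        if pos_left is not None and pos_right is not None and abs(pos_left - pos_right) <= max_gap:
            conserved.append((left, right))
    return conserved


genome_a = ["a1", "a2", "a3", "a4"]
genome_b = ["b3", "b1", "b2", "bX", "b4"]
orthologs = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
print(conserved_adjacencies(genome_a, genome_b, orthologs))  # [('a1', 'a2')]
```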
9
Enhanced functional and structural domain assignments using remote similarity detection procedures for proteins encoded in the genome of Mycobacterium tuberculosis H37Rv. J Biosci 2004; 29:245-59. [PMID: 15381846 DOI: 10.1007/bf02702607]
Abstract
The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has facilitated deeper insights into the biology of MTB, yet the functions of many MTB proteins are unknown. We have used sensitive profile-based search procedures to assign functional and structural domains and thereby infer the functions of gene products encoded in MTB. These domain assignments have been made using a compendium of sequence and structural domain families. Functions are predicted for 78% of the encoded gene products. For 69% of these, functions can be inferred by domain assignments; the functions of the rest are deduced from their homology to proteins of known function. Superfamily relationships between families of unknown and known structure have increased structural information by approximately 11%. Remote similarity detection methods have enabled domain assignments for 1325 'hypothetical proteins'. The most populated families in MTB are involved in lipid metabolism and in the entry and survival of the bacillus in the host. Interestingly, for 353 proteins, which we refer to as MTB-specific, no homologues have been identified. Numerous previously unannotated hypothetical proteins have been assigned domains, and some of these could be potential chemotherapeutic targets. MTB-specific proteins might include factors responsible for virulence. Importantly, these assignments could be valuable for experimental endeavors. The detailed results are publicly available at http://hodgkin.mbu.iisc.ernet.in/~dots.
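This breakdown assumes that "69% of these" means 69% of the 78% of gene products that have a predicted function. Under that reading, the share of all encoded gene products reached by each route works out as follows:

```latex
\begin{aligned}
\text{by domain assignment:} &\quad 0.78 \times 0.69 \approx 0.54,\\
\text{by homology only:}     &\quad 0.78 \times (1 - 0.69) \approx 0.24,\\
\text{no prediction:}        &\quad 1 - 0.78 = 0.22.
\end{aligned}
```

So roughly half of all MTB gene products receive a domain-based functional assignment under this interpretation.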
10
Abstract
The mission of the European Bioinformatics Institute (EBI), an outstation of the Heidelberg-based European Molecular Biology Laboratory (EMBL), is to ensure that the growing body of information from molecular biology and genome research is placed in the public domain and is freely accessible to all parts of the scientific community in ways that promote scientific progress. To fulfil this mission, the EBI provides a wide variety of free, publicly available bioinformatics services. These can be divided into processing of data submissions; access to query, analysis and retrieval systems and tools; FTP downloads of software and databases; training and education; and user support. All of these services are available at the EBI website: http://www.ebi.ac.uk/services. This paper provides a detailed introduction to the interactive analysis systems available from the EBI and a brief introduction to other, related services.
11
3D-GENOMICS: a database to compare structural and functional annotations of proteins between sequenced genomes. Nucleic Acids Res 2004; 32:D245-50. [PMID: 14681404 PMCID: PMC308798 DOI: 10.1093/nar/gkh064]
Abstract
The 3D-GENOMICS database (http://www.sbg.bio.ic.ac.uk/3dgenomics/) provides structural annotations for proteins from sequenced genomes. In August 2003 the database included data for 93 proteomes. The annotations stored in the database include homologous sequences from various sequence databases, domains from SCOP and Pfam, patterns from Prosite and other predicted sequence features such as transmembrane regions and coiled coils. In addition to annotations at the sequence level, several precomputed cross-proteome comparative analyses are available based on SCOP domain superfamily composition. Annotations are available to the user via a web interface to the database. Multiple points of entry are available so that a user is able to: (i) directly access annotations for a single protein sequence via keywords or accession codes, (ii) examine a sequence of interest chosen from a summary of annotations for a particular proteome, or (iii) access precomputed frequency-based cross-proteome comparative analyses.
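The abstract mentions frequency-based cross-proteome comparisons built on SCOP superfamily composition but does not define them. A minimal sketch, assuming that "frequency" means the fraction of a proteome's assigned domains falling into each superfamily, is shown below. Proteomes are then compared by the difference in those fractions. The function names and the placeholder labels (`X`, `Y`, `sf1`...) are hypothetical.

```python
from collections import Counter


def superfamily_frequencies(domain_assignments):
    """Map each proteome to {superfamily: fraction of that proteome's assigned domains}.

    domain_assignments: iterable of (proteome, protein_id, superfamily) tuples,
    one tuple per assigned domain.
    """
    counts = {}
    for proteome, _protein_id, superfamily in domain_assignments:
        counts.setdefault(proteome, Counter())[superfamily] += 1
    frequencies = {}
    for proteome, counter in counts.items():
        total = sum(counter.values())
        frequencies[proteome] = {sf: n / total for sf, n in counter.items()}
    return frequencies


def most_enriched(frequencies, proteome, reference, top=3):
    """Superfamilies ranked by frequency difference (proteome minus reference), ties by name."""
    a, b = frequencies[proteome], frequencies[reference]
    diffs = {sf: a.get(sf, 0.0) - b.get(sf, 0.0) for sf in set(a) | set(b)}
    return sorted(diffs, key=lambda sf: (-diffs[sf], sf))[:top]


# Toy assignments: proteome and superfamily labels are placeholders.
assignments = [
    ("X", "p1", "sf1"), ("X", "p2", "sf1"), ("X", "p3", "sf2"), ("X", "p4", "sf3"),
    ("Y", "q1", "sf4"), ("Y", "q2", "sf1"), ("Y", "q3", "sf4"), ("Y", "q4", "sf2"),
]
freqs = superfamily_frequencies(assignments)
print(most_enriched(freqs, "X", "Y"))  # ['sf1', 'sf3', 'sf2']
```

Normalising by each proteome's total domain count keeps small and large proteomes comparable. This is why a frequency rather than a raw count is used here.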
12
Abstract
All the protein sequences from plants (including Arabidopsis thaliana) available from SwissProt/TrEMBL have been subjected to an all-by-all systematic comparison and grouped into clusters of related proteins. Within each cluster, the sequences have been submitted to pyramidal classification; where two or more subfamilies have been grouped together, the pyramidal tree helps to find which sequences make the links between subfamilies. In addition, the 'domains' common to two or more sequences within a cluster were determined and displayed à la ProDom. The resulting graphical representations proved quite efficient at pinpointing protein sequences that probably suffer from an error in the annotation of their genes. The clusters can be searched by various criteria, and their pyramidal classifications and domain representations can be displayed by querying http://genoplante-info.infobiogen.fr/phytoprot. The user can also launch a BLAST search of a query sequence against all the clusters.
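The abstract does not state the rule used to turn the all-by-all comparison into clusters. A minimal sketch, assuming single-linkage grouping (connected components over pairs judged related), is shown below using a union-find structure. Pyramidal classification is not covered. The identifiers and pairs are hypothetical.

```python
def cluster_sequences(ids, similar_pairs):
    """Group sequences into single-linkage clusters (connected components).

    ids: all sequence identifiers.
    similar_pairs: (id_a, id_b) pairs judged related by the all-by-all comparison.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


ids = ["s1", "s2", "s3", "s4", "s5"]
pairs = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
print(cluster_sequences(ids, pairs))  # [['s1', 's2', 's3'], ['s4', 's5']]
```

In single linkage, s1 and s3 end up together only because both are linked to s2. Such chaining can merge subfamilies through a bridging sequence, which is the situation the abstract says the pyramidal trees help diagnose.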
13
The Genomic Threading Database: a comprehensive resource for structural annotations of the genomes from key organisms. Nucleic Acids Res 2004; 32:D196-9. [PMID: 14681393 PMCID: PMC308777 DOI: 10.1093/nar/gkh043]
Abstract
Currently, the Genomic Threading Database (GTD) contains structural assignments for the proteins encoded within the genomes of nine eukaryotes and 101 prokaryotes. Structural annotations are carried out using a modified version of GenTHREADER, a reliable fold recognition method. The GenTHREADER annotation jobs are distributed across multiple clusters of processors using grid technology, and the predictions are deposited in a relational database accessible via a web interface at http://bioinf.cs.ucl.ac.uk/GTD. Using this system, up to 84% of the proteins encoded within a genome can be confidently assigned to known folds, with 72% of the residues aligned. On average in the GTD, 64% of the proteins encoded within a genome are confidently assigned to known folds and 58% of the residues are aligned to structures.
14
Abstract
Accurate detection of protein families allows assignment of protein function and the analysis of functional diversity in complete genomes. Recently, we presented a novel algorithm called TribeMCL for the detection of protein families that is both accurate and efficient. This method allows family analysis to be carried out on a very large scale. Using TribeMCL, we have generated a resource called TRIBES that contains protein family information, comprising annotations, protein sequence alignments and phylogenetic distributions describing 311,257 proteins from 83 completely sequenced genomes. The analysis of at least 60,934 detected protein families reveals that, with the essential families excluded, paralogy levels are similar between prokaryotes, irrespective of genome size. The number of essential families is estimated to be between 366 and 426. We also show that the currently known space of protein families is scale free and discuss the implications of this distribution. In addition, we show that smaller families are often formed by shorter proteins and discuss the reasons for this intriguing pattern. Finally, we analyse the functional diversity of protein families in entire genome sequences. The TRIBES protein family resource is accessible at http://www.ebi.ac.uk/research/cgg/tribes/.
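TribeMCL is built on the Markov cluster (MCL) algorithm. MCL works on a column-stochastic similarity matrix and alternates two steps until the matrix converges:

- Expansion: the matrix is squared.
- Inflation: each entry is raised to a power, and the columns are then re-normalised.

The pure-Python sketch below is a textbook MCL on a small dense matrix, not the TRIBES pipeline. The step that turns all-against-all sequence similarity results into edge weights is omitted. The inflation value of 2.0 and the toy matrix are assumptions.

```python
def normalize_columns(m):
    """Scale each column of a square matrix (list of lists) to sum to 1, in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of a symmetric similarity matrix; returns clusters as index tuples."""
    n = len(similarity)
    # Self-loops keep every column non-empty; then make the matrix column-stochastic.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along two-step random walks
        inflated = [[x ** inflation for x in row] for row in expanded]  # inflation: favour strong flows
        normalize_columns(inflated)
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Rows that keep mass are attractors; each attractor's non-zero columns form a cluster.
    clusters = set()
    for i in range(n):
        members = tuple(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(clusters)


# Two hypothetical protein groups with no similarity between them.
sim = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(sim))  # [(0, 1, 2), (3, 4, 5)]
```

Higher inflation values generally yield finer, more numerous clusters. A production implementation would use sparse matrices and pruning rather than this O(n³) dense loop.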
15
A comparative proteomics resource: proteins of Arabidopsis thaliana. Genome Biol 2003; 4:R51. [PMID: 12914659 PMCID: PMC193643 DOI: 10.1186/gb-2003-4-8-r51] [Received: 02/03/2003] [Revised: 05/06/2003] [Accepted: 07/02/2003]
Abstract
Using an integrative genome annotation pipeline (iGAP) for proteome-wide protein structure and functional domain assignment, we analyzed all the proteins of Arabidopsis thaliana. Three-dimensional structures at the level of the domain are assigned by fold recognition and threading based on a novel fold library that extends common domain classifications. iGAP is being applied to proteins from all available proteomes as part of a comparative proteomics resource. The database is accessible from the web.
16
Re-evaluation of primary structure, topology, and localization of Scamper, a putative intracellular Ca2+ channel activated by sphingosylphosphocholine. Biochem J 2002; 362:183-9. [PMID: 11829755 PMCID: PMC1222375 DOI: 10.1042/0264-6021:3620183]
Abstract
Naturally occurring sphingoid molecules control vital functions of the cell through their interaction with specific receptors. Proliferation, differentiation and programmed cell death in fact result from a fine balance of signals, among which sphingosine and structurally related molecules play fundamental roles, acting as either first or second messengers. The corresponding receptors must be identified before the role of sphingoid molecules can be established. Among them, several G-protein-coupled receptors specific for sphingosine 1-phosphate, sphingosylphosphocholine, or both, have already been investigated. In contrast, identification of the postulated intracellular receptors has been problematic. In the present study we re-evaluated the molecular characterization of Scamper, the first proposed intracellular receptor for sphingosylphosphocholine [Mao, Kim, Almenoff, Rudner, Kearney and Kindman (1996) Proc. Natl. Acad. Sci. U.S.A. 93, 1993-1996], which is commonly believed to be a Ca2+ channel of the endoplasmic reticulum (the name "SCaMPER" used by Mao et al. is derived from "sphingolipid Ca2+-release-mediating protein of the endoplasmic reticulum"). Contrary to what has been believed hitherto, our primary-structure and overexpression experiments indicate that Scamper is a 110-amino-acid protein spanning the membrane once with an Nexo/Ccyt topology [von Heijne and Gavel (1988) Eur. J. Biochem. 174, 671-678]. Overexpression of either wild-type or tagged Scamper induces a specific phenotype characterized by the rapid extension of actin-containing protrusions, followed by cell death.
17
Molecular fossils in the human genome: identification and analysis of the pseudogenes in chromosomes 21 and 22. Genome Res 2002; 12:272-80. [PMID: 11827946 PMCID: PMC155275 DOI: 10.1101/gr.207102]
Abstract
We have developed an initial approach for annotating and surveying pseudogenes in the human genome. We search human genomic DNA for regions that are similar to known protein sequences and contain obvious disablements (i.e., mid-sequence stop codons or frameshifts), while ensuring minimal overlap with annotations of known genes. Pseudogenes can be divided into "processed" and "nonprocessed"; the former are reverse transcribed from mRNA (and therefore have no intron structure), whereas the latter presumably arise from genomic duplications. We annotate putative processed pseudogenes based on whether there is a continuous span of homology that is >70% of the length of the closest matching human protein (i.e., with introns removed), or whether there is evidence of polyadenylation. We have applied our approach to chromosomes 21 and 22, the first parts of the human genome completely sequenced, finding 190 new pseudogene annotations beyond the 264 reported by the sequencing centers. In total, on chromosomes 21 and 22, there are 189 processed pseudogenes, 195 nonprocessed pseudogenes, and, additionally, 70 pseudogenic immunoglobulin gene segments. (Detailed assignments are available at http://bioinfo.mbb.yale.edu/genome/pseudogene or http://genecensus.org/pseudogene.) By extrapolation, we predict that there could be up to approximately 20,000 pseudogenes in the whole human genome, with a little more than half of them processed. We have determined the main populations and clusters of pseudogenes on chromosomes 21 and 22. There are notable excesses of pseudogenes relative to genes near the centromeres of both chromosomes, indicating the existence of pseudogenic "hot-spots" in the genome. We have looked at the distribution of InterPro families and Gene Ontology (GO) functional categories in our pseudogenes. 
Overall, the families in both the processed and nonprocessed pseudogene populations follow a power-law distribution similar to that found for gene families, with a few big families and many small ones. The processed population is, in particular, enriched in highly expressed ribosomal-protein sequences (approximately 20%), which appear fairly evenly distributed across the chromosomes. We compared processed pseudogenes of different evolutionary ages, observing a high degree of similarity between "ancient" and "modern" subpopulations. This may be attributable to the consistently high expression of ribosomal proteins over evolutionary time. Finally, we find that the chromosome 22 pseudogene population is dominated by immunoglobulin segments, which have a greater rate of disablement per amino acid than the other pseudogene populations and are also substantially more diverged.
MESH Headings
- Chromosome Mapping/methods
- Chromosomes, Human, Pair 21/genetics
- Chromosomes, Human, Pair 22/genetics
- Evolution, Molecular
- Fossils
- Genes, Immunoglobulin
- Genes, Overlapping
- Genome, Human
- Humans
- Multigene Family
- Pseudogenes
- RNA Processing, Post-Transcriptional/genetics
- Sequence Analysis, DNA/statistics & numerical data
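The decision rules in the pseudogene abstract above can be sketched as a small classifier:

- A candidate must carry at least one disablement.
- It is called "processed" if its longest continuous homologous span exceeds 70% of the closest protein's length, or if there is polyadenylation evidence.

The `Candidate` fields and example values are hypothetical. The overlap test is simplified to a yes/no flag, whereas the abstract only requires minimal overlap with known genes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A genomic region with similarity to a known protein (all fields hypothetical)."""
    name: str
    protein_length: int          # length (aa) of the closest matching known protein
    continuous_span: int         # longest continuous homologous span (aa)
    stop_codons: int             # mid-sequence stop codons
    frameshifts: int
    has_polya: bool              # evidence of polyadenylation
    overlaps_known_gene: bool    # simplification of "minimal overlap with known genes"


def classify(c, span_fraction=0.70):
    """Return 'processed', 'nonprocessed', or None if the region is not called a pseudogene."""
    if c.overlaps_known_gene:
        return None  # candidates must not coincide with annotated genes
    if c.stop_codons + c.frameshifts == 0:
        return None  # no obvious disablement
    if c.continuous_span > span_fraction * c.protein_length or c.has_polya:
        return "processed"
    return "nonprocessed"


candidates = [
    Candidate("psi1", 300, 250, 1, 0, False, False),   # long intronless span
    Candidate("psi2", 300, 120, 0, 2, False, False),   # short span, no poly(A)
    Candidate("psi3", 300, 100, 1, 0, True, False),    # short span but poly(A)
    Candidate("gene1", 300, 300, 0, 0, False, False),  # intact: no disablement
]
for c in candidates:
    print(c.name, classify(c))
# psi1 processed
# psi2 nonprocessed
# psi3 processed
# gene1 None
```

As a consistency check on the abstract's figures, 189 + 195 + 70 = 454. This equals the 264 annotations reported by the sequencing centers plus the 190 new ones.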
18
Characterization of Chelonus inanitus polydnavirus segments: sequences and analysis, excision site and demonstration of clustering. J Gen Virol 2002; 83:247-56. [PMID: 11752722 DOI: 10.1099/0022-1317-83-1-247]
Abstract
Polydnaviruses (genera Ichnovirus and Bracovirus) have a segmented genome of circular double-stranded DNA molecules, replicate in the ovary of parasitic wasps and are essential for successful parasitism of the host. Here we present the first detailed analysis of several segments of a bracovirus, the Chelonus inanitus virus (CiV). Four segments were sequenced; two of them, CiV12 and CiV14, were found to be closely related, while CiV14.5 and CiV16.8 were unrelated. CiV12, CiV14.5 and CiV16.8 are unique, whereas CiV14 also occurs nested within another, larger segment. All four segments are predicted to contain genes, and the predictions could be substantiated in most cases. Comparison with databases revealed no significant similarities at either the nucleotide or the amino acid level. Inverted repeats with identities between 77% and 92% and lengths between 26 bp and 100 bp were found on all segments outside predicted genes. Hybridization experiments indicate that CiV12 and CiV14 are both flanked by other virus segments, suggesting that proviral CiV segments are clustered in the wasp genome. The integration/excision site of CiV14 was analysed and compared with that of CiV12. A very similar 14 bp repeat was found at both termini of proviral CiV12 and CiV14, as well as in the excised circular molecule and in the rejoined DNA. A model is presented to illustrate where the terminal repeats might recombine to yield the circular molecule. Excision of CiV12 and CiV14 is restricted to females and begins at a very specific time point in pupal-adult development.
19
Abstract
Multiple sequence alignment is a fundamental tool in many domains of modern molecular biology, including functional and evolutionary studies of protein families. Multiple alignments also play an essential role in the new integrated systems for genome annotation and analysis. Thus, the development of new multiple alignment scores and statistics is essential, in the spirit of the work dedicated to evaluating pairwise sequence alignments for database searching. We present here norMD, a new objective scoring function for multiple sequence alignments. norMD combines the advantages of column-scoring techniques with the sensitivity of methods that incorporate residue similarity scores. In addition, norMD incorporates ab initio sequence information, such as the number, length and similarity of the sequences to be aligned. The sensitivity and reliability of the norMD objective function are demonstrated using structural alignments in the SCOP and BAliBASE databases. The norMD scores are then applied to the multiple alignments of complete sequences (MACS) detected by BlastP with E-value < 10 for a set of 734 hypothetical proteins encoded by the Vibrio cholerae genome. Unrelated or badly aligned sequences were automatically removed from the MACS, leaving high-quality multiple alignments that could be reliably exploited in a subsequent functional and/or structural annotation process. After removal of unreliable sequences, 176 (24%) of the alignments contained at least one sequence with a functional annotation. Of these new matches, 103 were supported by significant hits to the InterPro domain and motif database.
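The norMD formula itself is not given in this abstract, so the sketch below does not implement it. It illustrates only the general step of automatically removing unrelated sequences from a multiple alignment, using a much simpler stand-in criterion assumed for illustration. A sequence is dropped when its mean pairwise identity to the other sequences falls below a threshold. The threshold, function names and toy alignment are hypothetical.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    compared = identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        identical += x == y
    return identical / compared if compared else 0.0


def filter_alignment(alignment, min_mean_identity=0.2):
    """Drop sequences whose mean identity to all others falls below the threshold.

    alignment: dict of name -> aligned sequence (equal lengths, '-' for gaps).
    """
    names = list(alignment)
    kept = {}
    for name in names:
        scores = [pairwise_identity(alignment[name], alignment[other])
                  for other in names if other != name]
        mean = sum(scores) / len(scores) if scores else 1.0
        if mean >= min_mean_identity:
            kept[name] = alignment[name]
    return kept


# Toy alignment: "junk" shares no residues with the other sequences.
aln = {
    "query": "MKT-AYIAKQR",
    "hit1":  "MKTAAYIAKQR",
    "hit2":  "MKS-AYLAKQR",
    "junk":  "WWPGGHHCCDE",
}
print(list(filter_alignment(aln)))  # ['query', 'hit1', 'hit2']
```

An unrelated sequence also lowers the mean scores of the genuine members. An iterative variant that removes the worst sequence and then rescores the rest would be more robust.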
20
Rational drug discovery revisited: interfacing experimental programs with bio- and chemo-informatics. Drug Discov Today 2001; 6:989-95. [PMID: 11576865 DOI: 10.1016/s1359-6446(01)01961-4]
Abstract
Over the past few years, bio- and chemo-informatics have rapidly evolved as related yet distinct disciplines. In drug discovery, it is increasingly recognized that combining and integrating these approaches is crucial for their successful application. In addition, the use of complementary experimental and informatics techniques increases the chances of success in many stages of the discovery process, from the identification of novel targets and elucidation of their functions to the discovery and development of lead compounds with desired properties. This review highlights recent trends that emphasize the role of integrated bio- and chemo-informatics research in drug discovery and discusses representative concepts and methodologies.
21
Abstract
Multiple alignment, since its introduction in the early seventies, has become a cornerstone of modern molecular biology. It has traditionally been used to deduce structure/function by homology, to detect conserved motifs and in phylogenetic studies. There has recently been renewed interest in the development of multiple alignment techniques, with current opinion moving away from a single all-encompassing algorithm toward iterative and/or co-operative strategies. The exploitation of multiple alignments in genome annotation projects represents a qualitative leap in the functional analysis process, opening the way to the study of the co-evolution of validated sets of proteins and to reliable phylogenomic analysis. However, aligning the highly complex proteins detected by today's advanced database search methods is a daunting task. In addition, with the explosion of the sequence databases and the establishment of numerous specialized biological databases, multiple alignment programs must evolve if they are to rise successfully to the new challenges of the post-genomic era. The way forward is clearly an integrated system bringing together sequence data, knowledge-based systems and prediction methods, with their inherent unreliability. The incorporation of such heterogeneous, often inconsistent, data will require major changes to the fundamental alignment algorithms used to date. Such an integrated multiple alignment system will provide an ideal workbench for the validation, propagation and presentation of this information in a format that is concise, clear and intuitive.
22
Abstract
GOLD is a comprehensive resource for accessing information related to completed and ongoing genome projects world-wide. The database currently provides information on 350 genome projects, of which 48 have been completely sequenced and their analysis published. GOLD was created in 1997 and since April 2000 it has been licensed to Integrated Genomics. The database is freely available through the URL: http://igweb.integratedgenomics.com/GOLD/.