1
Abstract
The retroviral capacity for integration into the host genome can give rise to endogenous retroviruses (ERVs): retroviral sequences that are transmitted vertically as part of the host germ line, within which they may continue to replicate and evolve. ERVs represent both a unique archive of ancient viral sequence information and a dynamic component of host genomes. As such, they hold great potential as informative markers for studies of both virus evolution and host genome evolution. Numerous novel ERVs have been described in recent years, particularly as genome sequencing projects have advanced. This review discusses the evolution of ERV lineages, considering the processes by which ERV distribution and diversity are generated. The diversity of ERVs isolated so far is summarised in terms of both their distribution across host taxa and their relationships to recognised retroviral genera. Finally, the relevance of ERVs to studies of genome evolution, host disease and viral ecology is considered, and recent findings are discussed.
Affiliation(s)
- Robert Gifford
- Department of Biological Sciences, Imperial College, Silwood Park, Buckhurst Road, Ascot Berkshire, SL5 7PY, UK
2
Brooksbank C, Camon E, Harris MA, Magrane M, Martin MJ, Mulder N, O'Donovan C, Parkinson H, Tuli MA, Apweiler R, Birney E, Brazma A, Henrick K, Lopez R, Stoesser G, Stoehr P, Cameron G. The European Bioinformatics Institute's data resources. Nucleic Acids Res 2003; 31:43-50. [PMID: 12519944 PMCID: PMC165513 DOI: 10.1093/nar/gkg066] [Received: 10/10/2002] [Revised: 10/14/2002] [Accepted: 10/14/2002]
Abstract
As the amount of biological data grows, so does the need for biologists to store and access this information in central repositories in a free and unambiguous manner. The European Bioinformatics Institute (EBI) hosts six core databases, which store information on DNA sequences (EMBL-Bank), protein sequences (SWISS-PROT and TrEMBL), protein structure (MSD), whole genomes (Ensembl) and gene expression (ArrayExpress). But just as a cell would be useless if it couldn't transcribe DNA or translate RNA, our resources would be compromised if each existed in isolation. We have therefore developed a range of tools that not only facilitate the deposition and retrieval of biological information, but also allow users to carry out searches that reflect the interconnectedness of biological information. The EBI's databases and tools are all available on our website at www.ebi.ac.uk.
Affiliation(s)
- Catherine Brooksbank
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
3
Stoesser G, Baker W, van den Broek A, Garcia-Pastor M, Kanz C, Kulikova T, Leinonen R, Lin Q, Lombard V, Lopez R, Mancuso R, Nardone F, Stoehr P, Tuli MA, Tzouvara K, Vaughan R. The EMBL Nucleotide Sequence Database: major new developments. Nucleic Acids Res 2003; 31:17-22. [PMID: 12519939 PMCID: PMC165468 DOI: 10.1093/nar/gkg021]
Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) incorporates, organizes and distributes nucleotide sequences from all available public sources. The database is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Webin is the preferred web-based submission system for individual submitters, while automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, Email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS) integrates and links the main nucleotide and protein databases plus many other specialized molecular biology databases. For sequence similarity searching, a variety of tools (e.g. Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
Affiliation(s)
- Guenter Stoesser
- EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
4
Stoesser G, Baker W, van den Broek A, Camon E, Garcia-Pastor M, Kanz C, Kulikova T, Leinonen R, Lin Q, Lombard V, Lopez R, Redaschi N, Stoehr P, Tuli MA, Tzouvara K, Vaughan R. The EMBL Nucleotide Sequence Database. Nucleic Acids Res 2002; 30:21-6. [PMID: 11752244 PMCID: PMC99098 DOI: 10.1093/nar/30.1.21]
Abstract
The EMBL Nucleotide Sequence Database (aka EMBL-Bank; http://www.ebi.ac.uk/embl/) incorporates, organises and distributes nucleotide sequences from all available public sources. EMBL-Bank is located and maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK. In an international collaboration with DDBJ (Japan) and GenBank (USA), data are exchanged amongst the collaborating databases on a daily basis. Major contributors to the EMBL database are individual scientists and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via FTP, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many other specialized databases. For sequence similarity searching, a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT. All resources can be accessed via the EBI home page at http://www.ebi.ac.uk.
Affiliation(s)
- Guenter Stoesser
- EMBL Outstation, The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
5
Stoesser G, Baker W, van den Broek A, Camon E, Garcia-Pastor M, Kanz C, Kulikova T, Lombard V, Lopez R, Parkinson H, Redaschi N, Sterk P, Stoehr P, Tuli MA. The EMBL nucleotide sequence database. Nucleic Acids Res 2001; 29:17-21. [PMID: 11125039 PMCID: PMC29766 DOI: 10.1093/nar/29.1.17]
Abstract
The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI's Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.
Affiliation(s)
- G Stoesser
- EMBL Outstation-The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
6
Tristem M. Identification and characterization of novel human endogenous retrovirus families by phylogenetic screening of the human genome mapping project database. J Virol 2000; 74:3715-30. [PMID: 10729147 PMCID: PMC111881 DOI: 10.1128/jvi.74.8.3715-3730.2000]
Abstract
Human endogenous retroviruses (HERVs) were first identified almost 20 years ago, and since then numerous families have been described. It has, however, been difficult to obtain a good estimate of both the total number of independently derived families and their relationship to each other as well as to other members of the family Retroviridae. In this study, I used sequence data derived from over 150 novel HERVs, obtained from the Human Genome Mapping Project database, and a variety of recently identified nonhuman retroviruses to classify the HERVs into 22 independently acquired families. Of these, 17 families were loosely assigned to the class I HERVs, 3 to the class II HERVs and 2 to the class III HERVs. Many of these families have been identified previously, but six are described here for the first time and another four, for which only partial sequence information was previously available, were further characterized. Members of each of the 10 families are defective, and calculation of their integration dates suggested that most of them are likely to have been present within the human lineage since it diverged from the Old World monkeys more than 25 million years ago.
Affiliation(s)
- M Tristem
- Department of Biology, Imperial College, Silwood Park, Ascot, Berkshire SL5 7PY, United Kingdom.
7
Haapa S, Suomalainen S, Eerikäinen S, Airaksinen M, Paulin L, Savilahti H. An Efficient DNA Sequencing Strategy Based on the Bacteriophage Mu in Vitro DNA Transposition Reaction. Genome Res 1999. [DOI: 10.1101/gr.9.3.308]
Abstract
A highly efficient DNA sequencing strategy was developed on the basis of the bacteriophage Mu in vitro DNA transposition reaction. In the reaction, an artificial transposon with a chloramphenicol acetyltransferase (cat) gene as a selectable marker integrated into the target plasmid DNA containing a 10.3-kb mouse genomic insert to be sequenced. Bacterial clones carrying plasmids with the transposon insertions in different positions were produced by transforming transposition reaction products into Escherichia coli cells that were then selected on appropriate selection plates. Plasmids from individual clones were isolated and used as templates for DNA sequencing, each with two primers specific for the transposon sequence but reading the sequence in opposite directions, thus creating a minicontig. By combining the information from overlapping minicontigs, the sequence of the entire 10,288-bp region of mouse genome including six exons of mouse Kcc2 gene was obtained. The results indicated that the described methodology is extremely well suited for DNA sequencing projects in which considerable sequence information is on demand. In addition, massive DNA sequencing projects, including those of full genomes, are expected to benefit substantially from the Mu strategy. [The sequence data reported in this paper have been submitted to the GenBank data library under accession no. AJ011033.]
8
Abstract
We have begun a joint program as part of a coordinated international effort to determine a complete human genome sequence. Our strategy is to map large-insert bacterial clones and to sequence each clone by a random shotgun approach followed by directed finishing. As of September 1998, we have identified the map positions of bacterial clones covering approximately 860 Mb for sequencing and completed >98 Mb (approximately 3.3%) of the human genome sequence. Our progress and sequencing data can be accessed via the World Wide Web (http://webace.sanger.ac.uk/HGP/ or http://genome.wustl.edu/gsc/).
9
Traini M, Gooley AA, Ou K, Wilkins MR, Tonella L, Sanchez JC, Hochstrasser DF, Williams KL. Towards an automated approach for protein identification in proteome projects. Electrophoresis 1998; 19:1941-9. [PMID: 9740054 DOI: 10.1002/elps.1150191112]
Abstract
The development of automated, high throughput technologies for the rapid identification of proteins is essential for large-scale proteome projects. While a degree of automation already exists in some stages of the protein identification process, such as automated acquisition of matrix assisted laser desorption ionisation-time of flight (MALDI-TOF) mass spectra, efficient interfaces between different stages are still lacking. We report the development of a highly automated, integrated system for large scale identification of proteins separated by two-dimensional gel electrophoresis (2-DE), based on peptide mass fingerprinting. A prototype robotic system was used to image and excise 288 protein spots from an amido black stained polyvinylidene difluoride (PVDF) blot. Protein samples were enzymatically digested with a commercial automated liquid handling system. MALDI-TOF mass spectrometry was used to acquire mass spectra automatically, and the data analysed with novel automated peptide mass fingerprinting database interrogation software. Using this highly automated system, we were able to identify 95 proteins on the basis of peptide mass fingerprinting, isoelectric point and molecular weight, in a period of less than ten working days. Advantages, problems, and future developments in robotic excision systems, liquid handling, and automated database interrogation software are discussed.
Affiliation(s)
- M Traini
- Macquarie University Centre for Analytical Biotechnology, School of Biological Sciences, Macquarie University, Sydney, NSW, Australia