1
Fernandes GR, Barbosa DVC, Prosdocimi F, Pena IA, Santana-Santos L, Coelho Junior O, Barbosa-Silva A, Velloso HM, Mudado MA, Natale DA, Faria-Campos AC, Aguiar SCV, Ortega JM. A procedure to recruit members to enlarge protein family databases--the building of UECOG (UniRef-Enriched COG Database) as a model. Genet Mol Res 2008; 7:910-24. [PMID: 18949709 DOI: 10.4238/vol7-3x-meeting008] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/03/2022]
Abstract
A procedure to recruit members to enlarge protein family databases is described here. The procedure makes use of UniRef50 clusters produced by UniProt. Current family entries are used to recruit additional members based on the UniRef50 clusters to which they belong. Only those additional UniRef50 members that are not fragments and whose length is within a restricted range relative to the original entry are recruited. The enriched dataset is then limited to genomes from selected clades. We used the COG database, which is used for genome annotation and for studies of phylogenetics and gene evolution, as a model. To validate the method, a UniRef-Enriched COG0151 (UECOG) was tested with distinct procedures that compare recruited members with the recruiters: PSI-BLAST, secondary structure overlap (SOV), Seed Linkage, COGnitor, shared domain content, and neighbor-joining single-linkage; the first four of these agreed in their validations. Presently, the UniRef50-based recruitment procedure enriches the COG database for Archaea, Bacteria, and the bacterial subgroups Actinobacteria, Firmicutes, Proteobacteria, and other bacteria by 2.2-, 8.0-, 7.0-, 8.8-, 8.7-, and 4.2-fold, respectively, in terms of sequences, and also considerably increases the number of species.
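The recruitment and filtering steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Protein` record, the dictionaries standing in for UniRef50 cluster membership, and the default 0.8 to 1.2 length window are hypothetical, since the abstract gives neither the actual length range nor a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str
    length: int        # sequence length in residues
    is_fragment: bool  # True if the record is flagged as a fragment
    clade: str         # taxonomic group of the source genome


def recruit(family, cluster_of, members_of, lo=0.8, hi=1.2, clades=None):
    """Enlarge one protein family through the clusters its members belong to.

    family     -- Protein objects already in the family (the recruiters)
    cluster_of -- maps a recruiter accession to its cluster ID
    members_of -- maps a cluster ID to the Protein objects in that cluster
    lo, hi     -- hypothetical length window, as fractions of the recruiter length
    clades     -- optional set of clades allowed in the enriched dataset
    """
    known = {p.accession for p in family}
    recruited = {}
    for recruiter in family:
        cluster = cluster_of.get(recruiter.accession)
        if cluster is None:
            continue
        for cand in members_of.get(cluster, []):
            if cand.accession in known or cand.is_fragment:
                continue  # already a member, or a fragment
            if not lo * recruiter.length <= cand.length <= hi * recruiter.length:
                continue  # length outside the window around this recruiter
            recruited[cand.accession] = cand
    enriched = list(family) + list(recruited.values())
    if clades is not None:
        # keep only proteins from the selected clades
        enriched = [p for p in enriched if p.clade in clades]
    return enriched
```

In this sketch a candidate is kept if it passes the length window for at least one recruiter in its cluster; requiring every recruiter to agree would be an equally plausible reading of the abstract.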
Affiliation(s)
- G R Fernandes
- Departamento de Bioquímica e Imunologia, Laboratório de Biodados, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brasil
2
Rogozin IB, Babenko VN, Fedorova ND, Jackson JD, Jacobs AR, Krylov DM, Makarova KS, Mazumder R, Mekhedov SL, Mirkin BG, Nikolskaya AN, Rao BS, Smirnov S, Sorokin AV, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA, Koonin EV. Evolution of eukaryotic gene repertoire and gene structure: discovering the unexpected dynamics of genome evolution. Cold Spring Harb Symp Quant Biol 2003; 68:293-301. [PMID: 15338629 DOI: 10.1101/sqb.2003.68.293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 04/30/2023]
Affiliation(s)
- I B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
3
Jordan IK, Natale DA, Koonin EV, Galperin MY. Independent evolution of heavy metal-associated domains in copper chaperones and copper-transporting ATPases. J Mol Evol 2001; 53:622-33. [PMID: 11677622 DOI: 10.1007/s002390010249] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Received: 12/14/2000] [Accepted: 05/09/2001] [Indexed: 11/29/2022]
Abstract
Copper chaperones are small cytoplasmic proteins that bind intracellular copper (Cu) and deliver it to Cu-dependent enzymes such as cytochrome oxidase, superoxide dismutase, and amine oxidase. Copper chaperones are similar in sequence and structure to the Cu-binding heavy metal-associated (HMA) domains of Cu-transporting ATPases (Cu-ATPases), and the genes for copper chaperones and Cu-ATPases are often located in the same operon. Phylogenetic analysis shows that Cu chaperones and HMA domains of Cu-ATPases represent ancient and distinct lineages that have evolved largely independently since their initial separation. Copper chaperone-Cu-ATPase operons appear to have evolved independently in different prokaryotic lineages, probably due to a strong selective pressure for coexpression of these genes.
Affiliation(s)
- I K Jordan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
4
Abstract
A complete understanding of the biology of an organism necessarily starts with knowledge of its genetic makeup. Proteins encoded in a genome must be identified and characterized, and the presence or absence of specific sets of proteins must be noted in order to determine the possible biochemical pathways or functional systems utilized by that organism. The COG database presents a set of tools suited to these purposes, including the ability to select protein families (COGs) that contain proteins from a specified set of species. The selection is based upon a phylogenetic pattern, which is a shorthand representation of the presence or absence of a particular species in a COG. Here we present the use of phylogenetic patterns as a means to perform targeted searches for undetected protein-coding genes in complete genomes.
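A phylogenetic pattern lends itself to a simple set query. The Python sketch below is only an illustration: the one-letter species codes and the COG identifiers are hypothetical toy values, and the string encoding is not claimed to match the COG website's notation or tools. It encodes presence and absence as a string and selects families present in chosen relatives but absent from a target genome; such families are the candidates for a targeted search for missed genes in that genome.

```python
def phylogenetic_pattern(members, species_order):
    """Encode presence/absence: the species code if present, '-' if absent."""
    present = set(members)
    return "".join(code if code in present else "-" for code in species_order)


def select_cogs(cog_members, must_have, must_lack):
    """Return IDs of COGs containing every species in must_have and none in must_lack."""
    hits = []
    for cog_id in sorted(cog_members):
        present = set(cog_members[cog_id])
        if must_have <= present and not (must_lack & present):
            hits.append(cog_id)
    return hits


# Toy data: hypothetical one-letter species codes and COG IDs
cogs = {"cogA": ["a", "b", "c"], "cogB": ["a", "b"], "cogC": ["a"]}
print(phylogenetic_pattern(cogs["cogB"], "abc"))  # ab-
# Families found in a and b but missing from c: candidates for a targeted search in c
print(select_cogs(cogs, must_have={"a", "b"}, must_lack={"c"}))  # ['cogB']
```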
Affiliation(s)
- D A Natale
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
5
Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 2001; 29:22-8. [PMID: 11125040 PMCID: PMC29819 DOI: 10.1093/nar/29.1.22] [Citation(s) in RCA: 1413] [Impact Index Per Article: 61.4] [Indexed: 11/14/2022]
Abstract
The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.
Affiliation(s)
- R L Tatusov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
6
7
Abstract
The mechanism for initiation of eukaryotic DNA replication is highly conserved: the proteins required to initiate replication, the sequence of events leading to initiation, and the regulation of initiation are remarkably similar throughout the eukaryotic kingdom. Nevertheless, there is a liberal attitude when it comes to selecting initiation sites. Differences appear to exist in the composition of replication origins and in the way proteins recognize these origins. In fact, some multicellular eukaryotes (the metazoans) can change the number and locations of initiation sites during animal development, revealing that selection of initiation sites depends on epigenetic as well as genetic parameters. Here we have attempted to summarize our understanding of this process, to identify the similarities and differences between single cell and multicellular eukaryotes, and to examine the extent to which origin recognition proteins and replication origins have been conserved among eukaryotes. Published 2000 Wiley-Liss, Inc.
Affiliation(s)
- J A Bogan
- National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland 20894, USA.
8
Natale DA, Li CJ, Sun WH, DePamphilis ML. Selective instability of Orc1 protein accounts for the absence of functional origin recognition complexes during the M-G(1) transition in mammals. EMBO J 2000; 19:2728-38. [PMID: 10835370 PMCID: PMC212765 DOI: 10.1093/emboj/19.11.2728] [Citation(s) in RCA: 70] [Impact Index Per Article: 2.9] [Indexed: 11/13/2022]
Abstract
To investigate the events leading to initiation of DNA replication in mammalian chromosomes, the time when hamster origin recognition complexes (ORCs) became functional was related to the time when Orc1, Orc2 and Mcm3 proteins became stably bound to hamster chromatin. Functional ORCs, defined as those able to initiate DNA replication, were absent during mitosis and early G(1) phase, and reappeared as cells progressed through G(1) phase. Immunoblotting analysis revealed that hamster Orc1 and Orc2 proteins were present in nuclei at equivalent concentrations throughout the cell cycle, but only Orc2 was stably bound to chromatin. Orc1 and Mcm3 were easily eluted from chromatin during mitosis and early G(1) phase, but became stably bound during mid-G(1) phase, concomitant with the appearance of a functional pre-replication complex at a hamster replication origin. Since hamster Orc proteins are closely related to their human and mouse homologs, the unexpected behavior of hamster Orc1 provides a novel mechanism in mammals for delaying assembly of pre-replication complexes until mitosis is complete and a nuclear structure has formed.
Affiliation(s)
- D A Natale
- National Institute of Child Health and Human Development, Building 6, Room 3A02, National Institutes of Health, Bethesda, MD 20892-2753, USA
9
Li CJ, Bogan JA, Natale DA, DePamphilis ML. Selective activation of pre-replication complexes in vitro at specific sites in mammalian nuclei. J Cell Sci 2000; 113 (Pt 5):887-98. [PMID: 10671378 DOI: 10.1242/jcs.113.5.887] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.1] [Indexed: 11/20/2022]
Abstract
As the first step in determining whether or not pre-replication complexes are assembled at specific sites along mammalian chromosomes, nuclei from G(1)-phase hamster cells were incubated briefly in Xenopus egg extract in order to initiate DNA replication. Most of the nascent DNA consisted of RNA-primed DNA chains 0.5 to 2 kb in length, and its origins in the DHFR gene region were mapped using both the early labeled fragment assay and the nascent strand abundance assay. The results revealed three important features of mammalian replication origins. First, Xenopus egg extract can selectively activate the same origins of bi-directional replication (e.g. ori-beta and ori-beta') that are used by hamster cells in vivo. Previous reports of a broad peak of nascent DNA centered at ori-beta/beta' appeared to result from the use of aphidicolin to synchronize nuclei and from prolonged exposure of nuclei to egg extracts. Second, these sites were not present until late G(1)-phase of the cell division cycle, and their appearance did not depend on the presence of Xenopus Orc proteins. Therefore, hamster pre-replication complexes appear to be assembled at specific chromosomal sites during G(1)-phase. Third, selective activation of ori-beta in late G(1)-nuclei depended on the ratio of Xenopus egg extract to nuclei, revealing that epigenetic parameters such as the ratio of initiation factors to DNA substrate could determine the number of origins activated.
Affiliation(s)
- C J Li
- National Institute of Child Health and Human Development, Building 6, Room 416, National Institutes of Health, Bethesda, MD 20892-2753, USA
10
Tatusov RL, Galperin MY, Natale DA, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 2000; 28:33-6. [PMID: 10592175 DOI: 10.1093/nar/28.1.33] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 04/20/2023]
Abstract
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program that is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
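The "consistency of genome-specific best hits" criterion can be illustrated with a simplified Python sketch in the spirit of the triangle approach used to build the COGs. It assumes the best hits have already been computed (for example from an all-against-all comparison), requires all three pairs in a triangle to be reciprocal best hits (an assumption), and merges triangles that share a side. The published procedure involved further steps that are not shown, and the exhaustive triple loop is only meant for toy inputs.

```python
from itertools import combinations

# A protein is a (genome, protein_id) tuple.
# best_hit[(protein, genome)] is that protein's best hit in the given other genome.


def reciprocal_best(best_hit, a, b):
    """True if a and b are each other's genome-specific best hit."""
    return best_hit.get((a, b[0])) == b and best_hit.get((b, a[0])) == a


def consistent_triangles(proteins, best_hit):
    """All triples from three different genomes whose pairs are all reciprocal best hits."""
    triangles = []
    for x, y, z in combinations(sorted(proteins), 3):
        if len({x[0], y[0], z[0]}) < 3:
            continue  # the three proteins must come from three genomes
        if all(reciprocal_best(best_hit, p, q) for p, q in ((x, y), (y, z), (x, z))):
            triangles.append(frozenset((x, y, z)))
    return triangles


def merge_on_shared_side(triangles):
    """Union triangles that share a side (two proteins) into candidate groups."""
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)
    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(sorted(g) for g in groups.values())
```

With four genomes whose single proteins are all mutual best hits, the four triangles share sides and collapse into one group, while a protein with no consistent best hits is left out.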
Affiliation(s)
- R L Tatusov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
12
Galperin MY, Natale DA, Aravind L, Koonin EV. A specialized version of the HD hydrolase domain implicated in signal transduction. J Mol Microbiol Biotechnol 1999; 1:303-5. [PMID: 10943560 PMCID: PMC5330256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Affiliation(s)
- M Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
13
Rein T, Natale DA, Gärtner U, Niggemann M, DePamphilis ML, Zorbas H. Absence of an unusual "densely methylated island" at the hamster dhfr ori-beta. J Biol Chem 1997; 272:10021-9. [PMID: 9092544 DOI: 10.1074/jbc.272.15.10021] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.7] [Indexed: 02/04/2023]
Abstract
An unusual "densely methylated island" (DMI), in which all cytosine residues are methylated on both strands for 127-516 base pairs, has been reported at mammalian origins of DNA replication. This report had far-reaching implications for the understanding of DNA methylation and DNA replication. For example, since this DMI appeared in about 90% of proliferating cells, but not in stationary cells, it may regulate origin activation. In an effort to confirm and extend these observations, the DMI at the well characterized ori-beta locus 17 kilobases downstream of the dhfr gene in chromosomes of Chinese hamster ovary cells was checked for methylated cytosines in genomic DNA. The methylation status of this region was examined in randomly proliferating and stationary cells and in cell populations enriched in the G1, S, or G2 + M phases of their cell division cycle. DNA was subjected to 1) cleavage by methylation-sensitive restriction endonucleases, 2) hydrazine modification of cytosines followed by piperidine cleavage, and 3) permanganate modification of 5-methylcytosines (mC) followed by piperidine cleavage. The permanganate reaction is a novel method for direct detection of mC residues that complements the more commonly used hydrazine method. These methods were capable of detecting mC in 2% of the cells. At the region of the proposed DMI, only one mC at a CpG site was detected. However, the ori-beta DMI was not detected in any of these cell populations using any of these methods.
Affiliation(s)
- T Rein
- Institut für Biochemie, Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, D-81377 München, Federal Republic of Germany
14
Abstract
Autonomously replicating sequence (ARS) elements function as plasmid replication origins. Our studies of the H4 ARS and ARS307 have established the requirement for a DNA unwinding element (DUE), a broad easily-unwound sequence 3' to the essential consensus that likely facilitates opening of the origin. In this report, we examine the intrinsic ease of unwinding a variety of ARS elements using (1) a single-strand-specific nuclease to probe for DNA unwinding in a negatively-supercoiled plasmid, and (2) a computer program that calculates DNA helical stability from the nucleotide sequence. ARS elements that are associated with replication origins on chromosome III are nuclease hypersensitive, and the helical stability minima correctly predict the location and hierarchy of the hypersensitive sites. All well-studied ARS elements in which the essential consensus sequence has been identified by mutational analysis contain a 100-bp region of low helical stability immediately 3' to the consensus, as do ARS elements created by mutation within the prokaryotic M13 vector. The level of helical stability is, in all cases, below that of ARS307 derivatives inactivated by mutations in the DUE. Our findings indicate that the ease of DNA unwinding at the broad region directly 3' to the ARS consensus is a conserved property of yeast replication origins.
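As a rough illustration of how helical stability can be calculated from a nucleotide sequence, the Python sketch below sums nearest-neighbour stacking free energies over a sliding window and reports the least stable window. This is not the program used in the study. The parameter set (the unified ΔG°37 values of SantaLucia, 1998, for 1 M NaCl), the default 100-bp window, and the restriction to A/C/G/T sequences are assumptions; the authors' program may use different parameters and conditions.

```python
# Unified nearest-neighbour dG37 values (kcal/mol, 1 M NaCl), SantaLucia (1998),
# keyed by the 5'->3' dinucleotide on one strand. Used here as an assumption.
NN_DG37 = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def stack_dg(dinuc):
    """Free energy of one base-pair stack; falls back to the reverse complement."""
    if dinuc in NN_DG37:
        return NN_DG37[dinuc]
    return NN_DG37[dinuc.translate(COMPLEMENT)[::-1]]


def stability_profile(seq, window=100):
    """Summed stacking free energy per window; less negative means a less stable helix."""
    seq = seq.upper()  # assumes only A/C/G/T characters
    stacks = [stack_dg(seq[i:i + 2]) for i in range(len(seq) - 1)]
    # a window of `window` bp contains window - 1 stacks
    return [sum(stacks[s:s + window - 1]) for s in range(len(seq) - window + 1)]


def least_stable_window(seq, window=100):
    """Start index of the window with the lowest helical stability."""
    profile = stability_profile(seq, window)
    return max(range(len(profile)), key=profile.__getitem__)
```

For a sequence with an A+T-rich block flanked by G+C-rich blocks, the least stable window falls inside the A+T-rich block, which is the kind of helical stability minimum the abstract relates to nuclease hypersensitivity.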
Affiliation(s)
- D A Natale
- Molecular and Cellular Biology Department, Roswell Park Cancer Institute, Buffalo, NY 14263
15
Abstract
Earlier studies on the H4 autonomously replicating sequence (ARS) identified a DNA unwinding element (DUE), a required sequence that is hypersensitive to single-strand-specific nucleases and serves to facilitate origin unwinding. Here we demonstrate that a DUE can be identified in the C2G1 ARS, a chromosomal replication origin, by using a computer program that calculates DNA helical stability from the base sequence. The helical stability minima correctly predict the location and hierarchy of the nuclease-hypersensitive sites in a C2G1 ARS plasmid. Nucleotide-level mapping shows that the nuclease-hypersensitive site at the ARS spans a 100-base-pair sequence in the required 3'-flanking region. Mutations that stabilize the DNA helix in the broad 3'-flanking region reduce or abolish ARS-mediated plasmid replication, indicating that helical instability is required for origin function. The level of helical instability is quantitatively related to the replication efficiency of the ARS mutants. Multiple copies of either a consensus-related sequence present in the C2G1 ARS or the consensus sequence itself in synthetic ARS elements contribute to DNA helical instability. Our findings indicate that a DUE is a conserved component of the C2G1 ARS and is a major determinant of replication origin activity.
Affiliation(s)
- D A Natale
- Molecular and Cellular Biology Department, Roswell Park Cancer Institute, Buffalo, NY 14263
16
Kowalski D, Natale DA, Eddy MJ. Stable DNA unwinding, not "breathing," accounts for single-strand-specific nuclease hypersensitivity of specific A+T-rich sequences. Proc Natl Acad Sci U S A 1988; 85:9464-8. [PMID: 2849106 PMCID: PMC282773 DOI: 10.1073/pnas.85.24.9464] [Citation(s) in RCA: 131] [Impact Index Per Article: 3.6] [Indexed: 01/02/2023]
Abstract
A long A+T-rich sequence in supercoiled pBR322 DNA is hypersensitive to single-strand-specific nucleases at 37 degrees C but not at reduced temperature. The basis for the nuclease hypersensitivity is stable DNA unwinding as revealed by (i) the same temperature dependence for hypersensitivity and for stable unwinding of plasmid topoisomers after two-dimensional gel electrophoresis, (ii) preferential nuclease digestion of stably unwound topoisomers, and (iii) quantitative nicking of stably unwound topoisomers in the A+T-rich region. Nuclease hypersensitivity of A+T-rich sequences is hierarchical, and either deletion of the primary site or a sufficient increase in the free energy of supercoiling leads to enhanced nicking at an alternative A+T-rich site. The hierarchy of nuclease hypersensitivity reflects a hierarchy in the free energy required for unwinding naturally occurring sequences in supercoiled DNA. This finding, along with the known hypersensitivity of replication origins and transcriptional regulatory regions, has important implications for using single-strand-specific nucleases in DNA structure-function studies.
Affiliation(s)
- D Kowalski
- Molecular and Cellular Biology Department, Roswell Park Memorial Institute, Buffalo, NY 14263