51
|
Butler A, Whitehead AS. Mapping of the mouse serum amyloid A gene cluster by long-range polymerase chain reaction. Immunogenetics 1996; 44:468-74. [PMID: 8824159 DOI: 10.1007/bf02602809] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 02/02/2023]
Abstract
The present study defines the organization of the mouse serum amyloid A (Saa) gene cluster on chromosome 7. A polymerase chain reaction (PCR)-based strategy was used successfully to generate a complete map of the mouse Saa genes, defining a linkage group of 3'-Saa2-5'/5'-Saa1-3'/5'-Saa4-3'/5'-Saa5-3'/5'-Saa3-3', with a maximum size of 45 kilobases (kb). This contrasts with the 150 kb human SAA gene cluster, which has been previously defined. The tight linkage of both mouse Saas and human SAAs is of potential functional significance, since the genes that encode the acute phase serum amyloid A proteins are known to exhibit co-ordinate transcriptional regulation. The present results thus suggest that selective pressure may exist which maintains the co-ordinately transcribed Saa genes in close physical proximity. This study, furthermore, demonstrates the utility of a novel PCR-based approach for fine mapping of tightly clustered linkage groups. The strategy used possesses a number of advantages over previously described techniques, such as long-range restriction mapping, since it facilitates the concurrent determination of not only precise relative map positions, but also the relative transcriptional orientations of assayed paired loci. Although presently limited in resolution to genes not more than 27 kb apart, future technical advances are likely to extend the applicability of this approach in mapping experiments to less tightly linked clusters of genes.
Affiliation(s)
- A Butler
- Department of Genetics and Biotechnology Institute, Trinity College, University of Dublin, Dublin 2, Ireland
|
52
|
Sbisà E, Pesole G, Tullo A, Saccone C. The evolution of the RNase P- and RNase MRP-associated RNAs: phylogenetic analysis and nucleotide substitution rate. J Mol Evol 1996; 43:46-57. [PMID: 8660429 DOI: 10.1007/bf02352299] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.6] [Indexed: 02/01/2023]
Abstract
We report a detailed evolutionary study of the RNase P- and RNase MRP-associated RNAs. The analyses were performed on all the available complete sequences of RNase MRP (vertebrates, yeast, plant), nuclear RNase P (vertebrates, yeast), and mitochondrial RNase P (yeast) RNAs. For the first time the phylogenetic distance between these sequences and the nucleotide substitution rates have been quantitatively measured. The analyses were performed by considering the optimal multiple alignments obtained mostly by maximizing similarity between primary sequences. RNase P RNA and MRP RNA display evolutionary dynamics following the molecular clock. Both have similar rates and evolve about one order of magnitude faster than the corresponding small rRNA sequences which have been, so far, the most common gene markers used for phylogeny. However, small rRNAs evolve too slowly to solve close phylogenetic relationships such as those between mammals. The quicker rate of RNase P and MRP RNA allowed us to assess phylogenetic relationships between mammals and other vertebrate species and yeast strains. The phylogenetic data obtained with yeasts perfectly agree with those obtained by functional assays, thus demonstrating the potential offered by this approach for laboratory experiments.
Affiliation(s)
- E Sbisà
- Centro di Studio sui Mitocondri e Metabolismo Energetico, CNR. Via Amendola, 165/A, 70126 Bari, Italy
|
53
|
Perrière G, Moszer I, Gojobori T. NRSub: a non-redundant database for Bacillus subtilis. Nucleic Acids Res 1996; 24:41-5. [PMID: 8594597 PMCID: PMC145565 DOI: 10.1093/nar/24.1.41] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 01/31/2023] Open
Abstract
In the context of the international project aimed at sequencing the whole genome of Bacillus subtilis we have developed a non-redundant, fully annotated database of sequences from this organism. Starting from the B. subtilis sequences available in the EMBL, GenBank and DDBJ collections we have removed all encountered duplications and then added extra annotations to the sequences (e.g. accession numbers for the genes, locations on the genetic map, codon usage, etc.). We have also added cross-references to the EMBL, MEDLINE, SWISS-PROT and ENZYME data banks. The present system results from the merging of the NRSub and SubtiList databases, and the sequence contigs used in the two systems are identical. NRSub is distributed as a flatfile in EMBL format (which is supported by most sequence analysis software packages) and as an ACNUC database, while SubtiList is distributed as a relational database under 4th Dimension. It is possible to access the data through two dedicated World Wide Web servers located in France and Japan.
Affiliation(s)
- G Perrière
- Laboratoire de Biometrie, Universite Claude Bernard-Lyon, Villeurbanne, France
|
54
|
Duret L, Mouchiroud D, Gautier C. Statistical analysis of vertebrate sequences reveals that long genes are scarce in GC-rich isochores. J Mol Evol 1995; 40:308-17. [PMID: 7723057 DOI: 10.1007/bf00163235] [Citation(s) in RCA: 186] [Impact Index Per Article: 6.2] [Indexed: 01/26/2023]
Abstract
We compared the exon/intron organization of vertebrate genes belonging to different isochore classes, as predicted by their GC content at third codon position. Two main features have emerged from the analysis of sequences published in GenBank: (1) genes coding for long proteins (i.e., ≥ 500 aa) are almost two times more frequent in GC-poor than in GC-rich isochores; (2) intervening sequences (= sum of introns) are on average three times longer in GC-poor than in GC-rich isochores. These patterns are observed among human, mouse, rat, cow, and even chicken genes and are therefore likely to be common to all warm-blooded vertebrates. Analysis of Xenopus sequences suggests that the same patterns exist in cold-blooded vertebrates. It could be argued that such results do not reflect reality because sequence databases are not representative of entire genomes. However, analysis of biases in GenBank revealed that the observed discrepancies between GC-rich and GC-poor isochores are not artifactual, and are probably largely underestimated. We investigated the distribution of microsatellites and interspersed repeats in introns of human and mouse genes from different isochores. This analysis confirmed previous studies showing that L1 repeats are almost absent from GC-rich isochores. Microsatellites and SINEs (Alu, B1, B2) are found at roughly equal frequencies in introns from all isochore classes. Globally, the presence of repeated sequences does not account for the increased intron length in GC-poor isochores. The relationships between gene structure and global genome organization and evolution are discussed.
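The abstract above assigns genes to isochore classes from their GC content at third codon positions (often abbreviated GC3). A minimal sketch of that measurement follows; the function name `gc3` and the handling of a trailing partial codon are assumptions made for illustration, and the class boundaries used in the paper are not stated in the abstract, so none are encoded here.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C at third codon positions of an in-frame coding sequence.

    Assumption: the sequence starts at codon position 1; a trailing
    partial codon (1 or 2 leftover bases) is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop any incomplete final codon
    thirds = seq[2:usable:3]              # bases at positions 3, 6, 9, ...
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA have third bases G, C, A.
    print(gc3("ATGGCCAAA"))
```

Genes could then be binned by this value, with thresholds chosen by the analyst.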
Affiliation(s)
- L Duret
- Laboratoire de Biométrie, Génétique et Biologie des Populations, Université Claude Bernard, Lyon I, URA-CNRS 243, Villeurbanne, France
|
55
|
Perrière G, Gouy M, Gojobori T. NRSub: a non-redundant data base for the Bacillus subtilis genome. Nucleic Acids Res 1994; 22:5525-9. [PMID: 7838704 PMCID: PMC310112 DOI: 10.1093/nar/22.25.5525] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.5] [Indexed: 01/27/2023] Open
Abstract
We have organized the DNA sequences of Bacillus subtilis from the EMBL collection to build the NRSub data base. This data base is free from duplications and all detected overlapping sequences are merged into contigs. Data on gene mapping and codon usage are also included. NRSub is publicly available through anonymous FTP in flat file format or structured in the form of an ACNUC data base. Under this format, it is possible to use NRSub with the retrieval program Query_win. This program integrates a graphical interface and may be installed on any kind of UNIX computer running X Window on which the Vibrant and Motif libraries are available.
Affiliation(s)
- G Perrière
- Laboratoire de Biométrie, Génétique et Biologie des Populations, URA CNRS no. 243, Université Claude Bernard, Lyon, France
|
56
|
Cantatore P, Roberti M, Pesole G, Ludovico A, Milella F, Gadaleta MN, Saccone C. Evolutionary analysis of cytochrome b sequences in some Perciformes: evidence for a slower rate of evolution than in mammals. J Mol Evol 1994; 39:589-97. [PMID: 7807548 DOI: 10.1007/bf00160404] [Citation(s) in RCA: 132] [Impact Index Per Article: 4.3] [Indexed: 01/27/2023]
Abstract
To obtain information on the phylogenesis and microevolutionary rate of fish mitochondrial DNA, the nucleotide sequence of the cytochrome b gene in seven fish species belonging to the order Perciformes was determined. Sequence analysis showed that fish mitochondrial DNA has a nucleotide compositional bias similar to that of sharks but lower compared to mammals and birds. Quantitative evolutionary analysis, carried out by using a markovian stochastic model, clarifies some phylogenetic relationships within the Perciformes order, particularly in the Scombridae family, and between Perciformes, Gadiformes, Cypriniformes, and Acipenseriformes. The molecular clock of mitochondrial DNA was calibrated with the nucleotide substitution rate of the cytochrome b gene in five shark species having divergence times inferred from paleontological estimates. The results of such analysis showed that Acipenseriformes diverged from Perciformes about 200 MY ago, that the Perciformes common ancestor dates back to 150 MY, and that fish mitochondrial DNA has a nucleotide substitution rate three to five times lower than that of mammals.
Affiliation(s)
- P Cantatore
- Department of Biochemistry and Molecular Biology, University of Bari, Italy
|
57
|
Lobry JR, Gautier C. Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res 1994; 22:3174-80. [PMID: 8065933 PMCID: PMC310293 DOI: 10.1093/nar/22.15.3174] [Citation(s) in RCA: 216] [Impact Index Per Article: 7.0] [Indexed: 01/28/2023] Open
Abstract
Multivariate analysis of the amino-acid compositions of 999 chromosome-encoded proteins from Escherichia coli showed that three main factors influence the variability of amino-acid composition. The first factor was correlated with the global hydrophobicity of proteins, and it discriminated integral membrane proteins from the others. The second factor was correlated with gene expressivity, showing a bias in highly expressed genes towards amino-acids having abundant major tRNAs. Just as highly expressed genes have reduced codon diversity in protein coding sequences, so do they have a reduced diversity of amino-acid choice. This showed that translational constraints are important enough to affect the global amino-acid composition of proteins. The third factor was correlated with the aromaticity of proteins, showing that aromatic amino-acid content is highly variable.
Affiliation(s)
- J R Lobry
- Laboratoire de Biométrie, CNRS URA 243, Université Claude Bernard, Villeurbanne, France
|
58
|
Stenico M, Lloyd AT, Sharp PM. Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res 1994; 22:2437-46. [PMID: 8041603 PMCID: PMC308193 DOI: 10.1093/nar/22.13.2437] [Citation(s) in RCA: 217] [Impact Index Per Article: 7.0] [Indexed: 01/28/2023] Open
Abstract
Synonymous codon usage varies considerably among Caenorhabditis elegans genes. Multivariate statistical analyses reveal a single major trend among genes. At one end of the trend lie genes with relatively unbiased codon usage. These genes appear to be lowly expressed, and their patterns of codon usage are consistent with mutational biases influenced by the neighbouring nucleotide. At the other extreme lie genes with extremely biased codon usage. These genes appear to be highly expressed, and their codon usage seems to have been shaped by selection favouring a limited number of translationally optimal codons. Thus, the frequency of these optimal codons in a gene appears to be correlated with the level of gene expression, and may be a useful indicator in the case of genes (or open reading frames) whose expression levels (or even function) are unknown. A second, relatively minor trend among genes is correlated with the frequency of G at synonymously variable sites. It is not yet clear whether this trend reflects variation in base composition (or mutational biases) among regions of the C.elegans genome, or some other factor. Sequence divergence between C.elegans and C.briggsae has also been studied.
Affiliation(s)
- M Stenico
- Department of Genetics, Trinity College, Dublin, Ireland
|
59
|
Abstract
Comparison of homologous genes is a major step for many studies related to genome structure, function or evolution. Similarity search programs easily find genes homologous to a given sequence. However, only very tedious manual procedures allow the retrieval of all sets of homologous genes sequenced for a given set of species. Moreover, this search often generates errors due to the complexity of data to be managed simultaneously: phylogenetic trees, alignments, taxonomy, sequences and related information. HOVERGEN helps to solve these problems by integrating all this information. HOVERGEN corresponds to GenBank sequences from all vertebrate species, with some data corrected, clarified, or completed, notably to address the problem of redundancy. Coding sequences have been classified in gene families. Protein multiple alignments and phylogenetic trees have been calculated for each family. Sequences and related information have been structured in an ACNUC database which permits complex selections. A graphical interface has been developed to visualize and edit trees. Genes are displayed in color, according to their taxonomy. Users have direct access to all information attached to sequences and to multiple alignments simply by clicking on genes. This graphical tool thus gives rapid and simple access to all data necessary to interpret homology relationships between genes. HOVERGEN allows the user to easily select sets of homologous vertebrate genes, and thus is particularly useful for comparative sequence analysis, or molecular evolution studies.
Affiliation(s)
- L Duret
- Laboratoire de Biométrie, Génétique et Biologie des Populations, Université Claude Bernard, Lyon I, URA-CNRS 243, Villeurbanne, France
|
60
|
Wolfe KH, Sharp PM. Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J Mol Evol 1993; 37:441-56. [PMID: 8308912 DOI: 10.1007/bf00178874] [Citation(s) in RCA: 144] [Impact Index Per Article: 4.5] [Indexed: 01/29/2023]
Abstract
As a paradigm of mammalian gene evolution, the nature and extent of DNA sequence divergence between homologous protein-coding genes from mouse and rat have been investigated. The data set examined includes 363 genes totalling 411 kilobases, making this by far the largest comparison conducted between a single pair of species. Mouse and rat genes are on average 93.4% identical in nucleotide sequence and 93.9% identical in amino acid sequence. Individual genes vary substantially in the extent of nonsynonymous nucleotide substitution, as expected from protein evolution studies; here the variation is characterized. The extent of synonymous (or silent) substitution also varies considerably among genes, though the coefficient of variation is about four times smaller than for nonsynonymous substitutions. A small number of genes mapped to the X-chromosome have a slower rate of molecular evolution than average, as predicted if molecular evolution is "male-driven." Base composition at silent sites varies from 33% to 95% G+C in different genes; mouse and rat homologues differ on average by only 1.7% in silent-site G+C, but it is shown that this is not necessarily due to any selective constraint on their base composition. Synonymous substitution rates and silent site base composition appear to be related (genes at intermediate G+C have on average higher rates), but the relationship is not as strong as in our earlier analyses. Rates of synonymous and nonsynonymous substitution are correlated, apparently because of an excess of substitutions involving adjacent pairs of nucleotides. Several factors suggest that synonymous codon usage in rodent genes is not subject to selection.
Affiliation(s)
- K H Wolfe
- Department of Genetics, University of Dublin, Trinity College, Ireland
|
61
|
Abstract
The DNA sequences of the recA gene from 25 strains of bacteria are known. The evolution of these recA gene sequences, and of the derived RecA protein sequences, is examined, with special reference to the effect of variations in genomic G + C content. From the aligned RecA protein sequences, phylogenetic trees have been drawn using both distance matrix and maximum parsimony methods. There is a broad concordance between these trees and those derived from other data (largely 16S ribosomal RNA sequences). There is a fair degree of certainty in the relationships among the "Purple" or Proteobacteria, but the branching pattern between higher taxa within the eubacteria cannot be reliably resolved with these data.
Affiliation(s)
- A T Lloyd
- Department of Genetics, Trinity College, Dublin, Ireland
|
62
|
Mouchiroud D, Bernardi G. Compositional properties of coding sequences and mammalian phylogeny. J Mol Evol 1993; 37:109-16. [PMID: 8411199 DOI: 10.1007/bf02407345] [Citation(s) in RCA: 28] [Impact Index Per Article: 0.9] [Indexed: 01/30/2023]
Abstract
The compositional distributions of large DNA fragments reflect those of the isochores that make up vertebrate genomes and can provide novel phylogenetic insights in the case of mammalian genomes (see Sabeur et al. 1993). This approach has been complemented here by an analysis of the compositional patterns of coding sequences and their codon positions (which also reflect the isochore pattern) and by a comparison of the base compositions of codon positions from homologous genes in a number of pairs of species. The results obtained using these two approaches support the existence of a general compositional pattern for mammalian genomes and of a distinct pattern for Myomorpha. The other two "special" patterns identified in a megachiropteran and in pangolin could not be tested here.
Affiliation(s)
- D Mouchiroud
- Laboratoire de Biométrie, Génétique et Biologie des Populations, U.R.A. 243, Université Claude Bernard, Villeurbanne, France
|
63
|
Duret L, Dorkeld F, Gautier C. Strong conservation of non-coding sequences during vertebrates evolution: potential involvement in post-transcriptional regulation of gene expression. Nucleic Acids Res 1993; 21:2315-22. [PMID: 8506129 PMCID: PMC309526 DOI: 10.1093/nar/21.10.2315] [Citation(s) in RCA: 125] [Impact Index Per Article: 3.9] [Indexed: 01/31/2023] Open
Abstract
Comparison of nucleotide sequences from different classes of vertebrates that diverged more than 300 million years ago revealed the existence of highly conserved regions (HCRs) with more than 70% similarity over 100 to 1450 nt in non-coding parts of genes. Such conservation is unexpected because it is much longer and stronger than what is necessary for specifying the binding of a regulatory protein. HCRs are relatively frequent, particularly in genes that are essential to cell life. In multigene families, conserved regions are specific to each isotype and are probably involved in the control of their specific pattern of expression. Studying the distribution of HCRs within genes showed that functional constraints are generally much stronger in 3'-non-coding regions than in promoters or introns. The 3'-HCRs are particularly A + T-rich and are always located in the transcribed untranslated regions of genes, which suggests that they are involved in post-transcriptional processes. However, current knowledge of mechanisms that regulate mRNA export, localisation, translation, or degradation is not sufficient to explain the strong functional constraints that we have characterised.
Affiliation(s)
- L Duret
- Laboratoire de Biométrie, Génétique et Biologie des Populations, Université Claude Bernard, Lyon I, URA-CNRS 243, Villeurbanne, France
|
64
|
Sharp PM, Lloyd AT. Regional base composition variation along yeast chromosome III: evolution of chromosome primary structure. Nucleic Acids Res 1993; 21:179-83. [PMID: 8441625 PMCID: PMC309089 DOI: 10.1093/nar/21.2.179] [Citation(s) in RCA: 92] [Impact Index Per Article: 2.9] [Indexed: 01/30/2023] Open
Abstract
The recent determination of the complete sequence of chromosome III from the yeast Saccharomyces cerevisiae allows, for the first time, the investigation of the long range primary structure of a eukaryotic chromosome. We have found that, against a background G+C level of about 35%, there are two regions (one in each chromosome arm) in which G+C values rise to over 50%. This effect is seen in silent sites within genes, but not in noncoding intergenic sequences. The variation in G+C content is not related to differential selection of synonymous codons, and probably reflects mutational biases. That the intergenic regions do not exhibit the same phenomenon is particularly interesting, and suggests that they are under substantial constraint. The yeast chromosome may be a model of the structure of the human genome, since there is evidence that it is also a mosaic of long regions of different base compositions, reflected in wide variation of G+C content at silent sites among genes. Two possible causes of this regional effect, replication timing, and recombination frequency, are discussed.
Affiliation(s)
- P M Sharp
- Department of Genetics, Trinity College, Dublin, Ireland
|
65
|
Lloyd AT, Sharp PM. Evolution of codon usage patterns: the extent and nature of divergence between Candida albicans and Saccharomyces cerevisiae. Nucleic Acids Res 1992; 20:5289-95. [PMID: 1437548 PMCID: PMC334333 DOI: 10.1093/nar/20.20.5289] [Citation(s) in RCA: 91] [Impact Index Per Article: 2.8] [Indexed: 12/27/2022] Open
Abstract
Codon usage in a sample of 28 genes from the pathogenic yeast Candida albicans has been analysed using multivariate statistical analysis. A major trend among genes, correlated with gene expression level, was identified. We have focussed on the extent and nature of divergence between C.albicans and the closely related yeast Saccharomyces cerevisiae. It was recently suggested that significant differences exist between the subsets of preferred codons in these two species [Brown et al. (1991) Nucleic Acids Res. 19, 4293]. Overall, the genes of C.albicans are more A + T-rich, reflecting the lower genomic G + C content of that species, and presumably resulting from a different pattern of mutational bias. However, in both species highly expressed genes preferentially use the same subset of 'optimal' codons. A suggestion that the low frequency of NCG codons in both yeast species results from selection against the presence of codons that are potentially highly mutable is discounted. Codon usage in C.albicans, as in other unicellular species, can be interpreted as the result of a balance between the processes of mutational bias and translational selection. Codon usage in two related Candida species, C.maltosa and C.tropicalis, is briefly discussed.
Affiliation(s)
- A T Lloyd
- Department of Genetics, Trinity College, Dublin, Ireland
|
66
|
Chantret I, Lacasa M, Chevalier G, Ruf J, Islam I, Mantei N, Edwards Y, Swallow D, Rousset M. Sequence of the complete cDNA and the 5' structure of the human sucrase-isomaltase gene. Possible homology with a yeast glucoamylase. Biochem J 1992; 285 ( Pt 3):915-23. [PMID: 1353958 PMCID: PMC1132882 DOI: 10.1042/bj2850915] [Citation(s) in RCA: 63] [Impact Index Per Article: 1.9] [Indexed: 11/17/2022]
Abstract
The complete sequence of the 6 kb cDNA and the 5' genomic structure are reported for the gene coding for the human intestinal brush border hydrolase sucrase-isomaltase. The human sucrase-isomaltase cDNA shows a high level of identity (83%) with that of the rabbit enzyme, indicating that the protein shares the same structural domains in both species. In addition to the previously reported homology with lysosomal alpha-glucosidase, the sucrase and isomaltase subunits also appear to be homologous to a yeast glucoamylase. A 14 kb human genomic clone has been isolated which includes the first three exons and the first two introns of the gene, as well as 9.5 kb 5' to the major start site of transcription. The first exon comprises 62 bp of untranslated sequence and the second starts exactly at the initiation ATG codon. Typical CAAT and TATA boxes are seen upstream of the first exon. A genetic polymorphism is described which involves a PstI site in the second intron. Southern blotting, sequencing and mRNA studies indicate that the structures of the sucrase-isomaltase gene and its mRNA are unaltered in the two human colon cancer cell lines Caco-2 and HT-29 in comparison with normal human small intestine.
Affiliation(s)
- I Chantret
- MRC Human Biochemical Genetics Unit, Galton Laboratory, University College London, U.K
|
67
|
Pesole G, Prunella N, Liuni S, Attimonelli M, Saccone C. WORDUP: an efficient algorithm for discovering statistically significant patterns in DNA sequences. Nucleic Acids Res 1992; 20:2871-5. [PMID: 1614873 PMCID: PMC336935 DOI: 10.1093/nar/20.11.2871] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.1] [Indexed: 12/27/2022] Open
Abstract
We present here a fast and sensitive method designed to isolate short nucleotide sequences which have non-random statistical properties and may thus be biologically active. It is based on a first order Markov analysis and allows us to detect statistically significant sequence motifs from six to ten nucleotides long which are significantly shared (or avoided) in the sequences under investigation. This method has been tested on a set of 521 sequences extracted from the Eukaryotic Promoter Database (2). Our results demonstrate the accuracy and the efficiency of the method in that the sequence motifs which are known to act as eukaryotic promoters, such as the TATA-box and the CAAT-box, were clearly identified. In addition we have found other statistically significant motifs, the biological roles of which are yet to be clarified.
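The first-order Markov idea mentioned in the abstract above can be illustrated by comparing the observed count of a word with the count expected from the mononucleotide and dinucleotide composition of a sequence. The sketch below is not the WORDUP implementation: the abstract does not give its statistic or significance test, and the function names `markov1_expected` and `observed_count` are hypothetical.

```python
from collections import Counter


def observed_count(seq: str, word: str) -> int:
    """Number of (possibly overlapping) occurrences of word in seq."""
    seq, word = seq.upper(), word.upper()
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))


def markov1_expected(seq: str, word: str) -> float:
    """Expected occurrences of word under a first-order Markov chain.

    Assumption: P(word) = P(w1) * product of P(w[i+1] | w[i]), with the
    transition probabilities estimated from the dinucleotide counts of seq.
    """
    seq, word = seq.upper(), word.upper()
    n, k = len(seq), len(word)
    if k < 2 or n < k:
        raise ValueError("need a word of length >= 2 that fits in the sequence")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    starts = Counter(seq[:-1])            # bases that begin a dinucleotide
    prob = mono[word[0]] / n
    for a, b in zip(word, word[1:]):
        if starts[a] == 0:
            return 0.0
        prob *= di[a + b] / starts[a]
    return prob * (n - k + 1)             # number of positions where word can start


if __name__ == "__main__":
    s = "ACACACAC"
    print(observed_count(s, "ACA"), markov1_expected(s, "ACA"))
```

A word whose observed count departs strongly from this expectation is a candidate motif; how "strongly" is judged is left open here.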
Affiliation(s)
- G Pesole
- Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy
|
68
|
Pizzi E, Attimonelli M, Liuni S, Frontali C, Saccone C. A simple method for global sequence comparison. Nucleic Acids Res 1992; 20:131-6. [PMID: 1738591 PMCID: PMC310336 DOI: 10.1093/nar/20.1.131] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 12/28/2022] Open
Abstract
A simple method of sequence comparison, based on a correlation analysis of oligonucleotide frequency distributions, is here shown to be a reliable test of overall sequence similarity. The method does not involve sequence alignment procedures and permits the rapid screening of large amounts of sequence data. It identifies those sequences which deserve more careful analysis of sequence similarity at the level of resolution of the single nucleotide. It uses observed quantities only and does not involve the adoption of any theoretical model.
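An alignment-free comparison of the kind described above can be sketched by counting oligonucleotides of a fixed length in each sequence and correlating the two frequency vectors. The word length (`k=3`), the use of the Pearson coefficient, and the function names are assumptions for illustration; the abstract does not specify the paper's exact choices.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_freqs(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k oligonucleotides over A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)  # words with other symbols are skipped
    if total == 0:
        raise ValueError("no countable oligonucleotides in sequence")
    return [counts[w] / total for w in words]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(vx * vy)


def kmer_correlation(a: str, b: str, k: int = 3) -> float:
    """Correlation between the oligonucleotide frequency distributions of a and b."""
    return pearson(kmer_freqs(a, k), kmer_freqs(b, k))


if __name__ == "__main__":
    print(kmer_correlation("ACGTTGCAACGT", "ACGTTGCAACGA", k=2))
```

Only observed counts enter the calculation, which matches the abstract's point that no theoretical model is required; pairs with a high coefficient would then be passed on to an alignment-based comparison.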
Affiliation(s)
- E Pizzi
- Centro Studi Mitocondri e Metabolismo Energetico CNR, Dipartimento di Biochimica e Biologia Molecolare, University of Bari, Italy
|
69
|
Sharp PM. Determinants of DNA sequence divergence between Escherichia coli and Salmonella typhimurium: codon usage, map position, and concerted evolution. J Mol Evol 1991; 33:23-33. [PMID: 1909371 DOI: 10.1007/bf02100192] [Citation(s) in RCA: 171] [Impact Index Per Article: 5.0] [Indexed: 12/29/2022]
Abstract
The nature and extent of DNA sequence divergence between homologous protein-coding genes from Escherichia coli and Salmonella typhimurium have been examined. The degree of divergence varies greatly among genes at both synonymous (silent) and nonsynonymous sites. Much of the variation in silent substitution rates can be explained by natural selection on synonymous codon usage, varying in intensity with gene expression level. Silent substitution rates also vary significantly with chromosomal location, with genes near oriC having lower divergence. Certain genes have been examined in more detail. In particular, the duplicate genes encoding elongation factor Tu, tufA and tufB, from S. typhimurium have been compared to their E. coli homologues. As expected these very highly expressed genes have high codon usage bias and have diverged very little between the two species. Interestingly, these genes, which are widely spaced on the bacterial chromosome, also appear to be undergoing concerted evolution, i.e., there has been exchange between the loci subsequent to the divergence of the two species.
Affiliation(s)
- P M Sharp
- Department of Genetics, Trinity College, Dublin, Ireland
|
70
|
Aïssani B, D'Onofrio G, Mouchiroud D, Gardiner K, Gautier C, Bernardi G. The compositional properties of human genes. J Mol Evol 1991; 32:493-503. [PMID: 1908020 DOI: 10.1007/bf02102651] [Citation(s) in RCA: 52] [Impact Index Per Article: 1.5] [Indexed: 12/29/2022]
Abstract
The present work represents the first attempt to study in greater detail previously proposed compositional correlations in genomes, based on a body of additional data relating to gene localizations as well as to extended flanking sequences extracted from gene banks. We have investigated the correlations that exist between (1) the GC levels of exons of human genes, and (2) the GC levels of either intergenic sequences or introns associated with the genes under consideration. In both cases, linear relationships with slopes close to unity were found. The similarity of the linear relationships indicates similar GC levels in intergenic sequences and introns located in the same isochores. Moreover, both intergenic sequences and introns showed GC levels 5-10% lower than the corresponding exons. The above findings considerably strengthen the previously drawn conclusion that coding and noncoding sequences (both inter- and intragenic) from the same isochores of the human genome are compositionally correlated. In addition, we find linear correlations between the GC levels of codon positions and of the intergenic sequences or introns associated with the corresponding genes, as well as among the GC levels of codon positions of genes.
Affiliation(s)
- B Aïssani
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, Paris, France
|
71
|
D'Onofrio G, Mouchiroud D, Aïssani B, Gautier C, Bernardi G. Correlations between the compositional properties of human genes, codon usage, and amino acid composition of proteins. J Mol Evol 1991; 32:504-10. [PMID: 1908021 DOI: 10.1007/bf02102652] [Citation(s) in RCA: 125] [Impact Index Per Article: 3.7] [Indexed: 12/29/2022]
Abstract
We have analyzed the correlation that exists between the GC levels of third and first or second codon position for about 1400 human coding sequences. The linear relationship that was found indicates that the large differences in GC level of third codon positions of human genes are paralleled by smaller differences in GC levels of first and second codon positions. Whereas third codon position differences correspond to very large differences in codon usage within the human genome, the first and second codon position differences correspond to smaller, yet very remarkable, differences in the amino acid composition of encoded proteins. Because GC levels of codon positions are linearly correlated with the GC levels of the isochores harboring the corresponding genes, both codon usage and amino acid composition are different for proteins encoded by genes located in isochores of different GC levels. Furthermore, we have also shown that a linear relationship with a unit slope and a correlation coefficient of 0.77 exists between GC levels of introns and exons from the 238 human genes currently available for this analysis. Introns are, however, about 5% lower in GC, on average, than exons from the same genes.
Affiliation(s)
- G D'Onofrio
- Laboratoire de Génétique Moléculaire, Institut Jacques Monod, Paris, France
|
72
|
Pesole G, Bozzetti MP, Lanave C, Preparata G, Saccone C. Glutamine synthetase gene evolution: a good molecular clock. Proc Natl Acad Sci U S A 1991; 88:522-6. [PMID: 1671172 PMCID: PMC50843 DOI: 10.1073/pnas.88.2.522] [Citation(s) in RCA: 88] [Impact Index Per Article: 2.6] [Indexed: 12/28/2022] Open
Abstract
Glutamine synthetase (EC 6.3.1.2) gene evolution in various animals, plants, and bacteria was evaluated by a general stationary Markov model. The evolutionary process proved to be unexpectedly regular even for a time span as long as that between the divergence of prokaryotes from eukaryotes. This enabled us to draw phylogenetic trees for species whose phylogeny cannot be easily reconstructed from the fossil record. Our calculation of the times of divergence of the various organelle-specific enzymes led us to hypothesize that the pea and bean chloroplast genes for these enzymes originated from the duplication of nuclear genes as a result of the different metabolic needs of the various species. Our data indicate that the duplication of plastid glutamine synthetase genes occurred long after the endosymbiotic events that produced the organelles themselves.
Affiliation(s)
- G Pesole
- Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy
|
73
|
Sharp PM, Devine KM. Codon usage and gene expression level in Dictyostelium discoideum: highly expressed genes do 'prefer' optimal codons. Nucleic Acids Res 1989; 17:5029-39. [PMID: 2762118 PMCID: PMC318092 DOI: 10.1093/nar/17.13.5029] [Citation(s) in RCA: 124] [Impact Index Per Article: 3.4] [Indexed: 01/02/2023] Open
Abstract
Codon usage patterns in the slime mould Dictyostelium discoideum have been re-examined (a total of 58 genes have been analysed). Considering the extreme A + T-richness of this genome (G + C = 22%), there is a surprising degree of codon usage variation among genes. For example, G + C content at silent sites varies from less than 10% to greater than 30%. It was previously suggested [Warrick, H.M. and Spudich, J.A. (1988) Nucleic Acids Res. 16: 6617-6635] that highly expressed genes contain fewer 'optimal' codons than genes expressed at lower levels. However, it appears that the optimal codons were misidentified. Multivariate statistical analysis shows that the greatest variation among genes is in relative usage of a particular subset of codons (about one per amino acid), many of which are C-ending. We have identified these as optimal codons, since (i) their frequency is positively correlated with gene expression level, and (ii) there is a strong mutation bias in this genome towards A and T nucleotides. Thus, codon usage in D. discoideum can be explained by a balance between the forces of mutational bias and translational selection.
Affiliation(s)
- P M Sharp
- Department of Genetics, Trinity College, Dublin, Ireland
|
74
|
Gadaleta G, Pepe G, De Candia G, Quagliariello C, Sbisà E, Saccone C. The complete nucleotide sequence of the Rattus norvegicus mitochondrial genome: cryptic signals revealed by comparative analysis between vertebrates. J Mol Evol 1989; 28:497-516. [PMID: 2504926 DOI: 10.1007/bf02602930] [Citation(s) in RCA: 388] [Impact Index Per Article: 10.8] [Indexed: 01/01/2023]
Abstract
This paper reports the nucleotide sequence of rat mitochondrial DNA, only the fourth mammalian mitochondrial genome to be completely sequenced. Extensive comparative studies performed with similar genomes from other organisms revealed a number of interesting features. 1) Messenger RNA genes: the codon strategy is mainly dictated by the base compositional constraints of the corresponding codogenic DNA strand. The usage of the initiation and termination codons follows well-established rules. In general the canonical initiator, ATG, and terminators, TAA and TAG (in rat, only TAA), are always present when there is gene overlapping or when the mRNAs possess untranslated nucleotides at the 5' or 3' ends. 2) Transfer RNA genes: a number of features suggest the peculiar evolutionary behavior of this class of genes and confirm their role in the duplication and rearrangement processes that took place in the evolution of the animal mitochondrial genome. 3) Ribosomal RNA genes: accurate sequence analysis revealed a number of significant examples of complementarity between ribosomal and messenger RNAs. This suggests that they might play an important role in the regulation of mitochondrial translation and transcription mechanisms. The properties revealed by our work shed new light on the organization and evolution of the vertebrate mitochondrial genome and more importantly open up the way to clearly aimed experimental studies of the regulatory mechanisms in mitochondria.
Affiliation(s)
- G Gadaleta
- Centro di Studio sui Mitocondri e Metabolismo Energetico, CNR Bari, Italy
|
75
|
Pesole G, Attimonelli M, Liuni S. A backtranslation method based on codon usage strategy. Nucleic Acids Res 1988; 16:1715-28. [PMID: 3281142 PMCID: PMC338166 DOI: 10.1093/nar/16.5.1715] [Citation(s) in RCA: 26] [Impact Index Per Article: 0.7] [Indexed: 01/05/2023] Open
Abstract
This study describes a method for the backtranslation of an aminoacidic sequence, an extremely useful tool for various experimental approaches. It involves two computer programs CLUSTER and BACKTR written in Fortran 77 running on a VAX/VMS computer. CLUSTER generates a reliable codon usage table through a cluster analysis, based on a χ²-like distance between the sequences. BACKTR produces backtranslated sequences according to different options when use is made of the codon usage table obtained in addition to selecting the least ambiguous potential oligonucleotide probes within an aminoacidic sequence. The method was tested by applying it to 158 yeast genes.
Affiliation(s)
- G Pesole
- Dipartimento di Biochimica e Biologia Molecolare, University of Bari, Italy
|
76
|
Attimonelli M, Lanave C, Liuni S, Pesole G. MERGE: a software package for generating a single data-base starting from EMBL and GenBank collections. Nucleic Acids Res 1988; 16:1681-2. [PMID: 3353218 PMCID: PMC338162 DOI: 10.1093/nar/16.5.1681] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/05/2023] Open
Affiliation(s)
- M Attimonelli
- Dipartimento di Biochimica e Biologia Molecolare, University of Bari, Italy
|
77
|
Cantatore P, Roberti M, Rainaldi G, Saccone C, Gadaleta MN. Clustering of tRNA genes in Paracentrotus lividus mitochondrial DNA. Curr Genet 1988; 13:91-6. [PMID: 2834108 DOI: 10.1007/bf00365762] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.4] [Indexed: 01/02/2023]
Abstract
We have determined the base sequence of the restriction fragment Bam1-2 (3,593) of Paracentrotus lividus (sea urchin) mtDNA. This fragment contains, in addition to genes previously identified (part of the 12S rRNA, ND1 and part of the ND2 mRNA), a cluster of 15 tRNA genes located between the 12S and ND1 genes. Also to be found in the tRNA gene cluster, between the tRNA(Thr) and tRNA(Pro) genes, is a sequence of 134 bp which constitutes the only non-coding region of this DNA so far identified. The distinctive organization of the tRNA genes and the extreme size reduction of the non-coding region suggest the existence of unique mechanisms for the regulation of gene expression in this organism.
Affiliation(s)
- P Cantatore
- Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Italy
|
78
|
Saccone C, Attimonelli M, Sbisà E. Structural elements highly preserved during the evolution of the D-loop-containing region in vertebrate mitochondrial DNA. J Mol Evol 1987; 26:205-11. [PMID: 3129568 DOI: 10.1007/bf02099853] [Citation(s) in RCA: 113] [Impact Index Per Article: 3.0] [Indexed: 01/04/2023]
Abstract
A detailed comparative study of the regions surrounding the origin of replication in vertebrate mitochondrial DNA (mtDNA) has revealed a number of interesting properties. This region, called the D-loop-containing region, can be divided into three domains. The left (L) and right (R) domains, which have a low G content and contain the 5' and the 3' D-loop ends, respectively, are highly variable for both base sequence and length. They, however, contain thermodynamically stable secondary structures which include the conserved sequence blocks called CSB-1 and TAS which are associated with the start and stop sites, respectively, for D-loop strand synthesis. We have found that a "mirror symmetry" exists between the CSB-1 and TAS elements, which suggests that they can act as specific recognition sites for regulatory, probably dimeric, proteins. Long, statistically significant repeats are found in the L and R domains. Between the L and R domains we observed in all mtDNA sequences a region with a higher G content which was apparently free of complex secondary structure. This central domain, well preserved in mammals, contains an open reading frame of variable length in the organisms considered. The identification of common features well preserved in evolution despite the high primary structural divergence of the D-loop-containing region of vertebrate mtDNA suggests that these properties are of prime importance for the mitochondrial processes that occur in this region and may be useful for singling out the sites on which one should operate experimentally in order to discover functionally important elements.
Affiliation(s)
- C Saccone
- Dipartimento di Biochimica e Biologia Molecolare, Università, Bari, Italy
|
79
|
Abstract
Higher plant nuclear sequences reveal avoidance of CpG and TpA doublets. Chloroplast sequences avoid the TpA doublet in all codon positions. The chloroplast genome is not methylated but codon positions II-III and untranslated regions avoid CpG. The mitochondrial genome, also unmethylated, avoids CpG in all codon positions. We therefore deduce that methylation is not sufficient to explain CpG avoidance in the higher plant systems. Other factors must be taken into account such as amino acid composition, codon choices and perhaps stability of the DNA helix.
|