1

2
Chargaff’s Cluster Rule. Evol Bioinform Online 2016. [DOI: 10.1007/978-3-319-28755-3_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]

3
Mahale KN, Kempraj V, Dasgupta D. Does the growth temperature of a prokaryote influence the purine content of its mRNAs? Gene 2012; 497:83-9. [PMID: 22305982 DOI: 10.1016/j.gene.2012.01.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 11/22/2011] [Accepted: 01/19/2012] [Indexed: 11/20/2022]
Abstract
The formation and breaking of hydrogen bonds between nucleic acid bases are dependent on temperature. The high G+C content of organisms was surmised to be an adaptation for high temperature survival because of the thermal stability of G:C pairs. However, a survey of genomic GC% and optimum growth temperature (OGT) of several prokaryotes ruled out any direct relation between them. Significantly high purine (R=A or G) content in mRNAs is also seen as a selective response for survival among thermophiles. Nevertheless, the biological relevance of thermophiles loading their unstable mRNAs with excess purines (purine-loading or R-loading) is not persuasive. Here, we analysed the mRNA sequences from the genomes of 168 prokaryotes (as obtained from NCBI Genome database) with their OGTs ranging from -5 °C to 100 °C to verify the relation between R-loading and OGT. Our analysis fails to demonstrate any correlation between R-loading of the mRNA pool and OGT of a prokaryote. The percentage of purine-loaded mRNAs in prokaryotes is found to be in a rough negative correlation with the genomic GC% (r²=0.655, slope=-1.478, P<0.001). We conclude that genomic GC% and bias against certain combinations of nucleotides drive the mRNA-synonymous (sense) strands of DNA towards variations in R-loading.

4
Assessment of bilateral limb lymphedema by bioelectrical impedance spectroscopy. Int J Gynecol Cancer 2011; 21:409-18. [PMID: 21270623 DOI: 10.1097/igc.0b013e31820866e1] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Indexed: 11/26/2022]
Abstract
OBJECTIVE: The aim of the present study was to determine if the ratio of extracellular fluid (ECF), including the lymph, to that of intracellular fluid (ICF), as measured by bioimpedance spectroscopy (BIS), could be used to assess bilateral lymphedema (LE).
BACKGROUND: The presence of LE is commonly determined as an increase in tissue volume due to the presence of excess lymph relative to the volume of a comparable unaffected body region or to comparative normative data. However, in bilateral LE of the limbs, a comparable body region, the contralateral limb, is also affected, precluding normalization. An alternative is to normalize the increase in lymph volume, as ECF, to that of ICF volume.
METHODS: Extracellular/intracellular fluid ratios, expressed as the ratio of intracellular impedance (Ri) to extracellular impedance (R0), for the limbs of 277 female and 224 male controls were determined from an accumulated database of impedance data. Equivalent data were obtained for an opportunistic cross-sectional sample of 37 female and 5 male participants with bilateral LE of the legs. The ratios of Ri/R0 in the lymphedematous legs of the affected participants were compared with the equivalent ratios in the unaffected arms of the same participants and with those of the controls using box plots and visualized as bivariate data using tolerance ellipses.
RESULTS: Despite Ri/R0 ratios varying with age, sex, and limb dominance, comparison of the ratio for affected legs (normalized to the ratio in the unaffected arms) with equivalent ratios observed in a control population (as bivariate tolerance plots) was capable of discriminating between 70% and 89% of the participants with LE.
CONCLUSIONS: Bioelectrical impedance spectroscopy and determination of Ri/R0 ratios as indices of ECF/ICF ratios hold promise for the semiquantitative assessment of bilateral LE.

5
Slightom JL, Sun SM, Hall TC. Complete nucleotide sequence of a French bean storage protein gene: Phaseolin. Proc Natl Acad Sci U S A 1983; 80:1897-901. [PMID: 16593301 PMCID: PMC393717 DOI: 10.1073/pnas.80.7.1897] [Citation(s) in RCA: 131] [Impact Index Per Article: 9.4] [Indexed: 11/18/2022]
Abstract
The complete nucleotide sequences of the gene and the mRNA coding for a specific phaseolin type French bean major storage protein have been determined. Comparison of these sequences reveals a phaseolin gene structure consisting of 80 base pairs (bp) of 5' untranslated DNA, 1,263 bp of protein-encoding DNA which is interrupted by five intervening sequences (IVS1, 72 bp; IVS2, 88 bp; IVS3, 124 bp; IVS4, 128 bp; and IVS5, 103 bp), and 135 bp of 3' untranslated DNA. Sequences characteristic of eukaryotic promoters "CCAAT" and "TATA" are present in the 5' flanking DNA, and the eukaryotic poly(A) addition signal A-A-T-A-A-A occurs 16 bp before the first nucleotide of poly(A). The derived amino acid sequence yields an amino acid composition and a molecular weight compatible with those found for the beta-type phaseolin protein. Two regions that probably serve as carbohydrate-peptide linkage recognition sites have been identified. A region of highly hydrophobic amino acids at the NH2 terminus of the protein suggests the presence of a signal peptide in the newly synthesized phaseolin protein.
Affiliation(s)
- J L Slightom
- Agrigenetics Advanced Research Laboratory, 5649 East Buckeye Road, Madison, Wisconsin 53716

6
Genetic recombination as a major cause of mutagenesis in the human globin gene clusters. Clin Biochem 2009; 42:1839-50. [DOI: 10.1016/j.clinbiochem.2009.07.014] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 04/03/2009] [Revised: 06/23/2009] [Accepted: 07/01/2009] [Indexed: 11/18/2022]

7
Kalamaras A, Chassanidis C, Samara M, Chiotoglou I, Vamvakopoulos NK, Papadakis MN, Kollia P, Patrinos GP. The 5′ Regulatory Region of the Human Fetal Globin Genes is a Gene Conversion Hotspot. Hemoglobin 2009; 32:572-81. [DOI: 10.1080/03630260802507824] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]

8
Touchon M, Rocha EPC. From GC skews to wavelets: a gentle guide to the analysis of compositional asymmetries in genomic data. Biochimie 2007; 90:648-59. [PMID: 17988781 DOI: 10.1016/j.biochi.2007.09.015] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Received: 06/18/2007] [Accepted: 09/21/2007] [Indexed: 12/29/2022]
Abstract
Compositional asymmetries are pervasive in DNA sequences. They are the result of the asymmetric interactions between DNA and cellular mechanisms such as replication and transcription. Here, we review many of the methods that have been proposed over the years to analyse compositional asymmetries in DNA sequences. Among these we list GC skews, oligonucleotide skews and wavelets, which among other uses have been extensively employed to delimitate origins and termini of replication in genomes. We also review the use of multivariate methods, such as factorial correspondence analysis, discriminant analysis and analysis of variance, which allow assigning compositional strand asymmetries to the different biological processes shaping sequence composition. Finally, we review methods that have been used to infer substitution matrices and allow understanding the mutational processes underlying strand asymmetry. We focus on replication asymmetries because they have been more thoroughly studied, but the methods may be adapted, and often are, to other problems. Although strand asymmetry has been studied more frequently through compositional skews of nucleotides or oligonucleotides, we recall that, depending on the goal of the analysis, other methods may be more appropriate to answer certain biological questions. We also refer to programs freely available to analyse strand asymmetry.
Affiliation(s)
- Marie Touchon
- Atelier de Bioinformatique, Université Pierre et Marie Curie-Paris 6, Paris, France

9
Hu J, Zhao X, Yu J. Replication-associated purine asymmetry may contribute to strand-biased gene distribution. Genomics 2007; 90:186-94. [PMID: 17532183 DOI: 10.1016/j.ygeno.2007.04.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 10/22/2006] [Revised: 03/09/2007] [Accepted: 04/02/2007] [Indexed: 11/19/2022]
Abstract
Among prokaryotic genomes, the distribution of genes on the leading and lagging strands of the replication fork is known to be biased. Several hypotheses explaining this strand-biased gene distribution (SGD) have been proposed, but none have been tested or supported by sufficient data analyses. In this work we have analyzed 211 prokaryotic genomes in terms of compositional strand asymmetries and the presence or absence of polC and have found that SGD correlates not only with polC, but also with purine asymmetry (PAS). Furthermore, SGD, PAS, and polC are all features associated with a group of low-GC, gram-positive bacteria (Firmicutes). We conclude that PAS is a characteristic of organisms with a heterodimeric DNA polymerase III alpha-subunit constituted by polC and dnaE, which may play a direct role in the maintenance of SGD.
Affiliation(s)
- Jianfei Hu
- College of Life Sciences, Peking University, Beijing 100871, China.

10
Paz A, Mester D, Nevo E, Korol A. Looking for organization patterns of highly expressed genes: purine-pyrimidine composition of precursor mRNAs. J Mol Evol 2007; 64:248-60. [PMID: 17211550 DOI: 10.1007/s00239-006-0135-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/21/2006] [Accepted: 11/19/2006] [Indexed: 01/05/2023]
Abstract
We analyzed precursor messenger RNAs (pre-mRNAs) of 12 eukaryotic species. In each species, three groups of highly expressed genes, ribosomal proteins, heat shock proteins, and amino-acyl tRNA synthetases, were compared with a control group (randomly selected genes). The purine-pyrimidine (R-Y) composition of pre-mRNAs of the three targeted gene groups proved to differ significantly from the control. The exons of the three groups tested have higher purine contents and R-tract abundance and lower abundance of Y-tracts compared to the control (R-tract: tract of sequential purines with Rn ≥ 5; Y-tract: tract of sequential pyrimidines with Yn ≥ 5). In species widely employing "intron definition" in the splicing process, the Y content of introns of the three targeted groups appeared to be higher compared to the control group. Furthermore, in all examined species, the introns of the targeted genes have a lower abundance of R-tracts compared to the control. We hypothesized that the R-Y composition of the targeted gene groups contributes to high rate and efficiency of both splicing and translation, in addition to the mRNA coding role. This is presumably achieved by (1) reducing the possibility of the formation of secondary structures in the mRNA, (2) using the R-tracts and R-biased sequences as exonic splicing enhancers, (3) lowering the number of targets for pyrimidine tract binding protein in the exons, and (4) reducing the number of target sequences for binding of serine/arginine-rich (SR) proteins in the introns, thereby allowing SR proteins to bind to proper (exonic) targets.
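As an illustration of the tract definition in this abstract (runs of at least five consecutive purines or pyrimidines), the following is a minimal Python sketch. The function name `count_tracts`, the handling of U as a pyrimidine, and the non-overlapping counting of maximal runs are assumptions made for this example and are not taken from the paper.

```python
import re


def count_tracts(seq: str, min_len: int = 5) -> dict:
    """Count purine (R) and pyrimidine (Y) tracts of at least min_len bases.

    Each maximal run is counted once (an assumption of this sketch).
    """
    seq = seq.upper()
    # A or G repeated min_len or more times is an R-tract.
    r_tracts = re.findall(r"[AG]{%d,}" % min_len, seq)
    # C, T or U repeated min_len or more times is a Y-tract.
    y_tracts = re.findall(r"[CTU]{%d,}" % min_len, seq)
    purines = seq.count("A") + seq.count("G")
    total = sum(seq.count(base) for base in "ACGTU")
    return {
        "r_tracts": len(r_tracts),
        "y_tracts": len(y_tracts),
        "purine_fraction": purines / total if total else 0.0,
    }


if __name__ == "__main__":
    print(count_tracts("AAGGACTTTTCAG"))
```

Comparing these counts between a set of target genes and a control set would require a per-length normalization and a statistical test, which this sketch leaves out.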
Affiliation(s)
- A Paz
- Institute of Evolution, Haifa University, Mount Carmel, Haifa, 31905, Israel

11
Forsdyke DR. Chargaff’s Cluster Rule. Evol Bioinform Online 2006. [DOI: 10.1007/978-0-387-33419-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

12
Mitchell D, Bridge R. A test of Chargaff's second rule. Biochem Biophys Res Commun 2005; 340:90-4. [PMID: 16364245 DOI: 10.1016/j.bbrc.2005.11.160] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 11/21/2005] [Accepted: 11/22/2005] [Indexed: 10/25/2022]
Abstract
In 1968, Chargaff and his colleagues discovered a rule in Bacillus subtilis: in single stranded DNA, A=T and C=G. This rule has since been confirmed many times in other bacterial and eukaryotic genomes. To the best of our knowledge, this rule has not been tested before in either single stranded DNA or RNA genomes. Over 3400 genomic sequences were examined here and included for the first time both double and single stranded DNA and RNA genomes. We found that: (1) with the exception of the organellar DNA, this parity rule holds for all types of double stranded DNA genomes and (2) that this rule fails to hold for other types of genomes. The parity rule appears to be a selective force on genome evolution and codon use.
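The parity rule described in this abstract (A≈T and C≈G within a single strand) can be checked on any sequence with a few lines of Python. This is a hedged sketch: the function name `parity_deviation` and the choice of a relative difference |x − y| / (x + y) as the deviation measure are assumptions for illustration, not the statistic used by the authors.

```python
from collections import Counter


def parity_deviation(strand: str) -> dict:
    """Relative deviation from A=T and C=G within one strand.

    0.0 means perfect parity; 1.0 means only one base of the pair occurs.
    U is treated as T so that RNA genomes can be passed in as well.
    """
    counts = Counter(strand.upper().replace("U", "T"))
    a, t, c, g = (counts[base] for base in "ATCG")

    def rel_diff(x: int, y: int) -> float:
        return abs(x - y) / (x + y) if (x + y) else 0.0

    return {"AT": rel_diff(a, t), "CG": rel_diff(c, g)}


if __name__ == "__main__":
    print(parity_deviation("AATTCCGG"))  # both deviations are 0.0
    print(parity_deviation("AAAT"))      # AT deviation is 0.5
```

On very short sequences the deviation is dominated by sampling noise, so values are only meaningful for genome-scale inputs.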
Affiliation(s)
- David Mitchell
- Vice Deanery of Genetics and Microbiology, Trinity College, Dublin, Ireland.

13
Dudkiewicz M, Mackiewicz P, Mackiewicz D, Kowalczuk M, Nowicka A, Polak N, Smolarczyk K, Banaszak J, Dudek MR, Cebrat S. Higher mutation rate helps to rescue genes from the elimination by selection. Biosystems 2004; 80:193-9. [PMID: 15823418 DOI: 10.1016/j.biosystems.2004.11.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/26/2004] [Revised: 06/17/2004] [Accepted: 11/23/2004] [Indexed: 11/26/2022]
Abstract
Directional mutation pressure associated with replication processes is the main cause of the asymmetry between the leading and lagging DNA strands in bacterial genomes. On the other hand, the asymmetry between sense and antisense strands of protein coding sequences is a result of both mutation and selection pressures. Thus, there are two different ways of superposition of the sense strand, on the leading or lagging strand. Besides many other implications of these two possible situations, one seems to be very important - because of the asymmetric replication-associated mutation pressure, the mutation rate of genes depends on their location. Using Monte Carlo methods, we have simulated, under experimentally determined directional mutation pressure, the divergence rate and the elimination rate of genes depending on their location in respect to the leading/lagging DNA strands in the asymmetric prokaryotic genome. We have found that the best survival strategy for the majority of genes is to sometimes switch between DNA strands. Paradoxically, this strategy results in higher substitution rates but remains in agreement with observations in bacterial genomes that such inversions are very frequent and divergence rate between homologs lying on different DNA strands is very high.
Affiliation(s)
- Malgorzata Dudkiewicz
- Institute of Genetics and Microbiology, University of Wrocław, ul. Przybyszewskiego, Wrocław, Poland

14
DeMarchis L, Cropp C, Sheng ZM, Bargo S, Callahan R. Candidate target genes for loss of heterozygosity on human chromosome 17q21. Br J Cancer 2004; 90:2384-9. [PMID: 15187990 PMCID: PMC2409524 DOI: 10.1038/sj.bjc.6601848] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
Loss of heterozygosity (LOH) on chromosome 17q21 has been detected in 30% of primary human breast tumours. The smallest common region deleted occurred in an interval between the D17S746 and D17S846 polymorphic sequence-tagged sites that are located on two recombinant P1-bacteriophage clones of chromosome 17q21: 122F4 and 50H1, respectively. To identify the target gene for LOH, we defined a map of this chromosomal region. We found the following genes: JUP, FK506BP10, SC65, Gastrin (GAS) and HAP1. Of the genes that have been identified in this study, only JUP is located between D17S746 and D17S846. This was of interest since earlier studies have shown that JUP expression is altered in breast, lung and thyroid tumours as well as cell lines having LOH in chromosome 17q21. However, no mutations were detected in JUP using single-strand conformation polymorphism analysis of primary breast tumour DNAs having LOH at 17q21. We could find no evidence that the transcription promoter for JUP is methylated in tumour DNAs having LOH at 17q21. We suspect that the target gene for LOH in primary human breast tumours on chromosome 17q21 is either JUP, with LOH resulting in haploinsufficiency for its expression, or an unidentified gene located in the interval between D17S846 and JUP.
Affiliation(s)
- L DeMarchis
- Mammary Biology and Tumorigenesis Laboratory, National Cancer Institute, Bethesda, MD 20892, USA
- C Cropp
- Mammary Biology and Tumorigenesis Laboratory, National Cancer Institute, Bethesda, MD 20892, USA
- Z M Sheng
- Mammary Biology and Tumorigenesis Laboratory, National Cancer Institute, Bethesda, MD 20892, USA
- S Bargo
- Mammary Biology and Tumorigenesis Laboratory, National Cancer Institute, Bethesda, MD 20892, USA
- R Callahan
- Mammary Biology and Tumorigenesis Laboratory, National Cancer Institute, Bethesda, MD 20892, USA
- National Cancer Institute, Building 10/Room 5B50, Bethesda, MD 20892, USA.

15
Lambros RJ, Mortimer JR, Forsdyke DR. Optimum growth temperature and the base composition of open reading frames in prokaryotes. Extremophiles 2003; 7:443-50. [PMID: 14666404 DOI: 10.1007/s00792-003-0353-4] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.0] [Received: 02/12/2003] [Accepted: 06/20/2003] [Indexed: 11/27/2022]
Abstract
The purine-loading index (PLI) is the difference between the numbers of purines (A+G) and pyrimidines (T+C) per kilobase of single-stranded nucleic acid. By purine-loading their mRNAs organisms may minimize unnecessary RNA-RNA interactions and prevent inadvertent formation of "self" double-stranded RNA. Since RNA-RNA interactions have a strong entropy-driven component, this need to minimize should increase as temperature increases. Consistent with this, we report for 550 prokaryotic species that optimum growth temperature is related to the average PLI of open reading frames. With increasing temperature prokaryotes tend to acquire base A and lose base C, while keeping bases T and G relatively constant. Accordingly, while the PLI increases, the (G+C)% decreases. The previously observed positive correlation between (G+C)% and optimum growth temperature, which applies to RNA species whose structure is of major importance for their function (ribosomal and transfer RNAs) does not apply to mRNAs, and hence is unlikely to apply generally to genomic DNA.
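The PLI definition at the start of this abstract translates directly into code. The sketch below assumes that ambiguous bases (such as N) are ignored and that the per-kilobase scaling uses the number of unambiguous bases; the function name `purine_loading_index` is hypothetical and chosen for this example.

```python
def purine_loading_index(seq: str) -> float:
    """Purines (A+G) minus pyrimidines (T+C) per kilobase of one strand.

    Positive values indicate purine-loading, negative values
    pyrimidine-loading. U is counted as T so mRNA sequences work too.
    """
    seq = seq.upper().replace("U", "T")
    purines = seq.count("A") + seq.count("G")
    pyrimidines = seq.count("T") + seq.count("C")
    length = purines + pyrimidines  # ambiguous bases are ignored (assumption)
    if length == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    return 1000.0 * (purines - pyrimidines) / length


if __name__ == "__main__":
    # 8 purines and 2 pyrimidines in 10 bases: 600 per kilobase.
    print(purine_loading_index("AAGGAAGGCT"))
```

Averaging this value over all open reading frames of a genome gives a per-species figure of the kind the abstract relates to optimum growth temperature.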
Affiliation(s)
- R J Lambros
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L3N6, Canada

16
Maeda M. The conserved residues of the ligand-binding domains of steroid receptors are located in the core of the molecules. J Mol Graph Model 2002; 19:543-51, 601-6. [PMID: 11552682 DOI: 10.1016/s1093-3263(01)00087-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
The relationship between conserved residues and biochemical functions of steroid receptors was investigated. Pairwise three-dimensional (3D) alignment of the ligand-binding domains of the human estrogen (1A52) and progesterone (1A28) receptors revealed two conserved domains: Asn313-Ser456 and Gln471-Lys531 (numbering reflects the sequence in the human estrogen receptor). Alignment of the protein sequences of 39 steroid receptors revealed 36 highly conserved residues (i.e., the residues commonly found in more than 80% of sequences aligned). They were distributed throughout the sequences but formed a contiguous 3D structure. Most of these highly conserved residues were buried in the ligand-binding domain, but several residues were exposed on the surface. The well-known functions commonly associated with the ligand-binding domain of steroid receptors are ligand binding, HSP90 binding, transcriptional activation and dimerization. The relationship between the residues and these functions was checked. To determine the residues involved in dimerization, the differences between the solvent accessibilities of the monomeric and dimeric forms were calculated. These results revealed 32 residues of 1A52 and 15 residues of 1A28 potentially involved in dimerization. Their distribution areas do not overlap greatly. Comparing these putative dimerization sites with highly conserved residues, many of the exposed conserved residues were observed on the side of the domain opposite the dimerization sites. Some highly conserved residues are located in a steroid-binding site and in the transcriptional activation domain. However, few of them were observed in the HSP90 binding site. These results indicate that the core structure made by most of the highly conserved residues among the ligand-binding domains of steroid receptors is important. These conserved residues may be essential for conformational change in the ligand-binding domain from its inactive to active form.
Affiliation(s)
- M Maeda
- Biochemistry Department, National Institute of Agrobiological Sciences, Kannondai 2-1-2, Tsukuba, Ibaraki 305-8602, Japan.

17
Roten CAH, Gamba P, Barblan JL, Karamata D. Comparative Genometrics (CG): a database dedicated to biometric comparisons of whole genomes. Nucleic Acids Res 2002; 30:142-4. [PMID: 11752276 PMCID: PMC99136 DOI: 10.1093/nar/30.1.142] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
Abstract
The ever increasing rate at which whole genome sequences are becoming accessible to the scientific community has created an urgent need for tools enabling comparison of chromosomes of different species. We have applied biometric methods to available chromosome sequences and posted the results on our Comparative Genometrics (CG) web site. By genometrics, a term coined by Elston and Wilson [Genet. Epidemiol. (1990), 7, 17-19], we understand a biometric analysis of chromosomes. During the initial phase, our web site displays, for all completely sequenced prokaryotic genomes, three genometric analyses: the DNA walk [Lobry (1999) Microbiology Today, 26, 164-165] and two complementary representations, i.e. the cumulative GC- and TA-skew analyses, capable of identifying, at the level of whole genomes, features inherent to chromosome organization and functioning. It appears that the latter features are taxon-specific. Although primarily focused on prokaryotic chromosomes, the CG web site contains genometric information on paradigm plasmids, phages, viruses and eukaryotic organelles. Relevant data and methods can be readily used by the scientific community for further analyses as well as for tutorial purposes. Our data posted at the CG web site are freely available on the World Wide Web at http://www.unil.ch/comparativegenometrics.
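To make the cumulative GC- and TA-skew representations mentioned in this abstract concrete, here is a small Python sketch. It assumes one common convention (a running sum of +1 for G and -1 for C, and +1 for T and -1 for A, along one strand); the exact definition used on the CG web site is not given in the abstract, and the function name `cumulative_skews` is hypothetical.

```python
from itertools import accumulate

# Per-base steps for the two walks; any other base contributes 0.
GC_STEP = {"G": 1, "C": -1}
TA_STEP = {"T": 1, "A": -1}


def cumulative_skews(seq: str) -> tuple:
    """Return (cumulative GC skew, cumulative TA skew) along the sequence."""
    seq = seq.upper()
    gc_walk = list(accumulate(GC_STEP.get(base, 0) for base in seq))
    ta_walk = list(accumulate(TA_STEP.get(base, 0) for base in seq))
    return gc_walk, ta_walk


if __name__ == "__main__":
    gc_walk, ta_walk = cumulative_skews("GGCATT")
    print(gc_walk)  # [1, 2, 1, 1, 1, 1]
    print(ta_walk)  # [0, 0, 0, -1, 0, 1]
```

In whole bacterial chromosomes, the extremes of such a cumulative curve are often used as a first guess at the replication origin and terminus, although that interpretation depends on the organism.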
Affiliation(s)
- Claude-Alain H Roten
- Institut de Génétique et de Biologie Microbiennes, rue César-Roux 19, CH-1005 Lausanne, Switzerland.

18
Abstract
We calculated nucleotide distribution curves along the DNA molecules of the human chromosomes 21 and 22, their correlations in more than 10,000 equidistant positions, and subjected the correlations to cluster analysis. The cluster analysis demonstrated that both DNA molecules were composed of two types of segments exhibiting qualitatively different correlations. The segments differed most in the correlation of the distribution curves of cytosine and guanine, which was very high in type I segments but weak in type II segments. The type I and II segments also significantly differed in the correlations of the distribution curves of adenine with thymine. In addition, adenine strongly anticorrelated with cytosine but this anticorrelation was uniform along both chromosomes and, therefore, it did not contribute to the distinction of the two types of segments. The segments were up to 100 kbp long but they had nothing in common with isochores. Building blocks of the mosaic structure of the DNA molecules of the human chromosomes 21 and 22 are very similar but different in several interesting aspects from those of E. coli.
Affiliation(s)
- D Häring
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno

19
Cristillo AD, Mortimer JR, Barrette IH, Lillicrap TP, Forsdyke DR. Double-stranded RNA as a not-self alarm signal: to evade, most viruses purine-load their RNAs, but some (HTLV-1, Epstein-Barr) pyrimidine-load. J Theor Biol 2001; 208:475-91. [PMID: 11222051 DOI: 10.1006/jtbi.2000.2233] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
For double-stranded RNA (dsRNA) to signal the presence of foreign (non-self) nucleic acid, self-RNA-self-RNA interactions should be minimized. Indeed, self-RNAs appear to have been fine-tuned over evolutionary time by the introduction of purines in clusters in the loop regions of stem-loop structures. This adaptation should militate against the "kissing" interactions which initiate formation of dsRNA. Our analyses of virus base compositions suggest that, to avoid triggering the host cell's dsRNA surveillance mechanism, most viruses purine-load their RNAs to resemble host RNAs ("stealth" strategy). However, some GC-rich latent viruses (HTLV-1, EBV) pyrimidine-load their RNAs. It is suggested that when virus production begins, these RNAs suddenly increase in concentration and impair host mRNA function by virtue of an excess of complementary "kissing" interactions ("surprise" strategy). Remarkably, the only mRNA expressed in the most fundamental form of EBV latency (the "EBNA-1 program") is purine-loaded. This apparent stealth strategy is reinforced by a simple sequence repeat which prefers purine-rich codons. During latent infection the EBNA-1 protein may evade recognition by cytotoxic T-cells, not by virtue of containing a simple sequence amino acid repeat as has been proposed, but by virtue of the encoding mRNA being purine-loaded to prevent interactions with host RNAs of either genic or non-genic origin.
Affiliation(s)
- A D Cristillo
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L3N6, Canada

20
Abstract
Of Chargaff's four rules on DNA base composition, only his first parity rule was incorporated into mainstream biology as the DNA double helix. Now, the cluster rule, the second parity rule, and the GC rule, reveal the multiple levels of information in our genomes and potential conflicts between them. In these terms we can understand how double-stranded RNA became an intracellular alarm signal, how potentially recombining nucleic acids can distinguish between 'self' and 'not-self' so leading to the origin of species, how isochores evolved to facilitate gene duplication, and how unlikely it is that any mutation can ever remain truly neutral.
Affiliation(s)
- D R Forsdyke
- Department of Biochemistry, Queen's University, Kingston, Ontario K7L3N6, Canada.

21
Abstract
Certain mutations are known to occur with differing frequencies on the leading and lagging strands of DNA. The extent to which these mutational biases affect the sequences of higher eukaryotes has been difficult to ascertain because the positions of most replication origins are not known, making it impossible to distinguish between the leading and lagging strands. To resolve whether strand biases influence the evolution of primate sequences, we compared the substitution patterns in noncoding regions adjacent to an origin of replication identified within the beta-globin complex. Although there was limited asymmetry around the beta-globin origin of replication, patterns of substitutions do not support the existence of a mutational bias between the leading and lagging strands of chromosomal DNA replication in primates.

22
Lao PJ, Forsdyke DR. Thermophilic bacteria strictly obey Szybalski's transcription direction rule and politely purine-load RNAs with both adenine and guanine. Genome Res 2000; 10:228-36. [PMID: 10673280 PMCID: PMC310832 DOI: 10.1101/gr.10.2.228] [Citation(s) in RCA: 83] [Impact Index Per Article: 3.5] [Received: 08/23/1999] [Accepted: 12/16/1999] [Indexed: 11/24/2022]
Abstract
When transcription is to the right of the promoter, the "top," mRNA-synonymous strand of DNA tends to be purine-rich. When transcription is to the left of the promoter, the top, mRNA-template strand tends to be pyrimidine-rich. This transcription-direction rule suggests that there has been an evolutionary selection pressure for the purine-loading of RNAs. The politeness hypothesis states that purine-loading prevents distracting RNA-RNA interactions and excessive formation of double-stranded RNA, which might trigger various intracellular alarms. Because RNA-RNA interactions have a distinct entropy-driven component, the pressure for the evolution of purine-loading might be greater in organisms living at high temperatures. In support of this, we find that Chargaff differences (a measure of purine-loading) are greater in thermophiles than in nonthermophiles and extend to both purine bases. In thermophiles the pressure to purine-load affects codon choice, indicating that some features of their amino acid composition (e.g., high levels of glutamic acid) might reflect purine-loading pressure (i.e., constraints on mRNA) rather than direct constraints on protein structure and function.
Affiliation(s)
- P J Lao
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L 3N6, Canada

23
Machida M, Yamazaki S, Kunihiro S, Tanaka T, Kushida N, Jinno K, Haikawa Y, Yamazaki J, Yamamoto S, Sekine M, Oguchi A, Nagai Y, Sakai M, Aoki K, Ogura K, Kudoh Y, Kikuchi H, Zhang MQ, Yanagida M. A 38 kb segment containing the cdc2 gene from the left arm of fission yeast chromosome II: sequence analysis and characterization of the genomic DNA and cDNAs encoded on the segment. Yeast 2000; 16:71-80. [PMID: 10620777 DOI: 10.1002/(sici)1097-0061(20000115)16:1<71::aid-yea505>3.0.co;2-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
A genomic 38 kbp segment on the c1750 cosmid clone containing the cdc2 gene, located in the left arm of chromosome II from Schizosaccharomyces pombe, was sequenced. The segment was found to have five previously known genes, pht1, cdc2, his3, act1 and mei4. Among 11 coding sequences (CDSs) predicted by the gene finding software INTRON.PLOT., four CDSs, pi007, pi010, pi014 and pi016, had considerable similarity to 40S ribosomal protein, glycosyltransferase, cdc2-related protein kinase and alpha-1, 2-mannosyltransferase, respectively. Another unusually huge open reading frame (ORF) (pi011), consisting of 2233 amino acids, existed, having significant homology to alpha-amylase, granule-bound glycogen synthase and the Sz. pombe YS 1110 clone product at the N-terminal, middle and C-terminal regions, respectively. All the predicted 11 CDSs were experimentally analysed by RACE PCR. The sequencing of the RACE products revealed that there were two small overlaps at the 3' untranslated regions (UTRs) between pi004 and pi005 (17 bp) and between pi007 and pi008 (2 bp). The distances between 5' end of the 5'UTR and the putative translation initiation codon varied from 10 to 302 nucleotides (nt) among the nine CDSs successfully analysed by 5'-RACE. The expression level of each CDS on this clone was determined. Among the 16 genes on this clone, the previously determined genes, pht1, cdc2, his3 and act1, were found to be most highly expressed. Finally, cDNAs of all the newly identified genes were detected by RACE, proving the actual expression of these genes. The nucleotide sequence has been submitted to the EMBL database under Accession No. AB004534.
Affiliation(s)
- M Machida
- Molecular Biology Department, National Institute of Bioscience and Human Technology, Higashi 1-1, Tsukuba, Ibaraki 305-8566, Japan.
|
24
|
Imatani A, Callahan R. Identification of a novel NOTCH-4/INT-3 RNA species encoding an activated gene product in certain human tumor cell lines. Oncogene 2000; 19:223-31. [PMID: 10645000 DOI: 10.1038/sj.onc.1203295] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.3] [Indexed: 11/09/2022]
Abstract
Ectopic expression of the intracellular domain of NOTCH-4/INT-3 leads to tumorigenesis in the mouse mammary gland. This results from a gain-of-function mutation. To evaluate gain-of-function NOTCH-4/INT-3 activity in human cancers we have surveyed human breast, lung, and colon carcinoma tissue culture cell lines for evidence of increased NOTCH-4/INT-3 RNA expression. High levels of a 1.8 Kb NOTCH-4/INT-3 RNA species are detected in normal human testis but not in other tissues where a 6.5 Kb species is prevalent. Transformed human cancer cell lines express the 1.8 Kb NOTCH-4/INT-3 RNA species. We show that this RNA species encodes a truncated form of the NOTCH-4/INT-3 intracellular domain (ICD). This novel NOTCH-4/INT-3 protein includes the CDC10 repeats and amino acid residues C-terminal to them, but is missing the CBF-1 binding region of the NOTCH-4/INT-3 ICD. This suggests that it has a different mode of action. Furthermore, we show that a transgene which expresses the 1.8 Kb NOTCH-4/INT-3 RNA species in the 'normal' human mammary epithelial cell line MCF-10A enables these cells to grow in soft agar.
Affiliation(s)
- A Imatani
- Laboratory of Tumor Immunology, National Cancer Institute, NIH, Bethesda, Maryland, MD 20892, USA
|
25
|
Häring D, Kypr J. Correlations and anticorrelations among nucleotide distributions along the genomes of various organisms. J Biomol Struct Dyn 1999; 17:267-73. [PMID: 10563576 DOI: 10.1080/07391102.1999.10508359] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
We have analyzed correlations of nucleotide distributions along more than 50 megabases of the longest sequenced parts of the human, mouse, Drosophila, Arabidopsis, yeast, E. coli and three kinds of viral genomes. The strongest correlations were observed between the distributions of C and G, in particular in the genome of Drosophila. This correlation was much weaker, though still strong, in the human genome and E. coli, which exhibited the same level of this correlation. The C/G correlation hardly originates from the isochores because the isochores were not reported to occur in the genomes of Drosophila and E. coli. The genomic distribution curves of adenine and thymine were also positively correlated in all analyzed organisms except for the yeast, where they were anticorrelated. Still stronger anticorrelations were, however, observed between the genomic distributions of A and C and between G and T. These genomic distributions were anticorrelated almost generally and very strongly. These anticorrelations are likely to originate from point mutations resulting from unrepaired GA mispairing as a replication intermediate. The C/A or G/T anticorrelation or compensation is a very strong and general new phenomenon that shapes the genomic nucleotide sequences.
Affiliation(s)
- D Häring
- Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno
|
26
|
Li W. Statistical properties of open reading frames in complete genome sequences. Computers & Chemistry 1999; 23:283-301. [PMID: 10404621 DOI: 10.1016/s0097-8485(99)00014-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 11/15/2022]
Abstract
Some statistical properties of open reading frames in all currently available complete genome sequences are analyzed (seventeen prokaryotic genomes, and 16 chromosome sequences from the yeast genome). The size distribution of open reading frames is characterized by various techniques, such as quantile tables, QQ-plots, rank-size plots (Zipf's plots), and spatial densities. The issue of the influence of CG% on the size distribution is addressed. When yeast chromosomes are compared with archaeal and eubacterial genomes, they tend to have more long open reading frames. There is little or no evidence to reject the null hypothesis that open reading frames on six different reading frames and two strands distribute similarly. A topic of current interest, the base composition asymmetry in open reading frames between the two strands, is studied using regression analysis. The base composition asymmetry at three codon positions is analyzed separately. It was shown in these genome sequences that the first codon position is G- and A-rich (i.e. purine-rich); there is a co-existence of A- and T-rich branches at the second codon position; and the third codon position is weakly T-rich.
Affiliation(s)
- W Li
- Laboratory of Statistical Genetics, Rockefeller University, New York, NY 10021, USA.
|
27
|
Miyazaki S, Rasmussen S, Imatani A, Diella F, Sullivan DT, Callahan R. Characterization of the Drosophila ortholog of mouse eIF-3p48/INT-6. Gene 1999; 233:241-7. [PMID: 10375641 DOI: 10.1016/s0378-1119(99)00130-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
The mouse mammary tumor virus (MMTV) has been shown to integrate frequently into INT-6 in MMTV-induced mouse mammary tumors. The INT6 gene has been highly conserved through evolution and has recently been shown to encode the p48 component of the eucaryotic translation initiation factor 3 (eIF-3) complex. We report here the isolation of the Drosophila eIF-3p48/INT-6. The gene comprises three exons within 1.8kb of genomic DNA located at cytogenetic position 73C2 in the Drosophila genome. The 1.5kb eIF-3p48/INT-6 RNA species encodes a protein composed of 364 amino-acid residues whose sequence is 71% similar to that of the mouse/human eIF-3/INT-6 amino-acid sequence. eIF-3p48/INT-6 RNA is expressed throughout development in Drosophila and the encoded protein is associated with the microsomal subcellular fraction.
Affiliation(s)
- S Miyazaki
- Laboratory of Tumor Immunology and Biology, National Cancer Institute, Bethesda, MD 20892, USA
|
28
|
Abstract
Analysis of 22 complete sequences of double-stranded DNA viruses reveals striking compositional asymmetries between leading and lagging, and between transcribed and non-transcribed strands. In all bi-directionally replicated genomes analyzed, the observed leading strand GC skew (measuring relative excess of guanines versus cytosines) is different from that in the lagging strand. In most of these genomes GC skew switches polarity close to replication origins. GC skew changes linearly across adenovirus linear genomes, which replicate from one end. In papillomavirus, GC skew is positive in one half of the genome where transcription and replication proceed in the same direction, and is close to zero in the other half with divergent transcription and replication. Possible contributions of these two processes (and associated repair mechanisms) as well as other potential sources of strand bias in the observed asymmetries are discussed. Use of cumulative skew plots for genome comparisons is demonstrated on the example of herpes simplex virus.
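The skew measures mentioned above can be sketched in a few lines. The example assumes the common definition GC skew = (G − C)/(G + C), computed in non-overlapping windows, and builds a cumulative skew curve as the running sum of the window values. The function names and the window scheme are assumptions for illustration, not the procedure used in the paper.

```python
from typing import List


def gc_skew(seq: str) -> float:
    """Relative excess of G over C: (G - C) / (G + C); 0.0 if neither occurs."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return 0.0 if g + c == 0 else (g - c) / (g + c)


def cumulative_gc_skew(seq: str, window: int) -> List[float]:
    """Running sum of per-window GC skew over non-overlapping windows.

    A trailing fragment shorter than `window` is ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    totals: List[float] = []
    running = 0.0
    for start in range(0, len(seq) - window + 1, window):
        running += gc_skew(seq[start:start + window])
        totals.append(running)
    return totals


if __name__ == "__main__":
    # Toy sequence: G-rich first half, C-rich second half.
    print(cumulative_gc_skew("GGGGAGGGCCCCACCC", 4))
```

In a plot of such a cumulative curve against genome position, a switch in skew polarity appears as a turning point (a local maximum or minimum).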
Affiliation(s)
- A Grigoriev
- Max-Planck-Institute for Molecular Genetics, Berlin, Germany.
|
29
|
Bell SJ, Chow YC, Ho JY, Forsdyke DR. Correlation of chi orientation with transcription indicates a fundamental relationship between recombination and transcription. Gene X 1998; 216:285-92. [PMID: 9729432 DOI: 10.1016/s0378-1119(98)00333-3] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Cross-over hot-spot instigator (Chi) sequences (5'-GCTGGTGG-3') are abundant, strand-specific sequences that locally increase recombination in Escherichia coli. Located within G-rich 'recombination islands', Chi orientations correlate with the orientations both of DNA replication and of transcription. Consistent with evidence from eukaryotic systems for a fundamental relationship between recombination and transcription, we find for E. coli Chi sequences, and for Haemophilus influenzae Chi-like sequences, that orientations correlate better with transcription than with replication. Complying with Szybalski's transcription direction rule, open reading frames in these prokaryotes have purine-rich mRNA-synonymous DNA strands. Hence, the G-richness of 'recombination islands' may reflect their correspondence with 'transcriptional islands' (genes). Comparison of a natural sequence with the corresponding shuffled sequence indicates a base order-dependent island unit of approx. 1 kb.
Affiliation(s)
- S J Bell
- Department of Biochemistry, Queen's University, Kingston, Ontario, K7L3N6, Canada
|
30
|
Dang KD, Dutt PB, Forsdyke DR. Chargaff difference analysis of the bithorax complex of Drosophila melanogaster. Biochem Cell Biol 1998; 76:129-37. [PMID: 9666315 DOI: 10.1139/o97-095] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Much of the fruit fly genome is compact ("Escherichia coli mode"), indicating a genome-wide selection pressure against DNA with little adaptive function. However, in the bithorax complex (BX-C) homeodomain genes are widely dispersed with large introns ("mammalian mode"). Chargaff difference analysis of compact bacterial and viral genomes has shown that most mRNAs have the potential to form stem-loop structures with purine-rich loops. Thus, for many taxa if transcription is to the right, the top (mRNA synonymous) DNA strand has purine-rich loop potential, and if transcription is to the left, the top (template) strand has pyrimidine-rich loop potential. The best indicator bases for transcription direction are A and T for AT-rich genomes, and C and G for CG-rich genomes. Consistent with this, Chargaff difference analysis of BX-C genes and several non-BX-C genes shows that, whatever the mode, mRNAs have the potential to form stem-loop structures with A-rich loops. We confirm that many potential open reading frames in the BX-C are unlikely to be functional. Conversely, we suggest that a few unassigned open reading frames may actually be functional. Since apparent organization in the mammalian mode cannot be explained in terms of unacknowledged open reading frames, yet the fruit fly genome is under pressure to be compact, it is likely that many BX-C functions do not involve the encoding of proteins.
Affiliation(s)
- K D Dang
- Department of Biochemistry, Queen's University, Kingston, ON, Canada
|
31
|
Bucklin A, Sundt RC, Dahle G. The population genetics of Calanus finmarchicus in the North Atlantic. ACTA ACUST UNITED AC 1996. [DOI: 10.1080/00785326.1995.10429837] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 10/28/2022]
|
32
|
Milner CM, Campbell RD. The G9a gene in the human major histocompatibility complex encodes a novel protein containing ankyrin-like repeats. Biochem J 1993; 290 (Pt 3):811-8. [PMID: 8457211 PMCID: PMC1132354 DOI: 10.1042/bj2900811] [Citation(s) in RCA: 56] [Impact Index Per Article: 1.8] [Indexed: 01/30/2023]
Abstract
The class III region of the human major histocompatibility complex spans approx. 1.1 Mbp on the short arm of chromosome 6 and is known to contain at least 36 genes. The complete nucleotide sequence of a 3.4 kb mRNA from one of these genes, G9a (or BAT8), has been determined from cDNA and genomic DNA clones. The single-copy G9a gene encodes a protein product of 1001 amino acids with a predicted molecular mass of 111,518 Da. The C-terminal region (residues 730-999) of the G9a protein has been expressed in Escherichia coli as a fusion protein with the 26 kDa glutathione S-transferase of Schistosoma japonicum (Sj26). The fusion protein has been used to raise antisera which, in Western-blot analysis, cross-react specifically with an intracellular protein of approx. 98 kDa. The function of the G9a protein is unknown. However, comparison of the derived amino acid sequence of G9a with the protein databases has revealed interesting similarities with a number of other proteins. The C-terminal region of G9a is 35% identical with a 149 amino acid segment of the Drosophila trithorax protein. In addition, the G9a protein has been shown to contain six contiguous copies of a 33-amino acid repeat. This repeat, originally identified in the Notch protein of Drosophila and known as the cdc10/SWI6 or ANK repeat, is also found in a number of other human proteins and may be involved in intracellular protein-protein interactions.
Affiliation(s)
- C M Milner
- MRC Immunochemistry Unit, Department of Biochemistry, Oxford, U.K
|
33
|
Hardison R, Krane D, Vandenbergh D, Cheng JF, Mansberger J, Taddie J, Schwartz S, Huang XQ, Miller W. Sequence and comparative analysis of the rabbit alpha-like globin gene cluster reveals a rapid mode of evolution in a G + C-rich region of mammalian genomes. J Mol Biol 1991; 222:233-49. [PMID: 1960725 DOI: 10.1016/0022-2836(91)90209-o] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.0] [Indexed: 12/29/2022]
Abstract
A sequence of 10,621 base-pairs from the alpha-like globin gene cluster of rabbit has been determined. It includes the sequence of gene zeta 1 (a pseudogene for the rabbit embryonic zeta-globin), the functional rabbit alpha-globin gene, and the theta 1 pseudogene, along with the sequences of eight C repeats (short interspersed repeats in rabbit) and a J sequence implicated in recombination. The region is quite G + C-rich (62%) and contains two CpG islands. As expected for a very G + C-rich region, it has an abundance of open reading frames, but few of the long open reading frames are associated with the coding regions of genes. Alignments between the sequences of the rabbit and human alpha-like globin gene clusters reveal matches primarily in the immediate vicinity of genes and CpG islands, while the intergenic regions of these gene clusters have many fewer matches than are seen between the beta-like globin gene clusters of these two species. Furthermore, the non-coding sequences in this portion of the rabbit alpha-like globin gene cluster are shorter than in human, indicating a strong tendency either for sequence contraction in the rabbit gene cluster or for expansion in the human gene cluster. Thus, the intergenic regions of the alpha-like globin gene clusters have evolved in a relatively fast mode since the mammalian radiation, but not exclusively by nucleotide substitution. Despite this rapid mode of evolution, some strong matches are found 5' to the start sites of the human and rabbit alpha genes, perhaps indicating conservation of a regulatory element. The rabbit J sequence is over 1000 base-pairs long; it contains a C repeat at its 5' end and an internal region of homology to the 3'-untranslated region of the alpha-globin gene. Part of the rabbit J sequence matches with sequences within the X homology block in human. 
Both of these regions have been implicated as hot-spots for recombination, hence the matching sequences are good candidates for such a function. All the interspersed repeats within both gene clusters are retroposon SINEs that appear to have inserted independently in the rabbit and human lineages.
Affiliation(s)
- R Hardison
- Department of Molecular and Cell Biology, Pennsylvania State University, University Park 16802
|
34
|
Li Q, Zhou B, Powers P, Enver T, Stamatoyannopoulos G. Primary structure of the goat beta-globin locus control region. Genomics 1991; 9:488-99. [PMID: 2032720 DOI: 10.1016/0888-7543(91)90415-b] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]
Abstract
The goat beta-globin cluster is composed of a triplicated four-gene set. A locus control region (LCR) containing elements homologous to 5'DNase I hypersensitive sites (HS) 1, 2, and 3 of the human beta-globin LCR has been identified at the 5' end of this locus. We determined 10.2 kb of nucleotide sequence from the goat beta-globin locus control region. Self-comparison of this sequence by dot matrix analysis revealed the presence of six complete and three incomplete artiodactyl repeats. A novel repeated element, termed D repeat, was also identified. Southern blotting analysis demonstrated that these elements exist in the goat genome as a low to medium frequency interspersed repeat family. The absence of any other large region of self-homology (direct or inverted) in the goat LCR suggests that 5'HSs 1, 2, and 3 did not arise through duplication, but rather evolved independently. By comparing goat 5'HS 1 to those of human, rabbit, and mouse, we show a greater than 80% conservation in sequence between the four species. This level of evolutionary conservation suggests that 5'HS 1 plays an important role in the regulation of beta-globin loci.
Affiliation(s)
- Q Li
- Shanghai Institute of Biochemistry, Chinese Academy of Sciences
|
35
|
Behe MJ, Beasty AM. Co-polymer tracts in eukaryotic, prokaryotic, and organellar DNA. DNA Sequence: The Journal of DNA Sequencing and Mapping 1991; 1:291-302. [PMID: 1799681 DOI: 10.3109/10425179109020785] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 12/28/2022]
Abstract
Large variations in DNA base composition and noticeable strand asymmetries are known to occur between different organisms and within different regions of the genomes of single organisms. Apparently such composition and sequence biases occur to fulfill structural rather than informational requirements. Here we report the wide occurrence of a more subtle biasing of DNA sequence that can have structural consequences: an increase or a suppression of the number of long tracts of two-base co-polymers. Strong biases were observed when the DNA sequences of the longest eukaryotic, prokaryotic, and organellar entries in the GenBank data base (totaling 773 kilobases) were analyzed for the number of occurrences of tracts of the two-base co-polymers (A,T)n, (G,C)n, and (A,C)n as a function of tract length. (The expression (A,T)n is used here to denote an uninterrupted tract, n nucleotides in length, of A and T bases in any proportion or order, terminated at each end by a G or C residue.) Characteristic differences are also observed in tract biases of eukaryotic vs. prokaryotic organisms.
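The tract definition given in parentheses above translates fairly directly into a regular expression. The sketch below counts, by length, the uninterrupted runs of a chosen base pair that are bounded on both sides by one of the other two bases. The function name `tract_length_counts` and the choice to ignore runs touching the sequence ends (since they are not terminated on both sides) are assumptions of this example, not details taken from the paper.

```python
import re
from collections import Counter


def tract_length_counts(seq: str, pair: str) -> Counter:
    """Count two-base co-polymer tracts by length.

    `pair` names the two bases allowed inside a tract, e.g. "AT" for (A,T)n.
    A tract must be flanked on both sides by one of the remaining two bases.
    """
    inside = "".join(sorted(set(pair.upper())))
    outside = "".join(sorted(set("ACGT") - set(inside)))
    if len(inside) != 2 or len(outside) != 2:
        raise ValueError("pair must name two distinct bases from A, C, G, T")
    # Lookbehind/lookahead require a terminating base without consuming it,
    # so adjacent tracts that share a single terminator are both counted.
    pattern = re.compile(f"(?<=[{outside}])[{inside}]+(?=[{outside}])")
    return Counter(len(m.group()) for m in pattern.finditer(seq.upper()))


if __name__ == "__main__":
    print(tract_length_counts("GATTACAAG", "AT"))
```

Comparing such length distributions with those from shuffled sequences of the same base composition is one way to look for the enrichment or suppression of long tracts that the abstract describes.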
Affiliation(s)
- M J Behe
- Department of Chemistry, Lehigh University, Bethlehem, Pennsylvania 18015
|
36
|
Evolution of DNA Sequence Contributions of Mutational Bias and Selection to the Origin of Chromosomal Compartments. ACTA ACUST UNITED AC 1990. [DOI: 10.1007/978-3-642-75599-6_1] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.3]
|
37
|
den Dunnen JT, van Neck JW, Cremers FP, Lubsen NH, Schoenmakers JG. Nucleotide sequence of the rat gamma-crystallin gene region and comparison with an orthologous human region. Gene X 1989; 78:201-13. [PMID: 2777080 DOI: 10.1016/0378-1119(89)90223-0] [Citation(s) in RCA: 28] [Impact Index Per Article: 0.8] [Indexed: 01/02/2023]
Abstract
The sequences of a 51-kb region containing the cluster of five rat gamma-crystallin-coding genes (CRYG) and of a 7-kb region surrounding the sixth rat CRYG gene were determined. Approximately 78% of the total sequence represents intergenic DNA. We also sequenced 22 kb of DNA from the human CRYG gene cluster. All CRYG genes are associated with CpG-rich regions. The sequence similarity between the human and rat gene regions drops sharply (to 65%) in intronic and 3'-flanking regions but decreases only gradually in the 5'-flanking region. Highly conserved regions (greater than 80%) are found as far upstream as 1.5 kb. Overall intergenic distances are conserved. The human region contains much more repetitive DNA (24% vs. 10%) but less simple-sequence (sps) DNA (0.7% vs. 4%) than the rat region. Almost all repeats and spsDNA elements are located in the intergenic region. The location of repetitive and spsDNA differs between the orthologous regions and these elements were probably inserted after the evolutionary separation of rat and man. The Alu repeats in man and the B3 repeats in the rat are close copies of their respective consensus sequences and bordered by virtually perfect repeats. In contrast, the B1 and B2 repeats in the rat have diverged considerably from the consensus sequence and the surrounding direct repeats are usually imperfect. Thus the dispersion of the B1 and B2 repeats in the rat probably preceded that of the B3 repeats. Within the rat genomic region the spacing of Z-DNA elements is surprisingly regular, they are located about 12 kb apart. A search for putative matrix-associated regions suggests that the rat CRYG gene cluster is organized into two chromosomal domains.
Affiliation(s)
- J T den Dunnen
- Department of Molecular Biology, University of Nijmegen, The Netherlands
|
38
|
Abstract
In the traditional view of molecular evolution, the rate of point mutation is uniform over the genome of an organism and variation in the rate of nucleotide substitution among DNA regions reflects differential selective constraints. Here we provide evidence for significant variation in mutation rate among regions in the mammalian genome. We show first that substitutions at silent (degenerate) sites in protein-coding genes in mammals seem to be effectively neutral (or nearly so) as they do not occur significantly less frequently than substitutions in pseudogenes. We then show that the rate of silent substitution varies among genes and is correlated with the base composition of genes and their flanking DNA. This implies that the variation in both silent substitution rate and base composition can be attributed to systematic differences in the rate and pattern of mutation over regions of the genome. We propose that the differences arise because mutation patterns vary with the timing of replication of different chromosomal regions in the germline. This hypothesis can account for both the origin of isochores in mammalian genomes and the observation that silent nucleotide substitutions in different mammalian genes do not have the same molecular clock.
Affiliation(s)
- K H Wolfe
- Department of Genetics, Trinity College, Dublin, Ireland
|
39
|
Margot JB, Demers GW, Hardison RC. Complete nucleotide sequence of the rabbit beta-like globin gene cluster. Analysis of intergenic sequences and comparison with the human beta-like globin gene cluster. J Mol Biol 1989; 205:15-40. [PMID: 2486295 DOI: 10.1016/0022-2836(89)90362-8] [Citation(s) in RCA: 63] [Impact Index Per Article: 1.8] [Indexed: 01/01/2023]
Abstract
The nucleotide sequence of the entire beta-like globin gene cluster of rabbits has been determined. This sequence of a continuous stretch of 44.5 x 10(3) base-pairs (bp) starts about 6 x 10(3) bp upstream from epsilon (the 5'-most gene) and ends about 12 x 10(3) bp downstream from beta (the 3'-most gene). Analysis of the sequence reveals that: (1) the sequence is relatively A + T rich (about 60%); (2) regions with high G + C content are associated with OcC repeats, a short interspersed repeated DNA in rabbits; (3) the distribution of polypurines, polypyrimidines and alternating purine/pyrimidine tracts is not random within the cluster; (4) most open reading frames are associated with known globin coding regions, OcC repeats or long interspersed repeats (L1 repeats); (5) the most prominent open reading frames are found in the L1 repeats; (6) different strand asymmetries in base composition are associated with embryonic and adult genes as well as the tandem L1 repeats at the 3' end of the cluster; and (7) essentially all the repeats appear to have been inserted by a transposon mechanism. A comparison of the sequence with itself by a dot-plot analysis has revealed nine new members of the OcC family of repeats in addition to the six previously reported. The OcC repeats tend to be clustered, particularly in the epsilon-gamma and gamma-psi delta intergenic regions. Dot-plot comparisons between the rabbit and the human clusters have revealed extensive sequence matches. Homology starts about 6 x 10(3) bp 5' to epsilon or as far upstream as the rabbit sequence is available. It continues throughout the entire cluster and stops about 0.7 x 10(3) bp 3' to beta, at which point several repeats have inserted in both rabbits and humans. Throughout the gene cluster, the homology is interrupted mainly by insertions or deletions in either the rabbit or the human genome. Almost all of the insertions are of known short or long repeated DNAs. 
The positions of the insertions are different in the two gene clusters, which indicates that both short and long repeats have been transposing throughout the genome for the time since the mammalian radiation. An alignment of rabbit and human sequences allows the calculation of the substitution rate around epsilon. Sequences far removed from the gene are evolving at a rate equivalent to the pseudogene rate, although some short regions show an apparently higher rate.(ABSTRACT TRUNCATED AT 400 WORDS)
Affiliation(s)
- J B Margot
- Department of Molecular and Cell Biology, Paul M. Althouse Laboratory, Pennsylvania State University, University Park 16802
|
40
|
Li Q, Powers PA, Smithies O. Nucleotide sequence of 16-kilobase pairs of DNA 5' to the human epsilon-globin gene. J Biol Chem 1985. [DOI: 10.1016/s0021-9258(18)95678-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.6] [Indexed: 10/22/2022]
|
41
|
Srivastava R, Duceman BW, Biro PA, Sood AK, Weissman SM. Molecular organization of the class I genes of human major histocompatibility complex. Immunol Rev 1985; 84:93-121. [PMID: 3899913 DOI: 10.1111/j.1600-065x.1985.tb01127.x] [Citation(s) in RCA: 54] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]
Abstract
In this brief review, our main emphasis has been on the analysis of the sequence diversity among various class I genes and their functional implications. The availability of complete nucleotide sequences of 7 different genes representing different loci allowed us to derive a consensus sequence. One mouse MHC Class I gene was included in these comparisons as a representative of H2 genes. Evolutionary patterns can be seen on the basis of divergence of various genes from the derived consensus sequence. At least 1 human gene which has a promoter similar to that of H2 genes and which contains a single initiation codon following this promoter (unlike all other human genes and like all the H2 genes) has been identified. Both variable and homology regions can be identified in the entire length of the gene. While exons show relatively strong conservation of sequences, the introns have many variable regions, introns 6 and 7 being the most heterogeneous. Stretches of conserved nucleotide sequences are noticed at the 3' regions of most introns. Estimation of total number of class I genes is presented on the basis of cloning experiments, and the abundance of 1 particular pseudogene is discussed.
|
42
|
Abstract
Because the genetic code is redundant for most amino acids, different codons can be used in a given position without altering the structure of the protein for which the gene codes. This flexibility permits information encoding structural, and therefore functional, properties of RNA and DNA to be transmitted simultaneously by a protein-coding sequence of DNA. Among the other messages that might be transmitted, it is proposed, is one modulating the evolution of the DNA itself.
|
43
|
Perrin P. Coding strategy differences between constant and variable segments of immunoglobulin genes. Nucleic Acids Res 1984; 12:5515-27. [PMID: 6462913 PMCID: PMC318936 DOI: 10.1093/nar/12.13.5515] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 01/20/2023]
Abstract
Vertebrate immunoglobulin (Ig) mRNAs reveal intraspecies variation in codon usage distinct from that seen with yeast or bacterial genes. Comparison of all available Ig gene sequences shows that %(G + C) in codon position III is consistently lower in variable (V) segments than in constant (C) segments. I find an even lower %(G + C) in the hypervariable domains of V segments. This analysis suggests that base substitution in Ig genes correlates positively with local A + T content.
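The quantity compared in this abstract, %(G + C) at codon position III, can be illustrated with a minimal sketch. The function name and the two toy sequences below are hypothetical and are not taken from the paper; the sketch assumes an in-frame coding sequence that starts at the first base of a codon.

```python
def gc3_percent(cds: str) -> float:
    """Percent G+C at third codon positions of an in-frame coding sequence.

    Assumes the sequence starts at codon position I; a trailing partial
    codon contributes nothing, because it has no third position.
    """
    seq = cds.upper()
    # Third positions sit at indices 2, 5, 8, ... of an in-frame sequence.
    third_positions = seq[2::3]
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return 100.0 * gc_count / len(third_positions)


# Hypothetical toy sequences (not real Ig segments): both encode
# Ala-Asp-Thr-Val, but differ in the base chosen at codon position III.
at_rich_third = "GCTGATACAGTA"
gc_rich_third = "GCCGACACGGTG"

print(gc3_percent(at_rich_third))
print(gc3_percent(gc_rich_third))
```

Because synonymous codons mostly differ at position III, this measure can vary between segments without changing the encoded protein, which is why it is a natural statistic for comparing V and C segments.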
|
44
|
Ruppert S, Scherer G, Schütz G. Recent gene conversion involving bovine vasopressin and oxytocin precursor genes suggested by nucleotide sequence. Nature 1984; 308:554-7. [PMID: 6709064 DOI: 10.1038/308554a0] [Citation(s) in RCA: 159] [Impact Index Per Article: 4.0] [Indexed: 01/21/2023]
Abstract
The nonapeptide hormones arginine vasopressin (AVP) and oxytocin (OT) are synthesized in the hypothalamus together with their carrier proteins, the neurophysins, as common polypeptide precursors. The organization of these precursors has been established by sequence determination of cloned bovine cDNAs encoding prepro-arginine vasopressin-neurophysin II (prepro-AVP-NPII) and prepro-oxytocin-neurophysin I (prepro-OT-NPI). When the mRNA sequences coding for the conserved middle part of the neurophysins were compared, we found that these sequences are not merely similar but identical. The primary structure of the chromosomal genes now determined shows that both genes, which appear to have arisen by a gene duplication, are split into three exons, each encoding a functional domain of the precursor polypeptide. Sequence comparison reveals that the stretch of sequence identity within the two mRNAs is probably the result of a gene conversion encompassing exon B, which encodes the conserved part of the neurophysins, and part of the preceding intron.
|
45
|
Devereux J, Haeberli P, Smithies O. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res 1984; 12:387-95. [PMID: 6546423 PMCID: PMC321012 DOI: 10.1093/nar/12.1part1.387] [Citation(s) in RCA: 11651] [Impact Index Per Article: 291.3] [Indexed: 11/14/2022] Open
Abstract
The University of Wisconsin Genetics Computer Group (UWGCG) has been organized to develop computational tools for the analysis and publication of biological sequence data. A group of programs that will interact with each other has been developed for the Digital Equipment Corporation VAX computer using the VMS operating system. The programs available and the conditions for transfer are described.
|
46
|
Collins FS, Weissman SM. The molecular genetics of human hemoglobin. Prog Nucleic Acid Res Mol Biol 1984; 31:315-462. [PMID: 6397774 DOI: 10.1016/s0079-6603(08)60382-7] [Citation(s) in RCA: 299] [Impact Index Per Article: 7.5] [Indexed: 01/20/2023]
|
47
|
Michelson AM, Orkin SH. Boundaries of gene conversion within the duplicated human alpha-globin genes. Concerted evolution by segmental recombination. J Biol Chem 1983. [DOI: 10.1016/s0021-9258(17)43800-2] [Citation(s) in RCA: 85] [Impact Index Per Article: 2.1] [Indexed: 10/22/2022] Open
|
48
|
Poncz M, Schwartz E, Ballantine M, Surrey S. Nucleotide sequence analysis of the delta beta-globin gene region in humans. J Biol Chem 1983. [DOI: 10.1016/s0021-9258(17)44270-0] [Citation(s) in RCA: 76] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022] Open
|
49
|
Heilig R, Muraskowsky R, Kloepfer C, Mandel JL. The ovalbumin gene family: complete sequence and structure of the Y gene. Nucleic Acids Res 1982; 10:4363-82. [PMID: 7122240 PMCID: PMC320805 DOI: 10.1093/nar/10.14.4363] [Citation(s) in RCA: 57] [Impact Index Per Article: 1.4] [Indexed: 01/23/2023] Open
Abstract
The "ovalbumin Y" gene, one of three which constitute the ovalbumin gene family in chicken, has been completely sequenced. The exact location of exons can be derived from the comparison with the ovalbumin gene sequence and from the map previously established by electron microscopy analysis. During evolution of the Y gene, selective pressure has operated to retain a sequence coding for an ovalbumin-like protein. The location of splice junctions, the length of protein coding exons and the reading phase are as in the ovalbumin gene. The overall homology between the Y and ovalbumin protein coding sequences is 72.6% (resulting in a 58% homology for the amino acid sequences). A significantly high number of base changes within coding sequences are present in clusters, which appear in several cases to be correlated with the occurrence of direct repeats. The 3' untranslated sequences of the Y and ovalbumin mRNAs have diverged much more, and the Y sequence contains a peculiar U(T) rich region. Corresponding introns of the ovalbumin and Y genes differ extensively both in sequence and in length. They share, however, characteristic biases in their base distribution.
|
50
|
Lipman DJ, Maizel J. Comparative analysis of nucleic acid sequences by their general constraints. Nucleic Acids Res 1982; 10:2723-39. [PMID: 7079183 PMCID: PMC320646 DOI: 10.1093/nar/10.8.2723] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.4] [Indexed: 01/23/2023] Open
Abstract
We describe two measures of a nucleic acid sequence, derived from Information Theory, which characterize the constraints toward nonuniform base composition, and the constraints on the ordering of the bases. These two measures distinguish extra-chromosomal coding sequences from all other coding sequences examined. The two measures separate eukaryotic coding sequences into two groups: those with introns and those without introns. We have also found a relationship between the general constraints of a subsequence and its degree of conservation in related genes.
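The abstract does not give the definitions of its two measures, so the following is only a sketch of one common information-theoretic formulation (divergence from uniform composition, and divergence from independence of neighbouring bases). It is an assumption that these correspond to the measures used in the paper; the function name and toy inputs are hypothetical.

```python
import math
from collections import Counter


def composition_and_order_constraints(seq: str) -> tuple[float, float]:
    """Return (d1, d2) in bits for a sequence over the alphabet A, C, G, T.

    d1 = log2(4) - H1        : constraint toward nonuniform base composition
    d2 = H1 - H(next | prev) : constraint on the ordering of adjacent bases
    These definitions are an assumed formulation, not taken from the abstract.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases")

    # Single-base Shannon entropy H1.
    base_counts = Counter(seq)
    h1 = -sum((c / n) * math.log2(c / n) for c in base_counts.values())
    d1 = math.log2(4) - h1

    # Conditional entropy of a base given its left neighbour.
    pair_counts = Counter(zip(seq, seq[1:]))
    left_counts = Counter(seq[:-1])
    n_pairs = n - 1
    h_cond = -sum(
        (c / n_pairs) * math.log2(c / left_counts[left])
        for (left, _right), c in pair_counts.items()
    )
    d2 = h1 - h_cond
    return d1, d2


# Hypothetical toy inputs: uniform composition but fully determined order,
# versus a single repeated base.
print(composition_and_order_constraints("ACGT" * 50))
print(composition_and_order_constraints("A" * 100))
```

On short sequences the estimate of d2 is sensitive to sampling and edge effects, so values near zero should be read with caution.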
|