1
|
Van Deynze A, Stoffel K, Buell CR, Kozik A, Liu J, van der Knaap E, Francis D. Diversity in conserved genes in tomato. BMC Genomics 2007; 8:465. [PMID: 18088428 PMCID: PMC2249608 DOI: 10.1186/1471-2164-8-465] [Received: 07/06/2007] [Accepted: 12/18/2007]
Abstract
BACKGROUND: Tomato has excellent genetic and genomic resources, including a broad set of Expressed Sequence Tag (EST) data and high-density genetic maps. In addition, emerging physical maps and bacterial artificial clone sequence data serve as templates to investigate genetic variation within the cultivated germplasm pool, with the goal of manipulating agriculturally important traits. Unfortunately, the nearly exclusive focus of resource development on interspecific populations for genetic analyses and diversity studies has left a void in our understanding of genotypic variation within tomato breeding programs that focus on intra-specific populations. We describe the results of a study to identify nucleotide variation within tomato breeding germplasm and mapping parents for a set of conserved single-copy ESTs that are orthologous between tomato and Arabidopsis.
RESULTS: Using a pooled sequencing strategy, 967 tomato transcripts were screened for polymorphism in 12 tomato lines. Although intron position was conserved, intron lengths were 2-fold larger in tomato than in Arabidopsis. A total of 1,487 single nucleotide polymorphisms and 282 insertion/deletions were identified, of which 579 and 206, respectively, were polymorphic in breeding germplasm. Fresh market and processing germplasm were clearly divergent, as were Solanum lycopersicum var. cerasiforme and Solanum pimpinellifolium, tomato's closest relatives. The polymorphisms identified serve as marker resources for tomato. The conserved ortholog set (COS) is also applicable to other Solanaceae crops.
CONCLUSIONS: The results from this research enabled significant progress towards bridging the gap between genetic and genomic resources developed for populations derived from wide crosses and those applicable to intra-specific crosses for breeding in tomato.
Affiliation(s)
- Allen Van Deynze
- Seed Biotechnology Center, University of California, 1 Shields Ave, Davis, CA, USA.
|
3
|
Budak H, Shearman RC, Dweikat I. Evolution of Buchloë dactyloides based on cloning and sequencing of matK, rbcL, and cob genes from plastid and mitochondrial genomes. Genome 2005; 48:411-6. [PMID: 16121238 DOI: 10.1139/g05-002]
Abstract
Buffalograss (Buchloë dactyloides (Nutt.) Engelm.), a C4 turfgrass species, is native to the Great Plains region of North America. The evolutionary history of buffalograss is unclear. The rbcL and matK genes from the plastid genome and the cob gene from the mitochondrial genome were sequenced to elucidate buffalograss evolution. This study is the first to report sequencing of these genes from organelle genomes in the genus Buchloë. Comparisons of sequence data from the mitochondrial and plastid genomes revealed that all genotypes shared the same cytoplasmic origin. Some rearrangements were detected in the mitochondrial genome. The buffalograss genome appears to have evolved through the rearrangements of convergent subgenomic domains. Combined analyses of plastid genes suggest that the evolutionary process in the Buchloë accessions studied was monophyletic rather than polyphyletic. However, since plastid and mitochondrial genomes are generally uniparentally inherited, the evolutionary history of these genomes may not reflect the evolutionary history of the organism, especially in a species in which out-crossing is common. The sequence information obtained from this study can be used as a genome-specific marker for investigation of the buffalograss polyploidy complex and for testing the mode of plastid and mitochondrial transmission in the genus Buchloë.
Affiliation(s)
- Hikmet Budak
- Sabanci University, Faculty of Engineering and Natural Science, Biological Science, and Bioengineering, Istanbul, Turkey.
|
4
|
Richly E, Leister D. NUPTs in sequenced eukaryotes and their genomic organization in relation to NUMTs. Mol Biol Evol 2004; 21:1972-80. [PMID: 15254258 DOI: 10.1093/molbev/msh210]
Abstract
NUPTs (nuclear plastid DNA) derive from plastid-to-nucleus DNA transfer and exist in various plant species. Experimental data imply that the DNA transfer is an ongoing, highly frequent process, but for the interspecific diversity of NUPTs, no clear explanation exists. Here, an inventory of NUPTs in the four sequenced plastid-bearing species and their genomic organization is presented. Large genomes with a predicted low gene density contain more NUPTs. In Chlamydomonas and Plasmodium, DNA transfer occurred but was limited, probably because of the presence of only one plastid per cell. In Arabidopsis and rice, NUPTs are frequently organized as clusters. Tight clusters can contain both NUPTs and NUMTs (nuclear mitochondrial DNA), indicating that preNUPTs and preNUMTs might have concatamerized before integration. The composition of such a hypothetical preNUPT-preNUMT pool seems to be variable, as implied by substantially different NUPTs:NUMTs ratios in different species. Loose clusters can span several dozens of kbps of nuclear DNA, and they contain markedly more NUPTs or NUMTs than expected from a random genomic distribution of nuclear organellar DNA. The level of sequence similarity between NUPTs/NUMTs and plastid/mitochondrial DNA correlates with the size of the integrant. This implies that original insertions are large and decay over evolutionary time into smaller fragments with diverging sequences. We suggest that tight and loose clusters represent intermediates of this decay process.
Affiliation(s)
- Erik Richly
- Abteilung für Pflanzenzüchtung und Ertragsphysiologie, Max-Planck-Institut für Züchtungsforschung, Köln, Germany
|
5
|
Zolla L, Timperio AM. High performance liquid chromatography-electrospray mass spectrometry for the simultaneous resolution and identification of intrinsic thylakoid membrane proteins. Proteins 2000; 41:398-406. [PMID: 11025550 DOI: 10.1002/1097-0134(20001115)41:3<398::aid-prot110>3.0.co;2-k]
Abstract
In higher plants, both photosystem I (PSI) and II (PSII) consist of membrane-embedded proteins that contain more than one transmembrane alpha helix. PSI is a multiprotein complex consisting of a core complex of thirteen proteins surrounded by four different types of light harvesting antenna proteins. Up to now, the protein components of both photosystems have been characterized by SDS-PAGE and/or immunoblotting and, therefore, identification made only on the basis of electrophoretic mobility, which is sometimes not sufficient to discriminate between individual membrane proteins. This is also complicated by the fact that some proteins, such as the antenna proteins, have almost identical molecular mass and amino acid sequence, making it difficult to identify and ascertain the relative stoichiometry of the proteins. In this paper, we report the complete resolution of the antenna proteins and most of the core components of PSI from spinach, together with the identification of proteins by molecular mass, successfully deduced by the combined use of HPLC coupled on-line with a mass spectrometer equipped with an electrospray ion source (ESI-MS). The proposed RP-HPLC-ESI-MS method holds several advantages over SDS-PAGE, including better protein separation, especially for antenna proteins, mass accuracy, speed, efficiency, and the potential to reveal isomeric forms. Moreover, the molecular masses determined by HPLC-ESI-MS are in good agreement with the molecular masses of the individual components calculated on the basis of their nucleotide-derived amino acid sequences, indicating an absence of post-translational modifications in these proteins. It follows that if the method proposed is useful for these highly hydrophobic proteins, it may be of general use for any membrane proteins, where the presence of detergent for solubilization may compromise their characterization.
Affiliation(s)
- L Zolla
- Dipartimento di Scienze Ambientali, Università della Tuscia, Viterbo, Italy.
|
6
|
Herrnstadt C, Clevenger W, Ghosh SS, Anderson C, Fahy E, Miller S, Howell N, Davis RE. A novel mitochondrial DNA-like sequence in the human nuclear genome. Genomics 1999; 60:67-77. [PMID: 10458912 DOI: 10.1006/geno.1999.5907]
Abstract
We describe here a nuclear mitochondrial DNA-like sequence (numtDNA) that is nearly identical in sequence to a continuous 5842 bp segment of human mitochondrial DNA (mtDNA) that spans nucleotide positions 3914 to 9755. On the basis of evolutionary divergence among modern primates, this numtDNA molecule appears to represent mtDNA from a hominid ancestor that has been translocated to the nuclear genome during the recent evolution of humans. This numtDNA sequence harbors synonymous and nonsynonymous nucleotide substitutions relative to the authentic human mtDNA sequence, including an array of substitutions that was previously found in the cytochrome c oxidase subunit 1 and 2 genes. These substitutions were previously reported to occur in human mtDNA, but subsequently contended to be present in a nuclear pseudogene sequence. We now demonstrate their exclusive association with this 5842-bp numtDNA, which we have characterized in its entirety. This numtDNA does not appear to be expressed as a mtDNA-encoded mRNA. It is present in nuclear DNA from human blood donors, in human SH-SY5Y and A431 cell lines, and in rho(0) SH-SY5Y and rho(0) A431 cell lines that were depleted of mtDNA. The existence of human numtDNA sequences with great similarities to human mtDNA renders the amplification of pure mtDNA from cellular DNA very difficult, thereby creating the potential for confounding studies of mitochondrial diseases and population genetics.
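The reported segment length can be checked against the reported coordinates. The sketch below assumes the positions are 1-based and inclusive (the abstract does not state the convention) and that the segment does not wrap around the origin of the circular mtDNA; the helper name `inclusive_length` is hypothetical.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length of a segment given 1-based, inclusive start and end positions.

    Assumes a linear interval (no wrap-around on a circular genome).
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates quoted in the abstract: positions 3914 to 9755 of human mtDNA.
segment_length = inclusive_length(3914, 9755)
print(segment_length)  # 5842, matching the stated 5842 bp
```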
|
7
|
Abstract
A surprisingly large number of plant nuclear DNA sequences inferred to be remnants of chloroplast and mitochondrial DNA migration events were detected through computer-assisted database searches. Nineteen independent organellar DNA insertions, with a median size of 117 bp (range of 38 to >785 bp), occur in the proximity of 15 nuclear genes. One fragment appears to have passed through an RNA intermediate, based on the presence of an edited version of the mitochondrial gene in the nucleus. Tandemly arranged fragments from disparate regions of organellar genomes and from different organellar genomes indicate that the fragments joined together from an intracellular pool of RNA and/or DNA before they integrated into the nuclear genome. Comparisons of integrated sequences to genes lacking the insertions, as well as the occurrence of coligated fragments, support a model of random integration by end joining. All transferred sequences were found in noncoding regions, but the positioning of organellar-derived DNA in introns, as well as regions 5' and 3' to nuclear genes, suggests that the random integration of organellar DNA has the potential to influence gene expression patterns. A semiquantitative estimate was made of the amount of organellar DNA being transferred and assimilated into the nucleus. Based on this database survey, we estimate that 3-7% of the plant nuclear genomic sequence files contain organellar-derived DNA. The timing and the magnitude of genetic flux to the nuclear genome suggest that random integration is a substantial and ongoing process for creating sequence variation.
Affiliation(s)
- J L Blanchard
- Department of Botany, University of Georgia, Athens 30602, USA
|
8
|
Richard M, Tremblay C, Bellemare G. Chloroplastic genomes of Ginkgo biloba and Chlamydomonas moewusii contain a chlB gene encoding one subunit of a light-independent protochlorophyllide reductase. Curr Genet 1994; 26:159-65. [PMID: 8001171 DOI: 10.1007/bf00313805]
Abstract
We have cloned and sequenced a Chlamydomonas moewusii chloroplastic DNA fragment that includes a 563 amino-acid open reading frame (ORF563, chlB) presenting 89% amino-acid homology with ORF513 from Marchantia polymorpha. It is also homologous to ORF510 from Pinus thunbergii but includes two insertions absent in both M. polymorpha and P. thunbergii. The derived polypeptide is 54% similar to the product of bchB from Rhodobacter capsulatus, identified as one subunit of a light-independent NADH-protochlorophyllide reductase. We also isolated and sequenced a homologous chloroplastic gene from the gymnosperm Ginkgo biloba. Northern hybridizations performed on RNA isolated from synchronized Chlamydomonas eugametos cells showed higher expression between the tenth hour of light and the eighth hour of darkness, peaking during the first 2 h of darkness.
Affiliation(s)
- M Richard
- Département de Biochimie, Faculté des Sciences et de Génie, Université Laval, Québec, Canada
|
9
|
Ayliffe MA, Timmis JN. Plastid DNA sequence homologies in the tobacco nuclear genome. Mol Gen Genet 1992; 236:105-12. [PMID: 1337369 DOI: 10.1007/bf00279648]
Abstract
The tobacco (Nicotiana tabacum) nuclear genome contains long tracts of DNA (i.e., in excess of 18 kb) with high sequence homology to the tobacco plastid genome. Five lambda clones containing these nuclear DNA sequences encompass more than one-third of the tobacco plastid genome. The absolute size of these five integrants is unknown but potentially includes uninterrupted sequences that are as large as the plastid genome itself. An additional sequence was cloned consisting of both nuclear and plastid-derived DNA sequences. The nuclear component of the clone is part of a family of repeats, which are present in about 400 locations in the nuclear genome. The homologous sequences present in chromosomal DNA were very similar to the corresponding sequences in the plastid genome. However, significant sequence divergence, including base substitutions, insertions and deletions of up to 41 bp, was observed between these nuclear sequences and the plastid genome. Associated with the larger deletions were sequence motifs suggesting that processes such as DNA replication slippage and excision of hairpin loops may have been involved in deletion formation.
Affiliation(s)
- M A Ayliffe
- Department of Genetics, University of Adelaide, South Australia
|
10
|
Ayliffe MA, Timmis JN. Tobacco nuclear DNA contains long tracts of homology to chloroplast DNA. Theor Appl Genet 1992; 85:229-238. [PMID: 24197309 DOI: 10.1007/bf00222864] [Received: 02/15/1992] [Accepted: 04/07/1992]
Abstract
Long tracts of DNA with high sequence homology to chloroplast DNA were isolated from nuclear genomic libraries of Nicotiana tabacum. One lambda EMBL4 clone was characterised in detail and assigned to nuclear DNA. The majority of the 15.5-kb sequence is greater than 99% homologous with its chloroplast DNA counterpart, but a single base deletion causes premature termination of the reading frame of the psaA gene. One region of the clone contains a concentration of deleted regions, and these were used to identify and quantify the sequence in native nuclear DNA by polymerase chain reaction (PCR) methods. An estimated 15 copies of this specific region are present in a 1c tobacco nucleus.
Affiliation(s)
- M A Ayliffe
- Department of Genetics, University of Adelaide, GPO Box 498, 5001, Adelaide, South Australia
|
11
|
Ogihara Y, Terachi T, Sasakuma T. Structural analysis of length mutations in a hot-spot region of wheat chloroplast DNAs. Curr Genet 1992; 22:251-8. [PMID: 1339325 DOI: 10.1007/bf00351733]
Abstract
The hot-spot region related to length mutations in the chloroplast genome of the wheat group was precisely analyzed at the DNA sequence level. This region, located downstream from the rbcL gene, was highly enriched in A + T, and contained a number of direct and inverted repeats. Many deletions/insertions were observed in the region. In most deletions/insertions of multiple nucleotides, short repeated sequences were found at the mutation points. Furthermore, a pair of short repeated sequences was also observed at the border of the translocated gene. A sequence homologous with ORF512 of tobacco cpDNA was truncated in cpDNAs of the wheat group and found only in the mitochondrial DNA of Ae. crassa, suggesting the inter-organellar translocation of this sequence. Mechanisms that could generate structural alterations of the chloroplast genome in the wheat group are discussed.
Affiliation(s)
- Y Ogihara
- Kihara Institute for Biological Research, Yokohama City University, Japan
|
12
|
Martin GB, Ganal MW, Tanksley SD. Construction of a yeast artificial chromosome library of tomato and identification of cloned segments linked to two disease resistance loci. Mol Gen Genet 1992; 233:25-32. [PMID: 1351245 DOI: 10.1007/bf00587557]
Abstract
We have constructed a yeast artificial chromosome (YAC) library of tomato for chromosome walking that contains the equivalent of three haploid genomes (22,000 clones). The source of high molecular weight DNA was leaf protoplasts from the tomato cultivars VFNT cherry and Rio Grande-PtoR, which together contain loci encoding resistance to six pathogens of tomato. Approximately 11,000 YACs have been screened with RFLP markers that cosegregate with Tm-2a and Pto, loci conferring resistance to tobacco mosaic virus and Pseudomonas syringae pv. tomato, respectively. Five YACs were identified that hybridized to the markers and are therefore starting points for chromosome walks to these genes. A subset of the library was characterized for the presence of various repetitive sequences, and YACs were identified that carried TGRI, a repeat clustered near the telomeres of most tomato chromosomes; TGRII, an interspersed repeat; and TGRIII, a repeat that occurs primarily at centromeric sites. Evaluation of the library for organellar sequences revealed that approximately 10% of the clones contain chloroplast sequences. Many of these YAC clones appear to contain the entire 155 kb tomato chloroplast genome. The tomato cultivars used in the library construction, in addition to carrying various disease resistance genes, also contain the wild-type alleles corresponding to most recessive mutations that have been mapped by classical linkage analysis. Thus, in addition to its utility for physical mapping and genome studies, this library should be useful for chromosome walking to genes corresponding to virtually any phenotype that can be scored in a segregating population.
Affiliation(s)
- G B Martin
- Department of Plant Breeding and Biometry, Cornell University, Ithaca, NY 14853
|
13
|
Ossorio PN, Sibley LD, Boothroyd JC. Mitochondrial-like DNA sequences flanked by direct and inverted repeats in the nuclear genome of Toxoplasma gondii. J Mol Biol 1991; 222:525-36. [PMID: 1660924 DOI: 10.1016/0022-2836(91)90494-q]
Abstract
In the course of our genetic studies on Toxoplasma gondii, it was discovered that one cosmid hybridized to a repetitive element. The hybridization pattern observed for the enzyme BglII indicated that this cosmid hybridized to a large number of discrete but related elements. Four BglII fragments were subcloned from the cosmid, and each was shown to hybridize with all the others, as well as to numerous dispersed sequences in genomic DNA. Three subclones were sequenced in their entirety and shown to contain fragments of the genes for cytochrome oxidase subunit I and apocytochrome b, complete and functional copies of which have been found only in mitochondrial genomes. All the subcloned fragments were bounded at both ends by a 91 base-pair sequence, which contains a site for BglII. This 91 base-pair sequence could be found as either a direct or an inverted repeat. It was determined that the BglII elements are arrayed downstream from a single-copy nuclear gene. Comparison of genomic and cosmid DNAs confirmed that the cosmid faithfully reflects the nuclear genome. Although the mitochondrial genome of Toxoplasma has not been characterized, these nuclear mitochondrial-like sequences appear to be internally rearranged with respect to known, functional mitochondrial genomes, and with respect to each other. The finding of short repeated sequences flanking these elements may be a clue to the mechanism of their dissemination.
Affiliation(s)
- P N Ossorio
- Department of Microbiology and Immunology, Stanford University School of Medicine, CA 94305-5402
|
14
|
Pichersky E, Logsdon JM, McGrath JM, Stasys RA. Fragments of plastid DNA in the nuclear genome of tomato: prevalence, chromosomal location, and possible mechanism of integration. Mol Gen Genet 1991; 225:453-8. [PMID: 1673221 DOI: 10.1007/bf00261687]
Abstract
We have undertaken a systematic search for plastid DNA sequences integrated in the tomato nuclear genome, using heterologous probes taken from intervals of a plastid DNA region spanning 58 kb. A total of two short integrates (202 and 141 nucleotides) were isolated and mapped to chromosomes 9 and 5, respectively. The nucleotide sequence of the integrates and that of the flanking regions were determined. The integration sites contain direct repeat elements similar in position (but not in length or sequence) to the direct repeats previously observed with another plastid integrate in the tomato nuclear genome. Based on these results, a model for the process of movement and integration of plastid sequences into the nuclear genome is discussed.
Affiliation(s)
- E Pichersky
- Department of Biology, University of Michigan, Ann Arbor 48109
|
15
|
Pietrokovski S, Hirshon J, Trifonov EN. Linguistic measure of taxonomic and functional relatedness of nucleotide sequences. J Biomol Struct Dyn 1990; 7:1251-68. [PMID: 2363847 DOI: 10.1080/07391102.1990.10508563]
Abstract
The frequencies of "words", oligonucleotides within nucleotide sequences, reflect the genetic information contained in the sequence "texts". Nucleotide sequences are characteristically represented by their contrast word vocabularies. Comparison of the sequences by correlating their contrast vocabularies is shown to reflect well the relatedness (unrelatedness) between the sequences. A single value, the linguistic similarity between the sequences, is suggested as a measure of sequence relatedness. Sequences as short as 1000 bases can be characterized and quantitatively related to other sequences by this technique. The linguistic sequence similarity value is used for analysis of taxonomically and functionally diverse nucleotide sequences. The similarity value is shown to be very sensitive to the relatedness of the source species, thus providing a convenient tool for taxonomic classification of species by their sequence vocabularies. Functionally diverse sequences appear distinct by their linguistic similarity values. This can be a basis for a quick screening technique for functional characterization of the sequences and for mapping functionally distinct regions in long sequences.
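The general idea of comparing sequences by their word (oligonucleotide) usage can be illustrated with a simplified sketch. This is not the authors' contrast-vocabulary method: it assumes plain relative k-mer frequencies and a Pearson correlation as the similarity value, and the function names `kmer_frequencies`, `pearson`, and `word_similarity` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Relative frequencies of all 4**k DNA words, in a fixed (sorted) order.

    Words containing characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no valid words of length k")
    return [counts[w] / total for w in words]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def word_similarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Single similarity value from correlated word-frequency vectors."""
    return pearson(kmer_frequencies(seq_a, k), kmer_frequencies(seq_b, k))


if __name__ == "__main__":
    # Toy sequences for illustration only; real use needs much longer input.
    a = "ACGTACGTTGCAACGTACGT"
    b = "ACGTACGATGCAACGTTCGT"
    print(round(word_similarity(a, b, k=2), 3))
```

A value near 1 indicates similar word usage and a value near 0 or below indicates dissimilar usage. The abstract states that sequences as short as 1000 bases can be characterized, so the toy inputs above only demonstrate the mechanics.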
Affiliation(s)
- S Pietrokovski
- Department of Polymer Research, Weizmann Institute of Science, Rehovot, Israel
|