151
Abstract
The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
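As an illustration of the Markov Cluster step named in the abstract, the following is a minimal Python sketch of a generic MCL iteration (expansion followed by inflation) on a small similarity matrix. It is not the OrthoMCL implementation: the function names, the inflation value of 1.5, the unit self-loop weight, and the toy matrix are assumptions made for this example, and the similarity-based edge weighting that would precede clustering is omitted.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(similarity, inflation=1.5, max_iter=100, tol=1e-9):
    """Generic Markov clustering; returns clusters as sorted tuples of indices."""
    n = len(similarity)
    # Add self-loops (assumed weight 1.0) so that no column is empty.
    m = [[float(similarity[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: one matrix squaring, i.e. a two-step random walk.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that retains mass lists the members of one cluster.
    clusters = {tuple(j for j in range(n) if row[j] > 1e-6) for row in m}
    clusters.discard(())
    return sorted(clusters)


# Hypothetical toy graph: sequences 0-2 are mutually similar, as are 3-4.
toy = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
print(mcl(toy))
```

On this toy input the two disconnected groups come back as separate clusters, `[(0, 1, 2), (3, 4)]`. Higher inflation values generally yield finer-grained clusters.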
Affiliation(s)
- Li Li
- Department of Biology and Genetics, Center for Bioinformatics, and Genomics Institute, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
153
Schnabel E, Kulikova O, Penmetsa RV, Bisseling T, Cook DR, Frugoli J. An integrated physical, genetic and cytogenetic map around the sunn locus of Medicago truncatula. Genome 2003; 46:665-72. [PMID: 12897874 DOI: 10.1139/g03-019] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/22/2022]
Abstract
The sunn mutation of Medicago truncatula is a single-gene mutation that confers a novel supernodulation phenotype in response to inoculation with Sinorhizobium meliloti. We took advantage of the publicly available codominant PCR markers, the high-density genetic map, and a linked cytogenetic map to define the physical and genetic region containing sunn. We determined that sunn is located at the bottom of linkage group 4, where a fine-structure genetic map was used to place the locus within an approximately 400-kb contig of bacterial artificial chromosome (BAC) clones. Genetic analyses of the sunn contig, as well as of a second, closely linked BAC contig designated NUM1, indicate that the ratio of physical to genetic distance within this chromosome region is in the range of 1000-1100 kb/cM. The ratio of genetic to cytogenetic distance determined across the entire region is 0.3 cM/µm. These estimates are in good agreement with the empirically determined value of approximately 300 kb/µm measured for the NUM1 contig. The assignment of sunn to a defined physical interval should provide a basis for sequencing and ultimately cloning the responsible gene.
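The agreement between these estimates can be checked with simple unit arithmetic: multiplying the physical-to-genetic ratio (kb per cM) by the genetic-to-cytogenetic ratio (cM per µm) gives kb per µm. The sketch below uses only the figures quoted in the abstract; the function name is hypothetical.

```python
def kb_per_um(kb_per_cm: float, cm_per_um: float) -> float:
    """Combine kb/cM and cM/um into kb/um by multiplying the two ratios."""
    return kb_per_cm * cm_per_um


# Bounds of the reported physical-to-genetic range, with 0.3 cM per micrometer.
low = kb_per_um(1000, 0.3)
high = kb_per_um(1100, 0.3)
print(f"{low:.0f}-{high:.0f} kb per micrometer")
```

The derived range of 300-330 kb/µm has the measured value of approximately 300 kb/µm at its lower end, which is consistent with the agreement stated in the abstract.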
Affiliation(s)
- E Schnabel
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA.
154
Kalyanaraman A, Aluru S, Kothari S, Brendel V. Efficient clustering of large EST data sets on parallel computers. Nucleic Acids Res 2003; 31:2963-74. [PMID: 12771222 PMCID: PMC156714 DOI: 10.1093/nar/gkg379] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022]
Abstract
Clustering expressed sequence tags (ESTs) is a powerful strategy for gene identification, gene expression studies and identifying important genetic variations such as single nucleotide polymorphisms. To enable fast clustering of large-scale EST data, we developed PaCE (for Parallel Clustering of ESTs), a software program for EST clustering on parallel computers. In this paper, we report on the design and development of PaCE and its evaluation using Arabidopsis ESTs. The novel features of our approach include: (i) design of memory efficient algorithms to reduce the memory required to linear in the size of the input, (ii) a combination of algorithmic techniques to reduce the computational work without sacrificing the quality of clustering, and (iii) use of parallel processing to reduce run-time and facilitate clustering of larger data sets. Using a combination of these techniques, we report the clustering of 168 200 Arabidopsis ESTs in 15 min on an IBM xSeries cluster with 30 dual-processor nodes. We also clustered 327 632 rat ESTs in 47 min and 420 694 Triticum aestivum ESTs in 3 h and 15 min. We demonstrate the quality of our software using benchmark Arabidopsis EST data, and by comparing it with CAP3, a software widely used for EST assembly. Our software allows clustering of much larger EST data sets than is possible with current software. Because of its speed, it also facilitates multiple runs with different parameters, providing biologists a tool to better analyze EST sequence data. Using PaCE, we clustered EST data from 23 plant species and the results are available at the PlantGDB website.
155
Wortman JR, Haas BJ, Hannick LI, Smith RK, Maiti R, Ronning CM, Chan AP, Yu C, Ayele M, Whitelaw CA, White OR, Town CD. Annotation of the Arabidopsis genome. PLANT PHYSIOLOGY 2003; 132:461-8. [PMID: 12805579 PMCID: PMC166989 DOI: 10.1104/pp.103.022251] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.7] [Received: 02/15/2003] [Revised: 03/07/2003] [Accepted: 03/18/2003] [Indexed: 05/18/2023]
Affiliation(s)
- Jennifer R Wortman
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA
156
Abstract
Potential allergenicity of transgenic proteins for consumption must be investigated before their introduction into the food chain. A prerequisite is sequence analysis. We have critically reviewed the performance of the current guidelines proposed by the Food and Agriculture Organization (FAO) and the World Health Organization (WHO) for allergenicity prediction based on protein sequence and show that its precision is very low. To improve prediction, we propose a new strategy based on sequence motifs identified from a new allergen database. If tested on random test sequences and known allergens, both methods are apparently very sensitive. However, the precision of our motif-based prediction (95.5%) is superior to the current method (36.6%). We conclude that the proposed motif-based prediction is a superior alternative to the current method for use in the decision-tree approach for allergenicity assessment.
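Precision and sensitivity, the two measures contrasted in the abstract, can be computed from a confusion table as sketched below. The formulas are the standard definitions; the counts are hypothetical values chosen only so that the precision matches the 95.5% quoted above, and they are not data from the study.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of predicted allergens that are real allergens."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real allergens that the method detects."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts, for illustration only.
tp, fp, fn = 191, 9, 4
print(f"precision   = {precision(tp, fp):.3f}")
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
```

A method can be highly sensitive yet imprecise if it also flags many non-allergens, which is the distinction the abstract draws between the two approaches.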
Affiliation(s)
- Michael B Stadler
- Institute of Immunology, University of Bern, Sahlihaus 2, Inselspital, CH-3010 Bern, Switzerland.
157
Qin Q, Bergmann CW, Rose JKC, Saladie M, Kolli VSK, Albersheim P, Darvill AG, York WS. Characterization of a tomato protein that inhibits a xyloglucan-specific endoglucanase. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2003; 34:327-338. [PMID: 12713539 DOI: 10.1046/j.1365-313x.2003.01726.x] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.4] [Indexed: 05/24/2023]
Abstract
A basic, 51 kDa protein was purified from suspension-cultured tomato and shown to inhibit the hydrolytic activity of a xyloglucan-specific endoglucanase (XEG) from the fungus Aspergillus aculeatus. The tomato (Lycopersicon esculentum) protein, termed XEG inhibitor protein (XEGIP), inhibits XEG activity by forming a 1:1 protein:protein complex with a Ki of approximately 0.5 nM. To our knowledge, XEGIP is the first reported proteinaceous inhibitor of any endo-β-1,4-glucanase, including the cellulases. The cDNA encoding XEGIP was cloned and sequenced. Database analysis revealed homology with carrot extracellular dermal glycoprotein (EDGP), which has a putative role in plant defense. XEGIP also has sequence similarity to ESTs from a broad range of plant species, suggesting that XEGIP-like genes are widely distributed in the plant kingdom. Although Southern analysis detected only a single XEGIP gene in tomato, at least five other XEGIP-like tomato sequences have been identified. Similar small families of XEGIP-like sequences are present in other plants, including Arabidopsis. XEGIP also has some sequence similarity to two previously characterized proteins, basic globulin 7S protein from soybean and conglutin gamma from lupin. Several amino acids in the XEGIP sequence, notably 8 of the 12 cysteines, are generally conserved in all the XEGIP-like proteins we have encountered, suggesting a fundamental structural similarity. Northern analysis revealed that XEGIP is widely expressed in tomato vegetative tissues and is present in expanding and maturing fruit, but is downregulated during ripening.
Affiliation(s)
- Qiang Qin
- Complex Carbohydrate Research Center and Department of Biochemistry and Molecular Biology, 220 Riverbend Road, University of Georgia, Athens 30602-4712, USA
158
Sallaud C, Meynard D, van Boxtel J, Gay C, Bès M, Brizard JP, Larmande P, Ortega D, Raynal M, Portefaix M, Ouwerkerk PBF, Rueb S, Delseny M, Guiderdoni E. Highly efficient production and characterization of T-DNA plants for rice (Oryza sativa L.) functional genomics. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2003; 106:1396-408. [PMID: 12677401 DOI: 10.1007/s00122-002-1184-x] [Citation(s) in RCA: 148] [Impact Index Per Article: 7.0] [Received: 05/27/2002] [Accepted: 09/25/2002] [Indexed: 05/20/2023]
Abstract
We investigated the potential of an improved Agrobacterium tumefaciens-mediated transformation procedure of japonica rice (Oryza sativa L.) for generating large numbers of T-DNA plants that are required for functional analysis of this model genome. Using a T-DNA construct bearing the hygromycin resistance (hpt), green fluorescent protein (gfp) and beta-glucuronidase (gusA) genes, each individually driven by a CaMV 35S promoter, we established a highly efficient seed-embryo callus transformation procedure that results both in a high frequency (75-95%) of co-cultured calli yielding resistant cell lines and the generation of multiple (10 to more than 20) resistant cell lines per co-cultured callus. Efficiencies ranged from four to ten independent transformants per co-cultivated callus in various japonica cultivars. We further analysed the T-DNA integration patterns within a population of more than 200 transgenic plants. In the three cultivars studied, 30-40% of the T0 plants were found to have integrated a single T-DNA copy. Analyses of segregation for hygromycin resistance in T1 progenies showed that 30-50% of the lines harbouring multiple T-DNA insertions exhibited hpt gene silencing, whereas only 10% of lines harbouring a single T-DNA insertion were prone to silencing. Most of the lines silenced for hpt also exhibited apparent silencing of the gus and gfp genes borne by the T-DNA. The genomic regions flanking the left border of T-DNA insertion points were recovered in 477 plants and sequenced. Adapter-ligation polymerase chain reaction analysis proved to be an efficient and reliable method to identify these sequences. By homology search, 77 T-DNA insertion sites were localized on BAC/PAC rice Nipponbare sequences. The influence of the organization of T-DNA integration on subsequent identification of T-DNA insertion sites and gene expression detection systems is discussed.
Affiliation(s)
- C Sallaud
- Biotrop Programme, Cirad-Amis, Avenue Agropolis, 34398 Montpellier Cedex 5, France
159
Thibaud-Nissen F, Shealy RT, Khanna A, Vodkin LO. Clustering of microarray data reveals transcript patterns associated with somatic embryogenesis in soybean. PLANT PHYSIOLOGY 2003; 132:118-36. [PMID: 12746518 PMCID: PMC166958 DOI: 10.1104/pp.103.019968] [Citation(s) in RCA: 128] [Impact Index Per Article: 6.1] [Received: 01/03/2003] [Revised: 01/15/2003] [Accepted: 01/28/2003] [Indexed: 05/18/2023]
Abstract
Globular somatic embryos can be induced from immature cotyledons of soybean (Glycine max L. Merr. cv Jack) placed on high levels of the auxin 2,4-dichlorophenoxyacetic acid (2,4-D). Somatic embryos develop from the adaxial side of the cotyledon, whereas the abaxial side evolves into a callus. Using a 9,280-cDNA clone array, we have compared steady-state RNA from the adaxial side from which embryos develop and from the abaxial callus at five time points over the course of the 4 weeks necessary for the development of globular embryos. In a second set of experiments, we have profiled the expression of each clone in the adaxial side during the same period. A total of 495 genes differentially expressed in at least one of these experiments were grouped according to the similarity of their expression profiles using a nonhierarchical clustering algorithm. Our results indicate that the appearance of somatic embryos is preceded by dedifferentiation of the cotyledon during the first 2 weeks on auxin. Changes in mRNA abundance of genes characteristic of oxidative stress and genes indicative of cell division in the adaxial side of the cotyledons suggest that the arrangement of the new cells into organized structures might depend on a genetically controlled balance between cell proliferation and cell death. Our data also suggest that the formation of somatic globular embryos is accompanied by the transcription of storage proteins and the synthesis of gibberellic acid.
160
Abstract
Following an interaction with rhizobial soil bacteria, legume plants are able to form a novel organ, termed the root nodule. This organ houses the rhizobial microsymbionts, which perform the biological nitrogen fixation process resulting in the incorporation of ammonia into plant organic molecules. Recent advances in genomics have opened exciting new perspectives in this field by providing the complete gene inventory of two rhizobial microsymbionts. The complete genome sequences of Mesorhizobium loti, the symbiont of several Lotus species, and Sinorhizobium meliloti, the symbiont of alfalfa, were determined and annotated in detail. For legume macrosymbionts, expressed sequence tag projects and expression analyses using DNA arrays in conjunction with proteomics approaches have identified numerous genes involved in root nodule formation and nitrogen fixation. The isolation of legume genes by tagging or positional cloning recently allowed the identification of genes that control the very early steps of root nodule organogenesis.
Affiliation(s)
- Stefan Weidner
- Department of Genetics, University of Bielefeld, Postfach 100131, D-33501, Bielefeld, Germany.
161
Merrick JM, Osman A, Tsai J, Quackenbush J, LoVerde PT, Lee NH. The Schistosoma mansoni gene index: gene discovery and biology by reconstruction and analysis of expressed gene sequences. J Parasitol 2003; 89:261-9. [PMID: 12760639 DOI: 10.1645/0022-3395(2003)089[0261:tsmgig]2.0.co;2] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
Expressed sequence tag (EST) sequencing and analysis is a primary research tool to identify and characterize the Schistosoma mansoni transcriptome. As part of our gene discovery effort, a total of 5,793 ESTs have been generated from clones selected randomly from complementary DNA (cDNA) libraries constructed from male and female adult worms. Assembly analysis of all the 16,813 public S. mansoni ESTs has identified 1,920 distinct tentative consensus sequences (TCs) and 5,571 nonoverlapping ESTs (singletons). Of these, 376 TCs (20%) and 1,449 singletons (26%) are unique to the SUNY/TIGR sequencing effort. Tentative consensus sequences and singletons were distributed into various categories of biological roles associated with cell structure, metabolism, protein fate, signal transduction, transcription, protein synthesis, transporters, and cell growth. The TCs and singletons represent transcripts that can be used as a resource for functional annotation of genomic sequence data, comparative sequence analysis, and cDNA clone selection for microarray projects. The utility of EST analysis is demonstrated by identifying new protease genes, which may be involved in hemoglobin degradation.
Affiliation(s)
- Joseph M Merrick
- Department of Microbiology, State University of New York, Buffalo, New York 14214, USA.
162
Dixon RA, Sumner LW. Legume natural products: understanding and manipulating complex pathways for human and animal health. PLANT PHYSIOLOGY 2003; 131:878-85. [PMID: 12644640 PMCID: PMC1540287 DOI: 10.1104/pp.102.017319] [Citation(s) in RCA: 157] [Impact Index Per Article: 7.5] [Indexed: 05/15/2023]
Affiliation(s)
- Richard A Dixon
- Plant Biology Division, Samuel Roberts Noble Foundation, Ardmore, Oklahoma 73401, USA.
163
Blanc G, Hokamp K, Wolfe KH. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res 2003; 13:137-44. [PMID: 12566392 PMCID: PMC420368 DOI: 10.1101/gr.751803] [Citation(s) in RCA: 513] [Impact Index Per Article: 24.4] [Received: 08/30/2002] [Accepted: 11/12/2002] [Indexed: 12/30/2022]
Abstract
The Arabidopsis genome contains numerous large duplicated chromosomal segments, but the different approaches used in previous analyses led to different interpretations regarding the number and timing of ancestral large-scale duplication events. Here, using more appropriate methodology and a more recent version of the genome sequence annotation, we investigate the scale and timing of segmental duplications in Arabidopsis. We used protein sequence similarity searches to detect duplicated blocks in the genome, used the level of synonymous substitution between duplicated genes to estimate the relative ages of the blocks containing them, and analyzed the degree of overlap between adjacent duplicated blocks. We conclude that the Arabidopsis lineage underwent at least two distinct episodes of duplication. One was a polyploidy that occurred much more recently than estimated previously, before the Arabidopsis/Brassica rapa split and probably during the early emergence of the crucifer family (24-40 Mya). An older set of duplicated blocks was formed after the monocot/dicot divergence, and the relatively low level of overlap among these blocks indicates that at least some of them are remnants of a larger duplication such as a polyploidy or aneuploidy.
Affiliation(s)
- Guillaume Blanc
- Department of Genetics, Smurfit Institute, University of Dublin, Trinity College, Dublin 2, Ireland
164
Ronning CM, Stegalkina SS, Ascenzi RA, Bougri O, Hart AL, Utterbach TR, Vanaken SE, Riedmuller SB, White JA, Cho J, Pertea GM, Lee Y, Karamycheva S, Sultana R, Tsai J, Quackenbush J, Griffiths HM, Restrepo S, Smart CD, Fry WE, Van Der Hoeven R, Tanksley S, Zhang P, Jin H, Yamamoto ML, Baker BJ, Buell CR. Comparative analyses of potato expressed sequence tag libraries. PLANT PHYSIOLOGY 2003; 131:419-29. [PMID: 12586867 PMCID: PMC166819 DOI: 10.1104/pp.013581] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.3] [Received: 08/26/2002] [Revised: 10/21/2002] [Accepted: 11/14/2002] [Indexed: 05/18/2023]
Abstract
The cultivated potato (Solanum tuberosum) shares similar biology with other members of the Solanaceae, yet has features unique within the family, such as modified stems (stolons) that develop into edible tubers. To better understand potato biology, we have undertaken a survey of the potato transcriptome using expressed sequence tags (ESTs) from diverse tissues. A total of 61,940 ESTs were generated from aerial tissues, below-ground tissues, and tissues challenged with the late-blight pathogen (Phytophthora infestans). Clustering and assembly of these ESTs resulted in a total of 19,892 unique sequences with 8,741 tentative consensus sequences and 11,151 singleton ESTs. We were able to identify a putative function for 43.7% of these sequences. A number of sequences (48) were expressed throughout the libraries sampled, representing constitutively expressed sequences. Other sequences (13,068, 21%) were uniquely expressed and were detected only in a single library. Using hierarchical and k-means clustering of the EST sequences, we were able to correlate changes in gene expression with major physiological events in potato biology. Using pair-wise comparisons of tuber-related tissues, we were able to associate genes with tuber initiation, dormancy, and sprouting. We also were able to identify a number of characterized as well as novel sequences that were unique to the incompatible interaction of late-blight pathogen, thereby providing a foundation for further understanding the mechanism of resistance.
Affiliation(s)
- Catherine M Ronning
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA
165
Brown S, Chang JL, Sadee W, Babbitt PC. A semiautomated approach to gene discovery through expressed sequence tag data mining: discovery of new human transporter genes. AAPS PHARMSCI 2003; 5:E1. [PMID: 12713273 PMCID: PMC2751469 DOI: 10.1208/ps050101] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
Abstract
Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.
Affiliation(s)
- Shoshana Brown
- Department of Biopharmaceutical Sciences, School of Pharmacy, University of California, San Francisco, 513 Parnassus St., 94143 San Francisco, CA
- Jean L. Chang
- Department of Biopharmaceutical Sciences, School of Pharmacy, University of California, San Francisco, 513 Parnassus St., 94143 San Francisco, CA
- Whitehead Institute/MIT Center for Genome Research, 320 Charles St., 02141 Cambridge, MA
- Wolfgang Sadee
- Department of Biopharmaceutical Sciences, School of Pharmacy, University of California, San Francisco, 513 Parnassus St., 94143 San Francisco, CA
- Ohio State University Medical Center, 333 W. 10th Ave., 43210-1239 Columbus, OH
- Patricia C. Babbitt
- Department of Biopharmaceutical Sciences, School of Pharmacy, University of California, San Francisco, 513 Parnassus St., 94143 San Francisco, CA
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, 94143 San Francisco, CA
166
Zhu Y, King BL, Parvizi B, Brunk BP, Stoeckert CJ, Quackenbush J, Richardson J, Bult CJ. Integrating computationally assembled mouse transcript sequences with the Mouse Genome Informatics (MGI) database. Genome Biol 2003; 4:R16. [PMID: 12620126 PMCID: PMC151306 DOI: 10.1186/gb-2003-4-2-r16] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 10/09/2002] [Revised: 11/27/2002] [Accepted: 12/19/2002] [Indexed: 12/27/2022]
Abstract
Databases of experimentally generated and computationally derived transcript sequences are valuable resources for genome analysis and annotation. The utility of such databases is enhanced when the sequences they contain are integrated with such biological information as genomic location, gene function, gene expression and phenotypic variation. We present the analysis and results of a semi-automated process of connecting transcript assemblies with highly curated biological information for mouse genes that is available through the Mouse Genome Informatics (MGI) database.
Affiliation(s)
- Yunxia Zhu
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME 04609, USA.
167
Yuan Q, Ouyang S, Liu J, Suh B, Cheung F, Sultana R, Lee D, Quackenbush J, Buell CR. The TIGR rice genome annotation resource: annotating the rice genome and creating resources for plant biologists. Nucleic Acids Res 2003; 31:229-33. [PMID: 12519988 PMCID: PMC165506 DOI: 10.1093/nar/gkg059] [Citation(s) in RCA: 115] [Impact Index Per Article: 5.5] [Indexed: 11/13/2022]
Abstract
Rice is not only a major food staple for the world's population but it also is a model species for a major group of flowering plants, the monocotyledonous plants. Draft genomic sequences of two subspecies of rice, Oryza sativa ssp. japonica and ssp. indica, are publicly available. To provide the community with a resource to data-mine the rice genome, we have constructed an annotation resource for rice (http://www.tigr.org/tdb/e2k1/osa1/). In this resource, we have annotated the rice genome for gene content, identified motifs/domains within the predicted genes, constructed a rice repeat database, identified related sequences in other plant species, and identified syntenic sequences between rice and maize. All of the data is available through web-based interfaces, FTP downloads, and a Distributed Annotation System.
Affiliation(s)
- Qiaoping Yuan
- The Institute for Genomic Research, 9712 Medical Center Dr., Rockville, MD 20850, USA
168
Mladek C, Guger K, Hauser MT. Identification and characterization of the ARIADNE gene family in Arabidopsis. A group of putative E3 ligases. PLANT PHYSIOLOGY 2003; 131:27-40. [PMID: 12529512 PMCID: PMC166784 DOI: 10.1104/pp.012781] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.7] [Received: 08/09/2002] [Revised: 09/03/2002] [Accepted: 09/26/2002] [Indexed: 05/18/2023]
Abstract
ARIADNE (ARI) proteins were recently identified in fruitfly (Drosophila melanogaster), mouse, and man because of their specific interaction with the ubiquitin-conjugating (E2) enzymes UbcD10, UbcM4, UbcH7, and UbcH8. They are characterized by specific motifs and protein structures that they share with PARKIN, and there is increasing evidence that ARI/PARKIN proteins function as E2-dependent ubiquitin-protein ligases. On the basis of homology and motif searches, 16 AtARI genes were identified in Arabidopsis. Analysis of the position of exons/introns and their chromosomal localization indicates that the AtARI gene family expanded via larger and smaller genome duplications. We present evidence that retroposition of processed mRNA may have also contributed to enlarging this gene family. Phylogenetic analyses divides the AtARI proteins into three subgroups. Two groups are absent in yeast, invertebrates, and vertebrates and may therefore represent new plant-specific subfamilies. Examination of the predicted protein sequences revealed that the ARI proteins share an additional leucine-rich region at the N terminus that is highly conserved in all phyla analyzed. Furthermore, conserved consensus signals for casein kinase II-dependent phosphorylation and for nuclear localization were identified. The in silico-based analyses were complemented with experimental data to quantify expression levels. Using real-time polymerase chain reaction, we show that the ARI genes are differentially transcribed. AtARI1 is highly expressed in all organs, whereas no transcripts could be detected for AtARI11, AtARI13, and AtARI14. AtARI12 and AtARI16 are expressed in an organ-specific manner in the roots and siliques, respectively.
Affiliation(s)
- Christina Mladek
- Center of Applied Genetics, University of Agricultural Sciences Vienna, Austria
169
Welinder KG, Justesen AF, Kjaersgård IVH, Jensen RB, Rasmussen SK, Jespersen HM, Duroux L. Structural diversity and transcription of class III peroxidases from Arabidopsis thaliana. EUROPEAN JOURNAL OF BIOCHEMISTRY 2002; 269:6063-81. [PMID: 12473102 DOI: 10.1046/j.1432-1033.2002.03311.x] [Citation(s) in RCA: 175] [Impact Index Per Article: 8.0] [Indexed: 11/20/2022]
Abstract
Understanding peroxidase function in plants is complicated by the lack of substrate specificity, the high number of genes, their diversity in structure and our limited knowledge of peroxidase gene transcription and translation. In the present study we sequenced expressed sequence tags (ESTs) encoding novel heme-containing class III peroxidases from Arabidopsis thaliana and annotated 73 full-length genes identified in the genome. In total, transcripts of 58 of these genes have now been observed. The expression of individual peroxidase genes was assessed in organ-specific EST libraries and compared to the expression of 33 peroxidase genes which we analyzed in whole plants 3, 6, 15, 35 and 59 days after sowing. Expression was assessed in root, rosette leaf, stem, cauline leaf, flower bud and cell culture tissues using the gene-specific and highly sensitive reverse transcriptase-polymerase chain reaction (RT-PCR). We predicted that 71 genes could yield stable proteins folded similarly to horseradish peroxidase (HRP). The putative mature peroxidases derived from these genes showed 28-94% amino acid sequence identity and were all targeted to the endoplasmic reticulum by N-terminal signal peptides. In 20 peroxidases these signal peptides were followed by various N-terminal extensions of unknown function which are not present in HRP. Ten peroxidases showed a C-terminal extension indicating vacuolar targeting. We found that the majority of peroxidase genes were expressed in root. In total, class III peroxidases accounted for an impressive 2.2% of root ESTs. Rather few peroxidases showed organ specificity. Most importantly, genes expressed constitutively in all organs and genes with a preference for root represented structurally diverse peroxidases (< 70% sequence identity). Furthermore, genes appearing in tandem showed distinct expression profiles. 
The alignment of 73 Arabidopsis peroxidase sequences provides easy access to the identification of orthologous peroxidases in other plant species and will provide a common platform for combining knowledge of peroxidase structure and function relationships obtained in various species.
Affiliation(s)
- Karen G Welinder
- Department of Protein Chemistry, University of Copenhagen, Denmark.
170
Suzuki H, Achnine L, Xu R, Matsuda SPT, Dixon RA. A genomics approach to the early stages of triterpene saponin biosynthesis in Medicago truncatula. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2002; 32:1033-1048. [PMID: 12492844 DOI: 10.1046/j.1365-313x.2002.01497.x] [Citation(s) in RCA: 169] [Impact Index Per Article: 7.7] [Indexed: 05/23/2023]
Abstract
The saponins of the model legume Medicago truncatula are glycosides of at least five different triterpene aglycones: soyasapogenol B, soyasapogenol E, medicagenic acid, hederagenin and bayogenin. These aglycones are most likely derived from beta-amyrin, a product of the cyclization of 2,3-oxidosqualene. Mining M. truncatula EST data sets led to the identification of sequences putatively encoding three early enzymes of triterpene aglycone formation: squalene synthase (SS), squalene epoxidase (SE), and beta-amyrin synthase (beta-AS). SS was functionally characterized by expression in Escherichia coli, two forms of SE by complementation of the yeast erg1 mutant, and beta-AS by expression in yeast. Beta-amyrin was the sole product of the cyclization of squalene epoxide by the recombinant M. truncatula beta-AS, as judged by GC-MS and NMR. Transcripts encoding beta-AS, SS and one form of SE were strongly and co-ordinately induced, associated with accumulation of triterpenes, upon exposure of M. truncatula cell suspension cultures to methyl jasmonate. Sterol composition remained unaffected by jasmonate treatment. Molecular verification of induction of the triterpene pathway in a cell culture system provides a new tool for saponin pathway gene discovery by DNA array-based approaches.
Affiliation(s)
- Hideyuki Suzuki
- Plant Biology Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
171
Itoh A, Schilmiller AL, McCaig BC, Howe GA. Identification of a jasmonate-regulated allene oxide synthase that metabolizes 9-hydroperoxides of linoleic and linolenic acids. J Biol Chem 2002; 277:46051-8. [PMID: 12351632 DOI: 10.1074/jbc.m207234200] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.0] [Indexed: 11/06/2022]
Abstract
Allene oxide synthase (AOS) is a cytochrome P-450 (CYP74A) that catalyzes the first step in the conversion of 13-hydroperoxy linolenic acid to jasmonic acid and related signaling molecules in plants. Here, we report the molecular cloning and characterization of a novel AOS-encoding cDNA (LeAOS3) from Lycopersicon esculentum whose predicted amino acid sequence classifies it as a member of the CYP74C subfamily of enzymes that was hitherto not known to include AOSs. Recombinant LeAOS3 expressed in Escherichia coli showed spectral characteristics of a P-450. The enzyme transformed 9- and 13-hydroperoxides of linoleic and linolenic acid to alpha-ketol, gamma-ketol, and cyclopentenone compounds that arise from spontaneous hydrolysis of unstable allene oxides, indicating that the enzyme is an AOS. Kinetic assays demonstrated that LeAOS3 was approximately 10-fold more active against 9-hydroperoxides than the corresponding 13-isomers. LeAOS3 transcripts accumulated in roots, but were undetectable in aerial parts of mature plants. In contrast to wild-type plants, LeAOS3 expression was undetectable in roots of a tomato mutant that is defective in jasmonic acid signaling. These findings suggest that LeAOS3 plays a role in the metabolism of 9-lipoxygenase-derived hydroperoxides in roots, and that this branch of oxylipin biosynthesis is regulated by the jasmonate signaling cascade.
Affiliation(s)
- Aya Itoh
- Department of Energy Plant Research Laboratory and Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824-1312, USA
172
Guo D, Chen F, Dixon RA. Monolignol biosynthesis in microsomal preparations from lignifying stems of alfalfa (Medicago sativa L.). PHYTOCHEMISTRY 2002; 61:657-667. [PMID: 12423886 DOI: 10.1016/s0031-9422(02)00375-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 05/24/2023]
Abstract
Microsomal preparations from lignifying stems of alfalfa (Medicago sativa L.) contained coniferaldehyde 5-hydroxylase activity and immunodetectable caffeic acid 3-O-methyltransferase (COMT), and catalyzed the S-adenosyl L-methionine (SAM) dependent methylation of caffeic acid, caffeyl aldehyde and caffeyl alcohol. When supplied with NADPH and SAM, the microsomes converted caffeyl aldehyde to coniferaldehyde, 5-hydroxyconiferaldehyde, and traces of sinapaldehyde. Coniferaldehyde was a better precursor of sinapaldehyde than was 5-hydroxyconiferaldehyde. The alfalfa microsomes could not metabolize 4-coumaric acid, 4-coumaraldehyde, 4-coumaroyl CoA, or ferulic acid. No metabolism of monolignol precursors was observed in microsomal preparations from transgenic alfalfa down-regulated in COMT expression. In most microsomal preparations, the level of the metabolic conversions was independent of added recombinant COMT. Taken together, the data provide only limited support for the concept of metabolic channeling in the biosynthesis of S monolignols via coniferaldehyde.
Affiliation(s)
- Dianjing Guo
- Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, Oklahoma 73401, USA
173
Ayoubi P, Jin X, Leite S, Liu X, Martajaja J, Abduraham A, Wan Q, Yan W, Misawa E, Prade RA. PipeOnline 2.0: automated EST processing and functional data sorting. Nucleic Acids Res 2002; 30:4761-9. [PMID: 12409467 PMCID: PMC135791 DOI: 10.1093/nar/gkf585] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.
Affiliation(s)
- Patricia Ayoubi
- Department of Microbiology and Molecular Genetics and School of Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, OK 74078, USA
174
Katsanis N, Worley KC, Gonzalez G, Ansley SJ, Lupski JR. A computational/functional genomics approach for the enrichment of the retinal transcriptome and the identification of positional candidate retinopathy genes. Proc Natl Acad Sci U S A 2002; 99:14326-31. [PMID: 12391299 PMCID: PMC137883 DOI: 10.1073/pnas.222409099] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
Grouping genes by virtue of their sequence similarity, functional association, or spatiotemporal distribution is an important first step in investigating function. Given the recent identification of >30,000 human genes either by analyses of genomic sequence or by derivation/assembly of ESTs, automated means of discerning gene function and association with disease are critical for the efficient processing of this large volume of data. We have designed a series of computational tools to manipulate the EST sequence database (dbEST) to predict EST clusters likely representing genes expressed exclusively or preferentially in a specific tissue. We implemented this tool by extracting 40,000 human retinal ESTs and performing in silico subtraction against 1.4 million human ESTs. This process yielded 925 ESTs likely to be specifically or preferentially expressed in the retina. We mapped all retinal-specific/predominant sequences in the human genome and produced a web-based searchable map of the retina transcriptome, onto which we overlaid the positions of all mapped but uncloned retinopathy genes. This resource has provided positional candidates for 42 of 51 uncloned retinopathies and may expedite substantially the identification of disease-associated genes. More importantly, the ability to systematically group ESTs according to their predicted expression profile is likely to be an important resource for studying gene function in a wide range of tissues and physiological systems and to identify positional candidate genes for human disorders whose phenotypic manifestations are restricted to specific tissues/organs/cell types.
Affiliation(s)
- Nicholas Katsanis
- Department of Molecular and Human Genetics, Texas Children's Hospital, Baylor College of Medicine, Houston, TX 77030, USA
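The in silico subtraction described in this abstract can be illustrated with a minimal sketch. It assumes each EST has already been assigned to a cluster, and it flags clusters whose ESTs come exclusively or predominantly from the tissue of interest. The function name `tissue_enriched` and the thresholds `min_fraction` and `min_count` are hypothetical; the abstract does not state the criteria the authors used, and the sketch compares raw counts without correcting for library size.

```python
from collections import Counter

def tissue_enriched(tissue_ests, other_ests, min_fraction=0.8, min_count=2):
    """Flag clusters seen mostly in the tissue of interest.

    tissue_ests / other_ests: iterables of cluster IDs, one entry per EST.
    Returns {cluster_id: fraction of that cluster's ESTs from the tissue}.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    tissue = Counter(tissue_ests)
    other = Counter(other_ests)
    enriched = {}
    for cluster, n_tissue in tissue.items():
        if n_tissue < min_count:
            continue  # too few ESTs to call the cluster
        fraction = n_tissue / (n_tissue + other[cluster])
        if fraction >= min_fraction:
            enriched[cluster] = fraction
    return enriched
```

A fraction of 1.0 corresponds to "exclusive" expression in this toy model; lowering `min_fraction` admits "preferential" clusters as well.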
175
Malek RL, Irby RB, Guo QM, Lee K, Wong S, He M, Tsai J, Frank B, Liu ET, Quackenbush J, Jove R, Yeatman TJ, Lee NH. Identification of Src transformation fingerprint in human colon cancer. Oncogene 2002; 21:7256-65. [PMID: 12370817 DOI: 10.1038/sj.onc.1205900] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.9] [Received: 04/05/2002] [Revised: 07/12/2002] [Accepted: 07/31/2002] [Indexed: 11/09/2022]
Abstract
We used a classical rodent model of transformation to understand the transcriptional processes, and hence the molecular and cellular events a given cell undergoes when progressing from a normal to a transformed phenotype. Src activation is evident in 80% of human colon cancer, yet the myriad of cellular processes affected at the level of gene expression has yet to be fully documented. We identified a Src 'transformation fingerprint' within the gene expression profiles of Src-transformed rat 3Y1 fibroblasts demonstrating a progression in transformation characteristics. To evaluate the role of this gene set in human cancer development and progression, we extracted the orthologous genes present on the Affymetrix Hu95A GeneChip (12k named genes) and compared expression profiles between the Src-induced rodent cell line model of transformation and staged colon tumors where Src is known to be activated. A similar gene expression pattern between the cell line model and staged colon tumors for components of the cell cycle, cytoskeletal associated proteins, transcription factors and lysosomal proteins suggests the need for co-regulation of several cellular processes in the progression of cancer. Genes not previously implicated in tumorigenesis were detected, as well as a set of 14 novel, highly conserved genes with heretofore unknown function. These studies define a set of transformation associated genes whose up-regulation has implications for understanding Src mediated transformation and strengthen the role of Src in the development and progression of human colon cancer. Supportive Supplemental Data can be viewed at http://pga.tigr.org/PGApubs.shtml.
Affiliation(s)
- Renae L Malek
- Department of Functional Genomics, The Institute for Genomic Research, 9712 Medical Center Dr, Rockville, MD 20850, USA
176
Thompson HGR, Harris JW, Wold BJ, Quake SR, Brody JP. Identification and confirmation of a module of coexpressed genes. Genome Res 2002; 12:1517-22. [PMID: 12368243 PMCID: PMC187523 DOI: 10.1101/gr.418402] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.7] [Received: 05/10/2002] [Accepted: 07/31/2002] [Indexed: 11/25/2022]
Abstract
We synthesize a large gene expression data set using dbEST and UniGene. We use guilt-by-association (GBA) to analyze this data set and identify coexpressed genes. One module, or group of genes, was found to be coexpressed mainly in tissue extracted from breast and ovarian cancers, but also found in tissue from lung cancers, brain cancers, and bone marrow. This module contains at least six members that are believed to be involved in either transcriptional regulation (PDEF, H2AFO, NUCKS) or the ubiquitin proteasome pathway (PSMD7, SQSTM1, FLJ10111). We confirm these observations of coexpression by real-time RT-PCR analysis of mRNA extracted from four model breast epithelial cell lines.
Affiliation(s)
- H Garrett R Thompson
- Department of Biomedical Engineering, University of California Irvine, Irvine, California 92697, USA
177
Fedorova M, van de Mortel J, Matsumoto PA, Cho J, Town CD, VandenBosch KA, Gantt JS, Vance CP. Genome-wide identification of nodule-specific transcripts in the model legume Medicago truncatula. PLANT PHYSIOLOGY 2002; 130:519-37. [PMID: 12376622 PMCID: PMC166584 DOI: 10.1104/pp.006833] [Citation(s) in RCA: 79] [Impact Index Per Article: 3.6] [Indexed: 05/15/2023]
Abstract
The Medicago truncatula expressed sequence tag (EST) database (Gene Index) contains over 140,000 sequences from 30 cDNA libraries. This resource offers the possibility of identifying previously uncharacterized genes and assessing the frequency and tissue specificity of their expression in silico. Because M. truncatula forms symbiotic root nodules, unlike Arabidopsis, this is a particularly important approach in investigating genes specific to nodule development and function in legumes. Our analyses have revealed 340 putative gene products, or tentative consensus sequences (TCs), expressed solely in root nodules. These TCs were represented by two to 379 ESTs. Of these TCs, 3% appear to encode novel proteins, 57% encode proteins with a weak similarity to the GenBank accessions, and 40% encode proteins with strong similarity to the known proteins. Nodule-specific TCs were grouped into nine categories based on the predicted function of their protein products. Besides previously characterized nodulins, other examples of highly abundant nodule-specific transcripts include plantacyanin, agglutinin, embryo-specific protein, and purine permease. Six nodule-specific TCs encode calmodulin-like proteins that possess a unique cleavable transit sequence potentially targeting the protein into the peribacteroid space. Surprisingly, 114 nodule-specific TCs encode small Cys cluster proteins with a cleavable transit peptide. To determine the validity of the in silico analysis, expression of 91 putative nodule-specific TCs was analyzed by macroarray and RNA-blot hybridizations. Nodule-enhanced expression was confirmed experimentally for the TCs composed of five or more ESTs, whereas the results for those TCs containing fewer ESTs were variable.
Affiliation(s)
- Maria Fedorova
- Department of Agronomy and Plant Genetics, 1991 Upper Bedford Circle, University of Minnesota, St. Paul, MN 55108, USA
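As a rough illustration of the in silico selection described in this abstract, the sketch below keeps tentative consensus sequences (TCs) whose ESTs all originate from nodule libraries. The data layout, the function name `nodule_specific_tcs`, and the library names are hypothetical. The default `min_ests=2` mirrors the abstract's statement that the TCs were represented by two or more ESTs, and `min_ests=5` corresponds to the group for which nodule-enhanced expression was confirmed experimentally.

```python
def nodule_specific_tcs(tc_to_libraries, nodule_libraries, min_ests=2):
    """Return {tc_id: EST count} for TCs found only in nodule libraries.

    tc_to_libraries: dict mapping a TC id to a list of source-library
    names, one entry per EST (hypothetical layout for illustration).
    nodule_libraries: names of the libraries made from root nodules.
    """
    nodule_libraries = set(nodule_libraries)
    selected = {}
    for tc, libs in tc_to_libraries.items():
        if len(libs) < min_ests:
            continue  # too few ESTs supporting this TC
        if all(lib in nodule_libraries for lib in libs):
            selected[tc] = len(libs)
    return selected
```

Raising `min_ests` trades sensitivity for reliability, which is consistent with the abstract's observation that results for TCs with fewer than five ESTs were variable.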
178
Koo AJK, Ohlrogge JB. The predicted candidates of Arabidopsis plastid inner envelope membrane proteins and their expression profiles. PLANT PHYSIOLOGY 2002; 130:823-36. [PMID: 12376647 PMCID: PMC166609 DOI: 10.1104/pp.008052] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Received: 05/07/2002] [Revised: 05/30/2002] [Accepted: 06/13/2002] [Indexed: 05/17/2023]
Abstract
Plastid envelope proteins from the Arabidopsis nuclear genome were predicted using computational methods. Selection criteria were: first, to find proteins with NH(2)-terminal plastid-targeting peptides from all annotated open reading frames from Arabidopsis; second, to search for proteins with membrane-spanning domains among the predicted plastidial-targeted proteins; and third, to subtract known thylakoid membrane proteins. Five hundred forty-one proteins were selected as potential candidates of the Arabidopsis plastid inner envelope membrane proteins (AtPEM candidates). Only 34% (183) of the AtPEM candidates could be assigned to putative functions based on sequence similarity to proteins of known function (compared with the 69% function assignment of the total predicted proteins in the genome). Of the 183 candidates with assigned functions, 40% were classified in the category of "transport facilitation," indicating that this collection is highly enriched in membrane transporters. Information on the predicted proteins, tissue expression data from expressed sequence tags and microarrays, and publicly available T-DNA insertion lines were collected. The data set complements proteomic-based efforts in the increased detection of integral membrane proteins, low-abundance proteins, or those not expressed in tissues selected for proteomic analysis. Digital northern analysis of expressed sequence tags suggested that the transcript levels of most AtPEM candidates were relatively constant among different tissues in contrast to stroma and the thylakoid proteins. However, both digital northern and microarray analyses identified a number of AtPEM candidates with tissue-specific expression patterns.
Affiliation(s)
- Abraham J K Koo
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824-1312, USA
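The three selection criteria listed in this abstract amount to a simple filter chain, sketched below. The sketch assumes the per-protein predictions (plastid-targeting peptide present, number of membrane-spanning domains) have been computed beforehand by external tools, which the abstract does not name; the field names and the function `select_envelope_candidates` are hypothetical.

```python
def select_envelope_candidates(proteins, known_thylakoid_ids):
    """Apply the three criteria in order and return candidate IDs.

    proteins: iterable of dicts with the hypothetical keys
      'id', 'has_transit_peptide' (bool), 'n_tm_domains' (int).
    known_thylakoid_ids: set of IDs of known thylakoid membrane proteins.
    """
    # 1. Keep proteins with a predicted plastid-targeting peptide.
    targeted = [p for p in proteins if p["has_transit_peptide"]]
    # 2. Keep those with at least one predicted membrane-spanning domain.
    membrane = [p for p in targeted if p["n_tm_domains"] >= 1]
    # 3. Subtract known thylakoid membrane proteins.
    return [p["id"] for p in membrane if p["id"] not in known_thylakoid_ids]
```

The reported proportion can be checked directly from the abstract's numbers: 183 of 541 candidates is about 33.8%, which rounds to the stated 34%.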
179
Abstract
The computational detection of novel selenoproteins in genomic sequences is usually achieved through identification of SECIS, a conserved secondary structure element found in the 3' UTR of animal selenoprotein mRNAs. Previous studies have used "descriptors" specifying the number of base pairs and the conserved nucleotides in SECIS to identify this element. A major drawback of the "descriptor" approach is that the number of detections in current genomic or transcript databases largely exceeds the number of true selenoproteins. In this study, we use instead the ERPIN program to detect SECIS elements. ERPIN is based on a lod-score profile algorithm that uses a training-set of aligned RNA sequences as input. From an initial alignment of 44 animal SECIS sequences, we performed a series of iterative searches in which the training set was progressively enriched up to 117 confirmed SECIS elements, from a large collection of metazoan species. About 200 high-scoring candidates were also detected. We show that ERPIN scores for these candidates can be converted into expect values, thus enabling their statistical evaluation. The most interesting SECIS candidates are presented.
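To make the idea of a lod-score profile concrete, here is a much-simplified sketch. It is not ERPIN's implementation: it assumes ungapped, equal-length aligned sequences, treats every column independently (ignoring base pairing in helices), and uses a uniform background with a pseudocount. The function names and all parameter values are illustrative assumptions.

```python
import math
from collections import Counter

def build_lod_profile(aligned_seqs, alphabet="ACGU", pseudocount=1.0):
    """Build one {base: log-odds score} dict per alignment column.

    Assumes ungapped sequences of equal length and a uniform background.
    """
    background = 1.0 / len(alphabet)
    total = len(aligned_seqs) + pseudocount * len(alphabet)
    profile = []
    for i in range(len(aligned_seqs[0])):
        counts = Counter(seq[i] for seq in aligned_seqs)
        column = {
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in alphabet
        }
        profile.append(column)
    return profile

def lod_score(profile, candidate):
    """Sum the per-column log-odds scores for a candidate sequence."""
    return sum(column[base] for column, base in zip(profile, candidate))
```

Enlarging the training set, as in the iterative searches the abstract describes, changes the column frequencies and therefore the scores assigned to new candidates.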
180
Dixon RA, Achnine L, Kota P, Liu CJ, Reddy MSS, Wang L. The phenylpropanoid pathway and plant defence-a genomics perspective. MOLECULAR PLANT PATHOLOGY 2002; 3:371-90. [PMID: 20569344 DOI: 10.1046/j.1364-3703.2002.00131.x] [Citation(s) in RCA: 672] [Impact Index Per Article: 30.5] [Indexed: 05/18/2023]
Abstract
The functions of phenylpropanoid compounds in plant defence range from preformed or inducible physical and chemical barriers against infection to signal molecules involved in local and systemic signalling for defence gene induction. Defensive functions are not restricted to a particular class of phenylpropanoid compound, but are found in the simple hydroxycinnamic acids and monolignols through to the more complex flavonoids, isoflavonoids, and stilbenes. The enzymatic steps involved in the biosynthesis of the major classes of phenylpropanoid compounds are now well established, and many of the corresponding genes have been cloned. Less is understood about the regulatory genes that orchestrate rapid, coordinated induction of phenylpropanoid defences in response to microbial attack. Many of the biosynthetic pathway enzymes are encoded by gene families, but the specific functions of individual family members remain to be determined. The availability of the complete genome sequence of Arabidopsis thaliana, and the extensive expressed sequence tag (EST) resources in other species, such as rice, soybean, barrel medic, and tomato, allow, for the first time, a full appreciation of the comparative genetic complexity of the phenylpropanoid pathway across species. In addition, gene expression array analysis and metabolic profiling approaches make possible comparative parallel analyses of global changes at the genome and metabolome levels, facilitating an understanding of the relationships between changes in specific transcripts and subsequent alterations in metabolism in response to infection.
Affiliation(s)
- Richard A Dixon
- Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
181
Affiliation(s)
- Alain Lescure
- Unité Propre de Recherche 9002 du CNRS, Institut de Biologie Moléculaire et Cellulaire, 67084 Strasbourg, France
182
Van der Hoeven R, Ronning C, Giovannoni J, Martin G, Tanksley S. Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. THE PLANT CELL 2002; 14:1441-56. [PMID: 12119366 PMCID: PMC150698 DOI: 10.1105/tpc.010478] [Citation(s) in RCA: 137] [Impact Index Per Article: 6.2] [Received: 11/02/2001] [Accepted: 04/18/2002] [Indexed: 05/18/2023]
Abstract
Analysis of a collection of 120,892 single-pass ESTs, derived from 26 different tomato cDNA libraries and reduced to a set of 27,274 unique consensus sequences (unigenes), revealed that 70% of the unigenes have identifiable homologs in the Arabidopsis genome. Genes corresponding to metabolism have remained most conserved between these two genomes, whereas genes encoding transcription factors are among the fastest evolving. The majority of the 10 largest conserved multigene families share similar copy numbers in tomato and Arabidopsis, suggesting that the multiplicity of these families may have occurred before the divergence of these two species. An exception to this multigene conservation was observed for the E8-like protein family, which is associated with fruit ripening and has higher copy number in tomato than in Arabidopsis. Finally, six BAC clones from different parts of the tomato genome were isolated, genetically mapped, sequenced, and annotated. The combined analysis of the EST database and these six sequenced BACs leads to the prediction that the tomato genome encodes approximately 35,000 genes, which are sequestered largely in euchromatic regions corresponding to less than one-quarter of the total DNA in the tomato nucleus.
183
Nielsen HL, Rønnov-Jessen L, Villadsen R, Petersen OW. Identification of EPSTI1, a novel gene induced by epithelial-stromal interaction in human breast cancer. Genomics 2002; 79:703-10. [PMID: 11991720 DOI: 10.1006/geno.2002.6755] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.1] [Indexed: 12/29/2022]
Abstract
During growth, invasion, and metastasis, tumor cells interact extensively with the surrounding stroma. To identify genes that are upregulated during this process, we compared mRNA pooled from tumor cells and fibroblasts cultured separately to mRNA from cells in coculture. Using differential display (DD), a transcript representing a novel gene, designated epithelial-stromal interaction 1 (breast) (EPSTI1), was identified. EPSTI1 showed no homology to any known gene, but matched a cluster of expressed-sequence tags (ESTs). The full-length cDNA of 1508 bp was generated by 5'-RACE, included an open reading frame (ORF) encoding a putative 307-amino-acid protein, and mapped to chromosome 13q13.3. EPSTI1 was highly upregulated in invasive breast carcinomas compared with normal breast. In a tissue mRNA panel the most prominent expression of EPSTI1 was found in placenta. Thus, EPSTI1 is a novel human gene expressed in tissues characterized by extensive epithelial-stromal interaction, and expression of this gene may be a crucial event in invasion and metastasis of cancer.
Affiliation(s)
- Helga Lind Nielsen
- Structural Cell Biology Unit, Department of Medical Anatomy A, the Panum Institute, University of Copenhagen, Blegdamsvej 3, DK-2200 Copenhagen N, Denmark
184
Fielden MR, Matthews JB, Fertuck KC, Halgren RG, Zacharewski TR. In silico approaches to mechanistic and predictive toxicology: an introduction to bioinformatics for toxicologists. Crit Rev Toxicol 2002; 32:67-112. [PMID: 11951993 DOI: 10.1080/20024091064183] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 10/25/2022]
Abstract
Bioinformatics, or in silico biology, is a rapidly growing field that encompasses the theory and application of computational approaches to model, predict, and explain biological function at the molecular level. This information rich field requires new skills and new understanding of genome-scale studies in order to take advantage of the rapidly increasing amount of sequence, expression, and structure information in public and private databases. Toxicologists are poised to take advantage of the large public databases in an effort to decipher the molecular basis of toxicity. With the advent of high-throughput sequencing and computational methodologies, expressed sequences can be rapidly detected and quantitated in target tissues by database searching. Novel genes can also be isolated in silico, while their function can be predicted and characterized by virtue of sequence homology to other known proteins. Genomic DNA sequence data can be exploited to predict target genes and their modes of regulation, as well as identify susceptible genotypes based on single nucleotide polymorphism data. In addition, highly parallel gene expression profiling technologies will allow toxicologists to mine large databases of gene expression data to discover molecular biomarkers and other diagnostic and prognostic genes or expression profiles. This review serves to introduce to toxicologists the concepts of in silico biology most relevant to mechanistic and predictive toxicology, while highlighting the applicability of in silico methods using select examples.
Affiliation(s)
- Mark R Fielden
- Department of Biochemistry and Molecular Biology, National Food Safety and Toxicology Center, Michigan State University, East Lansing 48824, USA
185
Kantety RV, La Rota M, Matthews DE, Sorrells ME. Data mining for simple sequence repeats in expressed sequence tags from barley, maize, rice, sorghum and wheat. PLANT MOLECULAR BIOLOGY 2002; 48:501-10. [PMID: 11999831 DOI: 10.1023/a:1014875206165] [Citation(s) in RCA: 316] [Impact Index Per Article: 14.4] [Indexed: 05/18/2023]
Abstract
Plant genomics projects involving model species and many agriculturally important crops are resulting in a rapidly increasing database of genomic and expressed DNA sequences. The publicly available collection of expressed sequence tags (ESTs) from several grass species can be used in the analysis of both structural and functional relationships in these genomes. We analyzed over 260000 EST sequences from five different cereals for their potential use in developing simple sequence repeat (SSR) markers. The frequency of SSR-containing ESTs (SSR-ESTs) in this collection varied from 1.5% for maize to 4.7% for rice. In addition, we identified several ESTs that are related to the SSR-ESTs by BLAST analysis. The SSR-ESTs and the related sequences were clustered within each species in order to reduce the redundancy and to produce a longer consensus sequence. The consensus and singleton sequences from each species were pooled and clustered to identify cross-species matches. Overall a reduction in the redundancy by 85% was observed when the resulting consensus and singleton sequences (3569) were compared to the total number of SSR-EST and related sequences analyzed (24 606). This information can be useful for the development of SSR markers that can amplify across the grass genera for comparative mapping and genetics. Functional analysis may reveal their role in plant metabolism and gene evolution.
Affiliation(s)
- Ramesh V Kantety
- Department of Plant Breeding, Cornell University, Ithaca, NY 14853, USA
186
Kallberg Y, Oppermann U, Jörnvall H, Persson B. Short-chain dehydrogenase/reductase (SDR) relationships: a large family with eight clusters common to human, animal, and plant genomes. Protein Sci 2002; 11:636-41. [PMID: 11847285 PMCID: PMC2373483 DOI: 10.1110/ps.26902] [Citation(s) in RCA: 177] [Impact Index Per Article: 8.0] [Indexed: 10/17/2022]
Abstract
The progress in genome characterizations has opened new routes for studying enzyme families. The availability of the human genome enabled us to delineate the large family of short-chain dehydrogenase/reductase (SDR) members. Although the human genome releases are not yet final, we have already found 63 members. We have also compared these SDR forms with those of three model organisms: Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana. We detect eight SDR ortholog clusters in a cross-genome comparison. Four of these clusters represent extended SDR forms, a subgroup found in all life forms. The other four are classical SDRs with activities involved in cellular differentiation and signalling. We also find 18 SDR genes that are present only in the human genome of the four genomes studied, reflecting enzyme forms specific to mammals. Close to half of these gene products represent steroid dehydrogenases, emphasizing the regulatory importance of these enzymes.
Affiliation(s)
- Yvonne Kallberg
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, S-171 77 Stockholm, Sweden
187
Telles GP, Braga MD, Dias Z, Tzy-Li L, Quitzau JA, Silva FRD, Meidanis J. Bioinformatics of the sugarcane EST project. Genet Mol Biol 2001. [DOI: 10.1590/s1415-47572001000100003] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022] Open
Abstract
The Sugarcane EST project (SUCEST) produced 291,904 expressed sequence tags (ESTs) in a consortium that involved 74 sequencing and data mining laboratories. We created a web site for this project that served as a ‘meeting point’ for receiving, processing, analyzing, and providing services to help explore the sequence data. In this paper we describe the information pathway that we implemented to support this project and a brief explanation of the clustering procedure, which resulted in 43,141 clusters.
188
Abstract
The original clustering procedure adopted in the Sugarcane Expressed Sequence Tag project (SUCEST) had many problems, for instance too many clusters, the presence of ribosomal sequences, etc. We therefore redesigned the clustering procedure entirely, including a much more careful initial trimming of the reads. In this paper the new trimming and clustering strategies are described in detail and we give the new official figures for the project, 237,954 expressed sequence tags and 43,141 clusters.
189
Gautheret D, Lambert A. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol 2001; 313:1003-11. [PMID: 11700055 DOI: 10.1006/jmbi.2001.5102] [Citation(s) in RCA: 209] [Impact Index Per Article: 9.1] [Indexed: 11/22/2022]
Abstract
We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented into a new software, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs.
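The idea of a "helical profile" with 16 lod-scores per position can be sketched in Python as follows. This is not ERPIN's implementation: the uniform background, the pseudocount of 1, the ungapped toy alignment, and the function names `helical_profile` and `score_helix` are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGU"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 possible base pairs


def helical_profile(alignment, paired_columns, pseudocount=1.0):
    """Build one table of 16 log-odds scores per paired column (i, j).

    Assumes an ungapped alignment and a uniform background of 1/16 per pair.
    """
    background = 1.0 / len(PAIRS)
    total = len(alignment) + pseudocount * len(PAIRS)
    profile = []
    for i, j in paired_columns:
        counts = Counter(seq[i] + seq[j] for seq in alignment)
        profile.append(
            {
                pair: math.log2((counts[pair] + pseudocount) / total / background)
                for pair in PAIRS
            }
        )
    return profile


def score_helix(profile, candidate, paired_columns):
    """Sum the log-odds scores of the candidate's base pairs over the helix."""
    return sum(
        column[candidate[i] + candidate[j]]
        for column, (i, j) in zip(profile, paired_columns)
    )


# Toy hairpin: columns 0-2 pair with columns 7-5; columns 3-4 form the loop.
toy_alignment = ["GGCAAGCC", "GACAAGUC", "GGCUUGCC"]
toy_pairs = [(0, 7), (1, 6), (2, 5)]
toy_profile = helical_profile(toy_alignment, toy_pairs)

good = score_helix(toy_profile, "GGCAAGCC", toy_pairs)  # pairs seen in the alignment
bad = score_helix(toy_profile, "GGGAAGGG", toy_pairs)   # pairs never observed
```

A candidate whose base pairs match those observed in the alignment scores above zero, while one built from unobserved pairs scores below zero. Single-stranded regions, which the abstract says are handled by a classical profile and dynamic programming, are left out of this sketch.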
Affiliation(s)
- D Gautheret
- Centre d'Immunologie de Marseille Luminy, CNRS UMR 6102/INSERM U 136, Luminy Case 906, 13288 Marseille Cedex 09, France.
190
Camargo AA, Samaia HP, Dias-Neto E, Simão DF, Migotto IA, Briones MR, Costa FF, Nagai MA, Verjovski-Almeida S, Zago MA, Andrade LE, Carrer H, El-Dorry HF, Espreafico EM, Habr-Gama A, Giannella-Neto D, Goldman GH, Gruber A, Hackel C, Kimura ET, Maciel RM, Marie SK, Martins EA, Nobrega MP, Paco-Larson ML, Pardini MI, Pereira GG, Pesquero JB, Rodrigues V, Rogatto SR, da Silva ID, Sogayar MC, Sonati MF, Tajara EH, Valentini SR, Alberto FL, Amaral ME, Aneas I, Arnaldi LA, de Assis AM, Bengtson MH, Bergamo NA, Bombonato V, de Camargo ME, Canevari RA, Carraro DM, Cerutti JM, Correa ML, Correa RF, Costa MC, Curcio C, Hokama PO, Ferreira AJ, Furuzawa GK, Gushiken T, Ho PL, Kimura E, Krieger JE, Leite LC, Majumder P, Marins M, Marques ER, Melo AS, Melo MB, Mestriner CA, Miracca EC, Miranda DC, Nascimento AL, Nobrega FG, Ojopi EP, Pandolfi JR, Pessoa LG, Prevedel AC, Rahal P, Rainho CA, Reis EM, Ribeiro ML, da Ros N, de Sa RG, Sales MM, Sant'anna SC, dos Santos ML, da Silva AM, da Silva NP, Silva WA, da Silveira RA, Sousa JF, Stecconi D, Tsukumo F, Valente V, Soares F, Moreira ES, Nunes DN, Correa RG, Zalcberg H, Carvalho AF, Reis LF, Brentani RR, Simpson AJ, de Souza SJ, Melo M. The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome. Proc Natl Acad Sci U S A 2001; 98:12103-8. [PMID: 11593022 PMCID: PMC59775 DOI: 10.1073/pnas.201182798] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022] Open
Abstract
Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Affiliation(s)
- A A Camargo
- Ludwig Institute for Cancer Research, 01509-010, São Paulo, Brazil
191
Zhao S, Shatsman S, Ayodeji B, Geer K, Tsegaye G, Krol M, Gebregeorgis E, Shvartsbeyn A, Russell D, Overton L, Jiang L, Dimitrov G, Tran K, Shetty J, Malek JA, Feldblyum T, Nierman WC, Fraser CM. Mouse BAC ends quality assessment and sequence analyses. Genome Res 2001; 11:1736-45. [PMID: 11591651 PMCID: PMC311142 DOI: 10.1101/gr.179201] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.9] [Indexed: 12/28/2022]
Abstract
A large-scale BAC end-sequencing project at The Institute for Genomic Research (TIGR) has generated one of the most extensive sets of sequence markers for the mouse genome to date. With a sequencing success rate of >80%, an average read length of 485 bp, and ABI3700 capillary sequencers, we have generated 449,234 nonredundant mouse BAC end sequences (mBESs) with 218 Mb total from 257,318 clones from libraries RPCI-23 and RPCI-24, representing 15x clone coverage, 7% sequence coverage, and a marker every 7 kb across the genome. A total of 191,916 BACs have sequences from both ends providing 12x genome coverage. The average Q20 length is 406 bp and 84% of the bases have phred quality scores ≥ 20. RPCI-24 mBESs have more Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700 sequencers and the sample tracking system ensure that > 95% of mBESs are associated with the right clone identifiers. We have found that a significant fraction of mBESs contains L1 repeats and approximately 48% of the clones have both ends with ≥ 100 bp contiguous unique Q20 bases. About 3% of mBESs match ESTs and > 70% of matches were conserved between the mouse and the human or the rat. Approximately 0.1% of mBESs contain STSs. About 0.2% of mBESs match human finished sequences and > 70% of these sequences have EST hits. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource to the community.
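The headline figures in this abstract can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers quoted above plus one assumption that the abstract does not state: a mouse genome size of about 3.0 Gb (`GENOME_BP`), chosen for illustration.

```python
# Figures quoted in the abstract.
n_bes = 449_234        # nonredundant mouse BAC end sequences (mBESs)
mean_read_bp = 485     # average read length in bp

# Assumption for illustration only: the abstract gives no genome size.
GENOME_BP = 3.0e9

total_bp = n_bes * mean_read_bp             # total sequence generated
sequence_coverage = total_bp / GENOME_BP    # fraction of the genome sampled
marker_spacing_bp = GENOME_BP / n_bes       # mean distance between end sequences

print(f"total sequence:    {total_bp / 1e6:.1f} Mb")
print(f"sequence coverage: {sequence_coverage:.1%}")
print(f"marker spacing:    {marker_spacing_bp / 1e3:.1f} kb")
```

Under this assumed genome size the results round to the 218 Mb, 7% sequence coverage, and one marker every 7 kb reported in the abstract. Clone coverage is not recomputed here because it depends on the average BAC insert size, which the abstract does not give.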
Affiliation(s)
- S Zhao
- The Institute for Genomic Research, Rockville, Maryland 20850, USA.
192
Abstract
The recent release of the draft sequence and the eventual completion of the human genome present the scientific community with a rich source of data to mine. Yet, these data are content poor in the absence of additional correlative information. Expressed sequence tag (EST) datasets and their associated gene indices have existed for many years, and represent the first attempt at understanding the complexity of the genome. These datasets remain extremely important as information sources and, in particular, as tools for analyzing the completed genomes. Here, we discuss the nature of ESTs and their associated tools and gene-indexing databases. In particular, we will compare three EST gene indices (UNIGENE, Merck Gene Index Version 2.0 and Doubletwist CAT), discuss how these gene indices are applied for both genome analysis and drug discovery, and demonstrate their importance as a complementary dataset to the annotated human genome.
Affiliation(s)
- J Yuan
- Department of Bioinformatics, Merck & Co., Inc., P.O. Box 2000-RY80-A1, Rahway, NJ 07065, USA.
193
Wright FA, Lemon WJ, Zhao WD, Sears R, Zhuo D, Wang JP, Yang HY, Baer T, Stredney D, Spitzner J, Stutz A, Krahe R, Yuan B. A draft annotation and overview of the human genome. Genome Biol 2001; 2:RESEARCH0025. [PMID: 11516338 PMCID: PMC55322 DOI: 10.1186/gb-2001-2-7-research0025] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.4] [Received: 02/12/2001] [Revised: 04/04/2001] [Accepted: 06/01/2001] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND: The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena. RESULTS: We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome. CONCLUSIONS: We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4%. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence.
Affiliation(s)
- Fred A Wright
- Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
- William J Lemon
- Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
- Wei D Zhao
- Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
- Russell Sears
- Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
- Degen Zhuo
- Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
- Jian-Ping Wang
- Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
- Hee-Yung Yang
- LabBook.com, Busch Boulevard, Columbus, OH 43229, USA
- Troy Baer
- Ohio Supercomputer Center (OSC), Kinnear Road, Columbus, OH 43212, USA
- Don Stredney
- Ohio Supercomputer Center (OSC), Kinnear Road, Columbus, OH 43212, USA
- Department of Computer and Information Science, The Ohio State University, Neil Avenue, Columbus, OH 43210, USA
- Joe Spitzner
- LabBook.com, Busch Boulevard, Columbus, OH 43229, USA
- Al Stutz
- Ohio Supercomputer Center (OSC), Kinnear Road, Columbus, OH 43212, USA
- Department of Computer and Information Science, The Ohio State University, Neil Avenue, Columbus, OH 43210, USA
- Ralf Krahe
- Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
- Bo Yuan
- Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
194
Zhuo D, Zhao WD, Wright FA, Yang HY, Wang JP, Sears R, Baer T, Kwon DH, Gordon D, Gibbs S, Dai D, Yang Q, Spitzner J, Krahe R, Stredney D, Stutz A, Yuan B. Assembly, Annotation, and Integration of UNIGENE Clusters into the Human Genome Draft. Genome Res 2001. [DOI: 10.1101/gr.164501] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
Abstract
The recent release of the first draft of the human genome provides an unprecedented opportunity to integrate human genes and their functions in a complete positional context. However, at least three significant technical hurdles remain: first, to assemble a complete and nonredundant human transcript index; second, to accurately place the individual transcript indices on the human genome; and third, to functionally annotate all human genes. Here, we report the extension of the UNIGENE database through the assembly of its sequence clusters into nonredundant sequence contigs. Each resulting consensus was aligned to the human genome draft. A unique location for each transcript within the human genome was determined by the integration of the restriction fingerprint, assembled genomic contig, and radiation hybrid (RH) maps. A total of 59,500 UNIGENE clusters were mapped on the basis of at least three independent criteria as compared with the 30,000 human genes/ESTs currently mapped in Genemap'99. Finally, the extension of the human transcript consensus in this study enabled a greater number of putative functional assignments than the 11,000 annotated entries in UNIGENE. This study reports a draft physical map with annotations for a majority of the human transcripts, called the Human Index of Nonredundant Transcripts (HINT). Such information can be immediately applied to the discovery of new genes and the identification of candidate genes for positional cloning.
195
Greco R, Ouwerkerk PB, Taal AJ, Favalli C, Beguiristain T, Puigdomènech P, Colombo L, Hoge JH, Pereira A. Early and multiple Ac transpositions in rice suitable for efficient insertional mutagenesis. Plant Molecular Biology 2001; 46:215-227. [PMID: 11442061 DOI: 10.1023/a:1010607318694] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.2] [Indexed: 05/23/2023]
Abstract
A GFP excision assay was developed to monitor the excision of Ac introduced into rice by Agrobacterium-mediated transformation. The presence of a strong double enhancer element of the CaMV 35S promoter adjacent to the Ac promoter induced very early excision, directly after transformation into the plant cell, exemplified by the absence of Ac in the T-DNA loci. Excision fingerprint analysis and characterization of transposition events from related regenerants revealed an inverse correlation between the number of excision events and transposed Ac copies, with single early excisions after transformation generating Ac amplification. New transpositions were generated at a frequency of 15-50% in different lines, yielding genotypes bearing multiple insertions, many of which were inherited in the progeny. The sequence of DNA flanking Ac in three representative lines provided a database of insertion tagged sites suitable for the identification of mutants of sequenced genes that can be examined for phenotypes in a reverse genetics strategy to elucidate gene function. Remarkably, two-thirds of Ac tagged sites showing homology to sequences in public databases were in predicted genes. A clear preference of transposon insertions in genes that are either predicted by protein coding capacity or by similarity to ESTs suggests that the efficiency of recovering knockout mutants of genes could be about three times higher than random. Linked Ac transposition, suitable for targeted tagging, was documented by segregation analysis of a crippled Ac element and by recovery of a set of six insertions in a contiguous sequence of 70 kb from chromosome 6 of rice.
Affiliation(s)
- R Greco
- Plant Research International, Wageningen, The Netherlands
196
Zhuo D, Zhao WD, Wright FA, Yang HY, Wang JP, Sears R, Baer T, Kwon DH, Gordon D, Gibbs S, Dai D, Yang Q, Spitzner J, Krahe R, Stredney D, Stutz A, Yuan B. Assembly, annotation, and integration of UNIGENE clusters into the human genome draft. Genome Res 2001; 11:904-18. [PMID: 11337484 PMCID: PMC311045 DOI: 10.1101/gr.gr-1645r] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 11/24/2022]
Abstract
The recent release of the first draft of the human genome provides an unprecedented opportunity to integrate human genes and their functions in a complete positional context. However, at least three significant technical hurdles remain: first, to assemble a complete and nonredundant human transcript index; second, to accurately place the individual transcript indices on the human genome; and third, to functionally annotate all human genes. Here, we report the extension of the UNIGENE database through the assembly of its sequence clusters into nonredundant sequence contigs. Each resulting consensus was aligned to the human genome draft. A unique location for each transcript within the human genome was determined by the integration of the restriction fingerprint, assembled genomic contig, and radiation hybrid (RH) maps. A total of 59,500 UNIGENE clusters were mapped on the basis of at least three independent criteria as compared with the 30,000 human genes/ESTs currently mapped in Genemap'99. Finally, the extension of the human transcript consensus in this study enabled a greater number of putative functional assignments than the 11,000 annotated entries in UNIGENE. This study reports a draft physical map with annotations for a majority of the human transcripts, called the Human Index of Nonredundant Transcripts (HINT). Such information can be immediately applied to the discovery of new genes and the identification of candidate genes for positional cloning.
Affiliation(s)
- D Zhuo
- Bioinformatics Group, James Cancer Hospital and Solove Research Institute, The Ohio State University, Columbus, Ohio 43210, USA
197
Page NM, Woods RJ, Lowry PJ. A regulatory role for neurokinin B in placental physiology and pre-eclampsia. Regulatory Peptides 2001; 98:97-104. [PMID: 11231038 DOI: 10.1016/s0167-0115(00)00239-1] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.3] [Indexed: 11/26/2022]
Abstract
Tachykinin dogma has assumed, so far, that neurokinin B (NKB) is a neuropeptide that is not produced in any peripheral tissue even though its endogenous receptor, NK3, has been found in a number of locations throughout the human body. We have found an abundant source of peripheral NKB in the human and rat placenta. In this review we describe the discovery of NKB in the placenta and examine its possible role in placental physiology and pre-eclampsia (PE). Excessive secretion of placental NKB into the maternal circulation during the third trimester of pregnancy has been found in women suffering from PE. This may provide the key to the cause of the multiple and complex symptoms associated with this potentially life-threatening illness. We also reveal the structural organisation of the human NKB gene for the first time as well as discussing putative mechanisms for its control.
Affiliation(s)
- N M Page
- School of Animal and Microbial Sciences, The University of Reading, RG6 6AJ, Reading, UK
198
Abstract
Transgenic crops are very much in the news due to the increasing public debate on their acceptance. In the scientific community, though, transgenic plants are proving to be powerful tools to study various aspects of plant sciences. The emerging scientific revolution sparked by genomics-based technologies is producing enormous amounts of DNA sequence information that, together with plant transformation methodology, is opening up new experimental opportunities for functional genomics analysis. An overview is provided here on the use of transgenic technology for the functional analysis of plant genes in model plants and a link made to their utilization in transgenic crops. In transgenic plants, insertional mutagenesis using heterologous maize transposons or Agrobacterium-mediated T-DNA insertions has been a valuable tool for the identification and isolation of genes that display a mutant phenotype. To discover functions of genes that do not display phenotypes when mutated, insertion sequences have been engineered to monitor or change the expression pattern of adjacent genes. These gene detector insertions can detect adjacent promoters, enhancers or gene exons and precisely reflect the expression pattern of the tagged gene. Activation tag insertions can mis-express the adjacent gene and confer dominant phenotypes that help bridge the phenotype gap. Employment of various forms of gene silencing technology broadens the scope of recovering knockout phenotypes for genes with redundant function. All these transgenic strategies describing gene-phenotype relationships can be addressed by high-throughput reverse genetics methods that will help provide functions to the genes discovered by genome sequencing. The gene functions discovered by insertional mutagenesis and silencing strategies along with expression pattern analysis will provide an integrated functional genomics perspective and offer unique applications in transgenic crops.
Affiliation(s)
- A Pereira
- Plant Research International, Wageningen, The Netherlands.
199
Rossberg M, Theres K, Acarkan A, Herrero R, Schmitt T, Schumacher K, Schmitz G, Schmidt R. Comparative sequence analysis reveals extensive microcolinearity in the lateral suppressor regions of the tomato, Arabidopsis, and Capsella genomes. The Plant Cell 2001; 13:979-88. [PMID: 11283350 PMCID: PMC135537 DOI: 10.1105/tpc.13.4.979] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.7] [Received: 10/16/2000] [Accepted: 01/28/2001] [Indexed: 05/19/2023]
Abstract
A 57-kb region of tomato chromosome 7 harboring five different genes was compared with the sequence of the Arabidopsis genome to search for microsynteny between the genomes of these two species. For all five genes, homologous sequences could be identified in a 30-kb region located on Arabidopsis chromosome 1. Only two inversion events distinguish the arrangement of the five genes in tomato from that in Arabidopsis. Inversions were not detected when the arrangement of the five Arabidopsis genes was compared with the arrangement in the orthologous region of Capsella, a plant closely related to Arabidopsis. These results provide evidence for microcolinearity between closely and distantly related dicotyledonous species. The degree of microcolinearity found can be exploited to localize orthologous genes in Arabidopsis and tomato in an unambiguous way.
Affiliation(s)
- M Rossberg
- Max-Delbrück-Laboratorium in der Max-Planck-Gesellschaft, 50829 Cologne, Germany
200
Bevan M, Mayer K, White O, Eisen JA, Preuss D, Bureau T, Salzberg SL, Mewes HW. Sequence and analysis of the Arabidopsis genome. Current Opinion in Plant Biology 2001; 4:105-110. [PMID: 11228431 DOI: 10.1016/s1369-5266(00)00144-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.3] [Indexed: 05/23/2023]
Abstract
The comprehensive analysis of the genome sequence of the plant Arabidopsis thaliana has been completed recently. The genome sequence and associated analyses provide the foundations for rapid progress in many fields of plant research, such as the exploitation of genetic variation in Arabidopsis ecotypes, the assessment of the transcriptome and proteome, and the association of genome changes at the sequence level with evolutionary processes. Nevertheless, genome sequencing and analysis are only the first steps towards a new plant biology. Much remains to be done to refine the analysis of encoded genes, to define the functions of encoded proteins systematically, and to establish new generations of databases to capture and relate diverse data sets generated in widely distributed laboratories.
Affiliation(s)
- M Bevan
- Molecular Genetics Department, John Innes Centre, Colney Lane, NR4 7UH, Norwich, UK.