101
Adams MD, Kerlavage AR, Kelley JM, Gocayne JD, Fields C, Fraser CM, Venter JC. A model for high-throughput automated DNA sequencing and analysis core facilities. Nature 1994; 368:474-5. [PMID: 8133896 DOI: 10.1038/368474a0]
Affiliation(s)
- M D Adams
- Institute for Genomic Research, Gaithersburg, Maryland 20878
102
Wilson R, Ainscough R, Anderson K, Baynes C, Berks M, Bonfield J, Burton J, Connell M, Copsey T, Cooper J. 2.2 Mb of contiguous nucleotide sequence from chromosome III of C. elegans. Nature 1994; 368:32-8. [PMID: 7906398 DOI: 10.1038/368032a0]
Abstract
As part of our effort to sequence the 100-megabase (Mb) genome of the nematode Caenorhabditis elegans, we have completed the nucleotide sequence of a contiguous 2,181,032 base pairs in the central gene cluster of chromosome III. Analysis of the finished sequence has indicated an average density of about one gene per five kilobases; comparison with the public sequence databases reveals similarities to previously known genes for about one gene in three. In addition, the genomic sequence contains several intriguing features, including putative gene duplications and a variety of other repeats with potential evolutionary implications.
Affiliation(s)
- R Wilson
- Department of Genetics, Washington University School of Medicine, St Louis, Missouri 63110
103
Matsubara K, Okubo K. Identification of new genes by systematic analysis of cDNAs and database construction. Curr Opin Biotechnol 1993; 4:672-7. [PMID: 7764463 DOI: 10.1016/0958-1669(93)90048-2]
Abstract
The large-scale collection of partial cDNA sequences is becoming a powerful tool in biology. Similarity or motif searches in DNA databases using these partial cDNA sequences have facilitated the discovery of new genes of interest. By collecting and registering large numbers of partial sequences with a well designed non-biased cDNA library, an expression profile of active genes in a particular tissue can be obtained. Tissue-specific or stage-specific genes can be discovered by comparing the profiles from different tissues or from a tissue at different stages of development, respectively. The compilation of such expression profiles enables genes to be mapped to the tissue(s) where they are actively transcribed. The large-scale collation of gene sequences actively expressed in the body into databases complements efforts directed towards the structural analysis of the genome, with the ultimate aim of decoding all the genetic information carried in the human genome. This cDNA strategy is also being widely applied to organisms other than man.
Affiliation(s)
- K Matsubara
- Institute for Molecular and Cellular Biology, Osaka University, Japan
104
Reddy GR, Chakrabarti D, Schuster SM, Ferl RJ, Almira EC, Dame JB. Gene sequence tags from Plasmodium falciparum genomic DNA fragments prepared by the "genease" activity of mung bean nuclease. Proc Natl Acad Sci U S A 1993; 90:9867-71. [PMID: 8234327 PMCID: PMC47673 DOI: 10.1073/pnas.90.21.9867]
Abstract
A genes-first approach to genome sequencing is described which efficiently generates gene sequence tags from genomic DNA. Mung bean nuclease (EC 3.1.30.1) cleaves the genomic DNA of many organisms before and after genes and within some introns. Analysis of gene sequence tags prepared from mung bean nuclease-digested Plasmodium falciparum DNA demonstrates that this method has several advantages over the popular cDNA expressed sequence tag approach. To date, 673 sequence tags containing over 215 kb of sequence have been generated from 400 clones. Sixty clones (15%) have significant similarity to sequences in the protein and translated nucleic acid data bases. These represent 51 unique genes, of which only 5 encode previously known P. falciparum proteins. The identified proteins include those expressed in erythrocytic, exoerythrocytic, and gametocytic stages of the parasite. Thirty percent of clones identified appear to carry complete coding regions. The spacer DNA separating genes is rarely cloned. These gene sequence tags will form a useful data base from which to initiate projects to develop new therapeutics, vaccines, and strategies to control human malaria.
Affiliation(s)
- G R Reddy
- Department of Infectious Diseases, College of Veterinary Medicine, University of Florida, Gainesville 32611
105
White O, Dunning T, Sutton G, Adams M, Venter JC, Fields C. A quality control algorithm for DNA sequencing projects. Nucleic Acids Res 1993; 21:3829-38. [PMID: 8367301 PMCID: PMC309901 DOI: 10.1093/nar/21.16.3829]
Abstract
Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries that are derived from RNA or DNA. Hybridization methods can only detect contaminants from known or suspected heterologous sources, and whole library screening is technically very difficult. Detection of contaminating heterologous clones by sequence alignment is only possible when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences that is based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants are present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets but it is not apparently associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed with similarity searches using sequences from the relevant data sets.
Affiliation(s)
- O White
- Institute for Genomic Research, Gaithersburg, MD 20878
106
107
Fyrberg C, Fyrberg E. A Drosophila homologue of the Schizosaccharomyces pombe act2 gene. Biochem Genet 1993. [DOI: 10.1007/bf00553175]
108
Adams MD, Kerlavage AR, Fields C, Venter JC. 3,400 new expressed sequence tags identify diversity of transcripts in human brain. Nat Genet 1993; 4:256-67. [PMID: 8358434 DOI: 10.1038/ng0793-256]
Abstract
We present the results of the partial sequencing of over 3,400 expressed sequence tags (ESTs) from human brain cDNA clones, which increases the number of distinct genes expressed in the brain, that are represented by ESTs, to about 6,000. By choosing clones in an unbiased manner, it is possible to construct a profile of the transcriptional activity of the brain at different stages. Proteins that comprise the cytoskeleton are the most abundant; however, a large variety of regulatory proteins are also seen. About half of the ESTs predicted to contain a protein-coding region have no matches in the public peptide databases and may represent new gene families.
Affiliation(s)
- M D Adams
- Receptor Biochemistry and Molecular Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892
109
110
Caplan AJ, Cyr DM, Douglas MG. Eukaryotic homologues of Escherichia coli dnaJ: a diverse protein family that functions with hsp70 stress proteins. Mol Biol Cell 1993; 4:555-63. [PMID: 8374166 PMCID: PMC300962 DOI: 10.1091/mbc.4.6.555]
Affiliation(s)
- A J Caplan
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill 27599-7260
111
Hartmann E, Görlich D, Kostka S, Otto A, Kraft R, Knespel S, Bürger E, Rapoport TA, Prehn S. A tetrameric complex of membrane proteins in the endoplasmic reticulum. Eur J Biochem 1993; 214:375-81. [PMID: 7916687 DOI: 10.1111/j.1432-1033.1993.tb17933.x]
Abstract
The translocation site (translocon), at which nascent polypeptides pass through the endoplasmic reticulum membrane, contains a component previously called 'signal sequence receptor' that is now renamed as 'translocon-associated protein' (TRAP). Two glycosylated subunits of the TRAP complex have been identified before (alpha and beta subunits). We now show that TRAP complex is actually comprised of four membrane proteins (alpha, beta, gamma, delta), present in a stoichiometric relation, which are genuine neighbours in intact microsomes. The amino acid sequences of the additional, non-glycosylated subunits were deduced from cloning of the corresponding cDNAs. The delta subunit spans the membrane only once and has its major portion, containing a disulfide bridge, at the lumenal side. The gamma subunit is predicted to span the membrane four times.
Affiliation(s)
- E Hartmann
- Max-Delbrück-Center for Molecular Medicine, Berlin-Buch, Germany
112
113
Yochem J, Greenwald I. A gene for a low density lipoprotein receptor-related protein in the nematode Caenorhabditis elegans. Proc Natl Acad Sci U S A 1993; 90:4572-6. [PMID: 8506301 PMCID: PMC46554 DOI: 10.1073/pnas.90.10.4572]
Abstract
A >23-kb gene that encodes a large integral membrane protein with a predicted structure similar to that of the low density lipoprotein (LDL) receptor-related protein (LRP) of mammals has been isolated and sequenced from the free-living nematode Caenorhabditis elegans. The 4753-amino acid predicted C. elegans product shares a nearly identical number and arrangement of amino acid sequence motifs with human LRP, and several exons of the C. elegans LRP gene correspond to exons of related parts of the human LDL receptor gene. The existence of an apparent homolog of LRP in C. elegans offers the possibility of genetic analysis of the in vivo roles of LRP and of the relationship between protein structure and function in a simple model organism.
Affiliation(s)
- J Yochem
- Department of Molecular Biology, Princeton University, NJ 08544
114
115
116
117
McCombie WR, Martin-Gallardo A, Gocayne JD, FitzGerald M, Dubnick M, Kelley JM, Castilla L, Liu LI, Wallace S, Trapp S. Expressed genes, Alu repeats and polymorphisms in cosmids sequenced from chromosome 4p16.3. Nat Genet 1992; 1:348-53. [PMID: 1338771 DOI: 10.1038/ng0892-348]
Abstract
The sequences of three cosmids (90 kilobases) from the Huntington's disease region in chromosome 4p16.3 have been determined. A 30,837 base overlap of DNA sequenced from two individuals was found to contain 72 DNA sequence polymorphisms, an average of 2.3 polymorphisms per kilobase (kb). The assembled 58 kb contig contains 62 Alu repeats, and eleven predicted exons representing at least three expressed genes that encode previously unidentified proteins. Each of these genes is associated with a CpG island. The structure of one of the new genes, hda1-1, has been determined by characterizing cDNAs from a placental library. This gene is expressed in a variety of tissues and may encode a novel housekeeping gene.
Affiliation(s)
- W R McCombie
- Section of Receptor Biochemistry and Molecular Biology, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892
118
Affiliation(s)
- D L Roussell
- Department of Molecular Microbiology and Immunology, School of Medicine, University of Missouri, Columbia 65212
119
120