1
Wei X, Xu Z, Wang G, Hou J, Ma X, Liu H, Liu J, Chen B, Luo M, Xie B, Li R, Ruan J, Liu X. pBACode: a random-barcode-based high-throughput approach for BAC paired-end sequencing and physical clone mapping. Nucleic Acids Res 2017; 45:e52. [PMID: 27980066 PMCID: PMC5397170 DOI: 10.1093/nar/gkw1261] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/15/2016] [Accepted: 12/09/2016] [Indexed: 12/14/2022]
Abstract
Applications that use Bacterial Artificial Chromosome (BAC) libraries often require paired-end sequences and knowledge of the physical location of each clone in plates. To obtain this information in a high-throughput manner, we generated pBACode vectors: a pool of BAC cloning vectors, each with a pair of random barcodes flanking its cloning site. In a pBACode BAC library, the BAC ends and their linked barcodes can be sequenced in bulk. Barcode pairs are determined by sequencing the empty pBACode vectors, which allows BAC ends to be paired according to their barcodes. For physical clone mapping, the barcodes are used as unique markers for their linked genomic sequence. After multi-dimensional pooling of BAC clones, the barcodes are sequenced and deconvoluted to locate each clone. We generated a pBACode library of 94,464 clones for the flounder Paralichthys olivaceus and obtained paired-end sequence from 95.4% of the clones. Incorporating BAC paired-ends into the genome preassembly improved its continuity by over 10-fold. Furthermore, we were able to use the barcodes to map the physical locations of each clone in just 50 pools, with up to 11,808 clones per pool. Our physical clone mapping located 90.2% of BAC clones, enabling targeted characterization of chromosomal rearrangements.
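The barcode logic summarized above can be illustrated with a minimal Python sketch. It makes two assumptions. First, every barcode is unique and read without error. Second, the pooling layout is simplified so that each clone falls into exactly one pool per dimension (for example plate, row and column). The actual pBACode pooling design and decoding procedure are described in the paper and may differ. All function names and data structures here are hypothetical.

```python
from collections import defaultdict


def build_pair_table(empty_vector_reads):
    """Map each left barcode to its partner right barcode.

    empty_vector_reads: iterable of (left_barcode, right_barcode) tuples
    obtained by sequencing the empty vector pool (assumed error-free).
    """
    return {left: right for left, right in empty_vector_reads}


def pair_bac_ends(left_end_reads, right_end_reads, pair_table):
    """Join bulk-sequenced BAC-end reads through their linked barcodes.

    left_end_reads / right_end_reads: dict barcode -> end sequence.
    Returns a list of (left_barcode, left_end, right_end) tuples.
    """
    pairs = []
    for left_bc, left_seq in left_end_reads.items():
        right_bc = pair_table.get(left_bc)
        if right_bc is not None and right_bc in right_end_reads:
            pairs.append((left_bc, left_seq, right_end_reads[right_bc]))
    return pairs


def locate_clones(pool_barcodes, pool_coordinates):
    """Deconvolute pooled barcode observations into clone positions.

    pool_barcodes: dict pool_id -> set of barcodes detected in that pool.
    pool_coordinates: dict pool_id -> (dimension, index), e.g. ("plate", 3).
    A barcode is placed only if it is seen in exactly one pool of every dimension.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for pool_id, barcodes in pool_barcodes.items():
        dim, idx = pool_coordinates[pool_id]
        for bc in barcodes:
            seen[bc][dim].add(idx)

    all_dims = {dim for dim, _ in pool_coordinates.values()}
    located = {}
    for bc, by_dim in seen.items():
        if set(by_dim) == all_dims and all(len(v) == 1 for v in by_dim.values()):
            located[bc] = {dim: next(iter(v)) for dim, v in by_dim.items()}
    return located
```

In this sketch, a barcode seen in more than one pool of the same dimension is left unplaced. That is one simple way to handle ambiguous observations.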
Affiliation(s)
- Xiaolin Wei
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China; School of Life Sciences, Peking University, Beijing 100084, China
- Zhichao Xu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China
- Guixing Wang
- Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
- Jilun Hou
- Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
- Xiaopeng Ma
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; PTN (Peking University-Tsinghua University-National Institute of Biological Sciences) Joint Graduate Program, Beijing 100084, China
- Haijin Liu
- Beidaihe Central Experiment Station, Chinese Academy of Fishery Sciences, Qinhuangdao 066100, China
- Jiadong Liu
- National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Bo Chen
- National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Meizhong Luo
- National Key Laboratory of Crop Genetic Improvement and College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Bingyan Xie
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Ruiqiang Li
- Novogene Bioinformatics Institute, Beijing 100083, China
- Jue Ruan
- Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong 518120, China
- Xiao Liu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
2
Li Z, Linghu E, Cheng J. Screening of hepatocyte proteins binding with the middle surface protein of the hepatitis B virus by the yeast two-hybrid system. Mol Med Rep 2014; 9:2342-6. [PMID: 24676405 DOI: 10.3892/mmr.2014.2069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/21/2013] [Accepted: 02/20/2014] [Indexed: 11/05/2022]
Abstract
The effect of the middle hepatitis B virus surface protein (MHBs) remains to be elucidated. To investigate the biological function of the MHBs protein, the present study performed yeast two-hybrid screening to search for proteins that interact with the MHBs protein in hepatocytes. The bait plasmid expressing the MHBs protein was constructed by cloning the MHBs gene into pGBKT7, and the recombinant plasmid DNA was then transformed into yeast strain AH109 (mating type a). The transformed AH109 yeast was mated with yeast Y187 (mating type α) containing the liver cDNA library plasmid in 2X yeast peptone dextrose adenine (YPDA) medium. The mated diploid yeast was plated on quadruple dropout medium (SD/-Trp-Leu-His-Ade) containing X-α-gal for selection and screening. Plasmids were extracted from positive (blue) colonies and sequenced, and the sequences were analyzed using bioinformatics methods. Two colonies were selected and sequenced. One contained human DNA sequence from clone RP11-490D19 on chromosome 9, and the other contained Homo sapiens chromosome 12 BAC RP11-180M15 (Roswell Park Cancer Institute Human BAC Library). The yeast two-hybrid system is an effective method for identifying hepatocyte proteins that interact with MHBs. The MHBs protein binds with different proteins, suggesting that it has multiple functions in vivo.
Affiliation(s)
- Zhiqun Li
- Department of Gastroenterology and Hepatology, Chinese PLA General Hospital, Beijing 100853, P.R. China
- Enqiang Linghu
- Department of Gastroenterology and Hepatology, Chinese PLA General Hospital, Beijing 100853, P.R. China
- Jun Cheng
- Institute of Infectious Diseases, Ditan Hospital, Capital Medical University, Beijing 100015, P.R. China
3
Liu GE, Alkan C, Jiang L, Zhao S, Eichler EE. Comparative analysis of Alu repeats in primate genomes. Genome Res 2009; 19:876-85. [PMID: 19411604 DOI: 10.1101/gr.083972.108] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.2] [Indexed: 11/25/2022]
Abstract
Using bacterial artificial chromosome (BAC) end sequences (16.9 Mb) and high-quality alignments of genomic sequences (17.4 Mb), we performed a global assessment of the divergence distributions, phylogenies, and consensus sequences for Alu elements in primates including lemur, marmoset, macaque, baboon, and chimpanzee as compared to human. We found that in lemurs, Alu elements show a broader and more symmetric sequence divergence distribution, suggesting a steady rate of Alu retrotransposition activity among prosimians. In contrast, Alu elements in anthropoids show a skewed distribution shifted toward more ancient elements, with continually declining rates of recent Alu activity along the hominoid lineage. Using an integrated approach combining mutation profile and insertion/deletion analyses, we identified nine novel lineage-specific Alu subfamilies in lemur (seven), marmoset (one), and baboon/macaque (one), containing multiple diagnostic mutations distinct from their human counterparts (the Alu J, S, and Y subfamilies, respectively). Among these primates, we show that the lemur has the lowest density of Alu repeats (55 repeats/Mb), while the marmoset has the greatest abundance (188 repeats/Mb). We estimate that approximately 70% of lemur and 16% of marmoset Alu elements belong to lineage-specific subfamilies. Our analysis has provided an evolutionary framework for further classification and refinement of the Alu repeat phylogeny. The differences in the distribution and rates of Alu activity have played an important role in subtly reshaping the structure of primate genomes. The functional consequences of these changes among the diverse primate lineages over such short periods of evolutionary time are an important area of future investigation.
Affiliation(s)
- George E Liu
- USDA, ARS, ANRI, Bovine Functional Genomics Laboratory, Beltsville, MD 20705, USA.
4
Ratnakumar A, Barris W, McWilliam S, Brauning R, McEwan JC, Snelling WM, Dalrymple BP. A multiway analysis for identifying high integrity bovine BACs. BMC Genomics 2009; 10:46. [PMID: 19166603 PMCID: PMC2660975 DOI: 10.1186/1471-2164-10-46] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/10/2008] [Accepted: 01/23/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND In large genomics projects involving many different types of analyses of bacterial artificial chromosomes (BACs), such as fingerprinting, end sequencing (BES), and full BAC sequencing, there are many opportunities for the identities of BACs to become confused. However, by comparing the results from the different analyses, inconsistencies can be identified and a set of high-integrity BACs preferred for future research can be defined.
RESULTS The locations of each bovine BAC in the BAC fingerprint-based genome map and in the genome assembly were compared based on the reported BESs and, for a smaller number of BACs, on the full sequence. BACs with consistent positions in all three datasets, or, if the full sequence was not available, in both the fingerprint map and the BES-based alignments, were deemed to be correctly positioned. BACs with consistent BES-based and fingerprint-based locations, but with conflicting locations based on the fully sequenced BAC, appeared to have been misidentified during sequencing; these included a number of apparently swapped BACs. Inconsistencies between BES-based and fingerprint map positions identified thirty-one plates from the CHORI-240 library that appear to have suffered substantial systematic problems during end sequencing of the BACs. No systematic problems were identified in the fingerprinting of the BACs. Analysis of BACs overlapping in the assembly identified a small overrepresentation of clones with substantial overlap in the library and a substantial enrichment of highly overlapping BACs on the same plate in the CHORI-240 library. More than half of these BACs appear to have been present as duplicates on the original BAC-library plates and should therefore be avoided in subsequent projects.
CONCLUSION Our analysis shows that approximately 95% of the bovine CHORI-240 library clones with both a BAC fingerprint and two BESs mapping to the genome in the expected orientations (approximately 27% of all BACs) have consistent locations in the BAC fingerprint map and the genome assembly. We have developed a broadly applicable methodology for checking the integrity of BAC-based datasets even where only incomplete and partially assembled genomic sequence is available.
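A BES pair can be taken as "mapping in the expected orientations" when three conditions hold: both ends land on the same chromosome, they point toward each other, and they are separated by a plausible insert size. The Python sketch below classifies a pair of end alignments under that rule. The `EndHit` structure and the default span limits are illustrative assumptions, not values from this study or from the CHORI-240 library.

```python
from dataclasses import dataclass


@dataclass
class EndHit:
    chrom: str
    pos: int     # leftmost aligned reference coordinate of the end read
    strand: str  # "+" or "-"


def classify_pair(a: EndHit, b: EndHit,
                  min_span: int = 50_000, max_span: int = 300_000) -> str:
    """Label a BAC-end pair by how its two ends are placed on the reference.

    Expected layout: same chromosome, the leftmost end on "+" and the
    rightmost end on "-" (pointing toward each other), separated by a span
    within [min_span, max_span]. The span here is simply the distance
    between the two leftmost coordinates, ignoring read length.
    """
    if a.chrom != b.chrom:
        return "interchromosomal"
    left, right = (a, b) if a.pos <= b.pos else (b, a)
    if (left.strand, right.strand) != ("+", "-"):
        return "orientation"
    span = right.pos - left.pos
    if not (min_span <= span <= max_span):
        return "span"
    return "concordant"
```

In a workflow like the one described, only "concordant" pairs would be carried forward to a comparison with the fingerprint map. The other labels mark kinds of discordance that may reflect a misidentified clone or a genuine rearrangement.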
Affiliation(s)
- Abhirami Ratnakumar
- CSIRO Livestock Industries, 306 Carmody Road, St. Lucia, QLD 4067, Australia.
5
Murakami K, Toyoda A, Hattori M, Kuroki Y, Fujiyama A, Kojima T, Matsuda M, Sakaki Y, Yamamoto MT. BAC library construction and BAC end sequencing of five Drosophila species: the comparative map with the D. melanogaster genome. Genes Genet Syst 2008; 83:245-56. [PMID: 18670136 DOI: 10.1266/ggs.83.245] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
We constructed and characterized arrayed bacterial artificial chromosome (BAC) libraries of five Drosophila species (D. melanogaster, D. simulans, D. sechellia, D. auraria, and D. ananassae), which are genetically well characterized in studies of meiosis, evolution, population genetics, and developmental biology. The BAC libraries comprise 8,000 to 12,500 clones for each species and are estimated to cover most of each genome. We sequenced both ends of most of these BAC clones with a success rate of 91%. Of these, 53,701 clones with non-repetitive BAC end sequences (BESs) were mapped to the public D. melanogaster genome sequence. The BES mapping estimated that the BAC libraries of D. auraria and D. ananassae covered 47% and 57% of the D. melanogaster genome, respectively, and those of D. melanogaster, D. sechellia, and D. simulans covered 94-97%. The low BES coverage for D. auraria and D. ananassae may be due to their high sequence divergence from D. melanogaster. From the comparative BES mapping, 111 possible breakpoints of chromosomal rearrangements were identified in these four species. The breakpoints of the major chromosome rearrangement between D. simulans and D. melanogaster on the third chromosome were localized to within 20 kb in 84E and 30 kb in 93E/F. Corresponding breakpoints were also identified in D. sechellia. The BAC clones described here will be an important addition to Drosophila genomic resources.
6
Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AWC, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC. The diploid genome sequence of an individual human. PLoS Biol 2007; 5:e254. [PMID: 17803354 PMCID: PMC1964779 DOI: 10.1371/journal.pbio.0050254] [Citation(s) in RCA: 1114] [Impact Index Per Article: 69.6] [Received: 05/09/2007] [Accepted: 07/30/2007] [Indexed: 01/20/2023]
Abstract
Presented here is a genome sequence of an individual human. It was produced from approximately 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2-206 bp), 292,102 heterozygous insertion/deletion events (indels; 1-571 bp), 559,473 homozygous indels (1-82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, but these events involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
Affiliation(s)
- Samuel Levy
- J. Craig Venter Institute, Rockville, Maryland, USA.
7
Xu P, Wang S, Liu L, Peatman E, Somridhivej B, Thimmapuram J, Gong G, Liu Z. Channel catfish BAC-end sequences for marker development and assessment of syntenic conservation with other fish species. Anim Genet 2006; 37:321-6. [PMID: 16879340 DOI: 10.1111/j.1365-2052.2006.01453.x] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.7] [Indexed: 11/29/2022]
Abstract
In the present study, 25 195 BAC ends for channel catfish (Ictalurus punctatus) were sequenced, generating 20 366 clean BAC-end sequences (BES), with an average read length of 557 bp after trimming. A total of 11 414 601 bp were generated, representing approximately 1.2% of the catfish genome. Based on this survey, the catfish genome was found to be highly AT-rich, with 60.7% A+T and 39.3% G+C. Approximately 12% of the catfish genome consisted of dispersed repetitive elements, with the Tc1/mariner transposons making up the largest percentage by base pair (4.57%). Microsatellites were detected in 17.5% of BES. Catfish BACs were anchored to the zebrafish and Tetraodon genome sequences by BLASTN, generating 16% and 8.2% significant hits (E < 10^-5) respectively. A total of 1074 and 773 significant hits were unique to the zebrafish and Tetraodon genomes, respectively, of which 417 and 406, respectively, were identified as known genes in other species, providing a major genome resource for comparative genomic mapping.
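A short Python sketch can compute summary statistics of the kind reported here: the pooled A+T content and the fraction of BESs that contain a microsatellite. The repeat criterion is an assumption for illustration: a 2-4 bp motif repeated at least five times in tandem, with homopolymers excluded. The abstract does not state which tool or thresholds were actually used.

```python
import re

# Assumed microsatellite criterion: a 2-4 bp motif repeated >= 5 times in tandem.
MICROSAT = re.compile(r"([ACGT]{2,4}?)\1{4,}")


def has_microsatellite(seq: str) -> bool:
    """True if seq contains a tandem repeat matching MICROSAT (homopolymers skipped)."""
    for m in MICROSAT.finditer(seq.upper()):
        if len(set(m.group(1))) > 1:  # skip motifs such as "AA" taken from poly-A runs
            return True
    return False


def summarize_bes(bes_list):
    """Pooled A+T fraction and fraction of BESs carrying a microsatellite."""
    seqs = [s.upper() for s in bes_list]
    at = sum(s.count("A") + s.count("T") for s in seqs)
    acgt = sum(s.count(b) for s in seqs for b in "ACGT")  # ignores N and other codes
    with_ssr = sum(has_microsatellite(s) for s in seqs)
    return {
        "at_fraction": at / acgt if acgt else 0.0,
        "fraction_with_microsatellite": with_ssr / len(seqs) if seqs else 0.0,
    }
```

For example, `summarize_bes(["ACACACACACGG", "GGGGCCCC", "AATT"])` reports an A+T fraction of 0.375, with one of the three sequences carrying a microsatellite.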
Affiliation(s)
- P Xu
- The Fish Molecular Genetics and Biotechnology Laboratory, Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences, Aquatic Genomics Unit, Auburn University, Auburn, AL 36849, USA
8
Jang W, Yonescu R, Knutsen T, Brown T, Reppert T, Sirotkin K, Schuler GD, Ried T, Kirsch IR. Linking the human cytogenetic map with nucleotide sequence: the CCAP clone set. Cancer Genet Cytogenet 2006; 168:89-97. [PMID: 16843097 DOI: 10.1016/j.cancergencyto.2006.01.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 12/29/2005] [Accepted: 01/03/2006] [Indexed: 11/18/2022]
Abstract
We present the completed dataset and clone repository of the Cancer Chromosome Aberration Project (CCAP), an initiative developed and funded through the intramural program of the U.S. National Cancer Institute, to provide seamless linkage of human cytogenetic markers with the primary nucleotide sequence of the human genome. Spaced at 1-2 Mb intervals across the human genome, 1,339 bacterial artificial chromosome (BAC) clones have been localized to chromosomal bands through high-resolution fluorescence in situ hybridization (FISH) mapping. Of these clones, 99.8% can be positioned on the primary human genome sequence and 95% are placed at or close to their precise nucleotide starts and stops. This dataset can be studied and manipulated within generally available public Web sites. The clones are available from a commercial repository. The CCAP BAC clone set provides anchors for the interrogation of gene and sequence involvement in oncogenic and developmental disorders when the starting point is the recognition of a structural, numerical, or interstitial chromosomal aberration. This dataset also provides a current view of the quality and coherence of the available genome sequence and insight into the nucleotide and three-dimensional structures that manifest as Giemsa light and dark chromosomal banding patterns.
Affiliation(s)
- Wonhee Jang
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA
9
Leeb T, Vogl C, Zhu B, de Jong PJ, Binns MM, Chowdhary BP, Scharfe M, Jarek M, Nordsiek G, Schrader F, Blöcker H. A human-horse comparative map based on equine BAC end sequences. Genomics 2006; 87:772-6. [PMID: 16603334 DOI: 10.1016/j.ygeno.2006.03.002] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 08/30/2005] [Revised: 12/15/2005] [Accepted: 03/04/2006] [Indexed: 11/18/2022]
Abstract
In an effort to increase the density of sequence-based markers for the horse genome we generated 9473 BAC end sequences (BESs) from the CHORI-241 BAC library with an average read length of 677 bp. BLASTN searches with the BESs revealed 4036 meaningful hits (E ≤ 10^-5) in the human genome that provide useful markers for the human-horse comparative map. The 4036 BLASTN hits allowed the anchoring of 3079 BAC clones to the human genome, on average one corresponding equine BAC clone per megabase of human DNA. We used the BLASTN anchored BESs for an in silico prediction of the gene content and chromosome assignment of comparatively mapped equine BAC clones. As a first verification of our in silico mapping strategy we placed 19 equine BESs with matches to HSA6 onto the RH map. All markers were assigned to the predicted localizations on ECA10, ECA20, and ECA31, respectively.
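The anchoring step described above amounts to filtering BLASTN hits by E-value and grouping the surviving BESs by clone. In the hypothetical Python sketch below, a clone counts as anchored if at least one of its end sequences has a hit with E ≤ 10^-5. The `<clone>.<end>` naming scheme for BES identifiers is an assumption. Because both ends of a clone can produce hits, the number of anchored clones can be smaller than the number of hits.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    bes_id: str    # hypothetical naming scheme "<clone>.<end>", e.g. "c1.F"
    chrom: str     # human chromosome of the hit
    pos: int       # human coordinate of the hit
    evalue: float  # BLASTN E-value


def anchor_clones(hits, max_evalue=1e-5):
    """Keep hits with E <= max_evalue and group them by BAC clone.

    Returns dict clone_id -> list of (chrom, pos) anchors.
    """
    anchored = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        clone = h.bes_id.rsplit(".", 1)[0]  # strip the end suffix
        anchored.setdefault(clone, []).append((h.chrom, h.pos))
    return anchored
```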
Affiliation(s)
- Tosso Leeb
- Institute of Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559 Hannover, Germany.
10
Hong JM, Chae SH, Oriero N, Larkin DM, Choi CB, Lee JY, Lewin HA, Bae JH, Choi I, Yeo JS. Identification and chromosomal localization of repeat sequences through BAC end sequence analysis in Korean cattle. J Genet 2005; 84:329-35. [PMID: 16385167 DOI: 10.1007/bf02715805] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Affiliation(s)
- J M Hong
- Institute of Biotechnology and Department of Biotechnology, Gyeongsan 712-749, Korea
11
Milosavljevic A, Harris RA, Sodergren EJ, Jackson AR, Kalafus KJ, Hodgson A, Cree A, Dai W, Csuros M, Zhu B, de Jong PJ, Weinstock GM, Gibbs RA. Pooled genomic indexing of rhesus macaque. Genome Res 2005; 15:292-301. [PMID: 15687293 PMCID: PMC546531 DOI: 10.1101/gr.3162505] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
Abstract
Pooled genomic indexing (PGI) is a method for mapping collections of bacterial artificial chromosome (BAC) clones between species by using a combination of clone pooling and DNA sequencing. PGI has been used to map a total of 3858 BAC clones covering approximately 24% of the rhesus macaque (Macaca mulatta) genome onto 4178 homologous loci in the human genome. A number of intrachromosomal rearrangements were detected by mapping multiple segments within individual rhesus BACs onto multiple disjoint loci in the human genome. Transversal pooling designs involving shuffled BAC arrays were employed for robust mapping even with modest DNA sequence read coverage. A further innovation, short-tag pooled genomic indexing (ST-PGI), was introduced to improve the economy of mapping by sequencing multiple short, mappable tags within a single sequencing reaction.
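The core PGI inference can be sketched in Python as follows: a human locus is assigned to a rhesus clone when reads from several of the pools containing that clone map to the same locus. Two parts of this sketch are illustrative assumptions. Loci are treated as pre-binned human genome windows with hypothetical labels such as `"chr1:bin10"`, and `min_pools` is an arbitrary threshold. The published method's scoring is not reproduced here. When each clone sits in more than two pools, as in a transversal design, requiring hits from only some of them is one way to tolerate pools in which the clone received no reads.

```python
from collections import Counter


def pgi_assign(pool_hits, clone_pools, min_pools=2):
    """Assign human loci to clones from pooled read mappings.

    pool_hits: dict pool_id -> set of human loci (pre-binned windows) hit by
        reads sequenced from that pool.
    clone_pools: dict clone_id -> list of pool_ids that contain the clone.
    A locus is assigned to a clone when it is hit from at least `min_pools`
    of the pools containing that clone.
    """
    assignment = {}
    for clone, pools in clone_pools.items():
        counts = Counter()
        for p in pools:
            counts.update(pool_hits.get(p, set()))  # +1 per locus per pool
        assignment[clone] = {locus for locus, n in counts.items() if n >= min_pools}
    return assignment
```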
Affiliation(s)
- Aleksandar Milosavljevic
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
12
Zhao S, Shetty J, Hou L, Delcher A, Zhu B, Osoegawa K, de Jong P, Nierman WC, Strausberg RL, Fraser CM. Human, mouse, and rat genome large-scale rearrangements: stability versus speciation. Genome Res 2004; 14:1851-60. [PMID: 15364903 PMCID: PMC524408 DOI: 10.1101/gr.2663304] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.2] [Indexed: 11/24/2022]
Abstract
Using paired-end sequences from bacterial artificial chromosomes, we have constructed high-resolution synteny and rearrangement breakpoint maps among human, mouse, and rat genomes. Among the >300 syntenic blocks identified are segments of over 40 Mb without any detected interspecies rearrangements, as well as regions with frequently broken synteny and extensive rearrangements. As closely related species, mouse and rat share the majority of the breakpoints and often have the same types of rearrangements when compared with the human genome. However, the breakpoints not shared between them indicate that mouse rearrangements are more often interchromosomal, whereas intrachromosomal rearrangements are more prominent in rat. Centromeres may have played a significant role in reorganizing a number of chromosomes in all three species. The comparison of the three species indicates that genome rearrangements follow a path that accommodates a delicate balance between maintaining a basic structure underlying all mammalian species and permitting variations that are necessary for speciation.
Affiliation(s)
- Shaying Zhao
- Institute for Genomic Research, Rockville, Maryland 20850, USA.
13
Krzywinski M, Bosdet I, Smailus D, Chiu R, Mathewson C, Wye N, Barber S, Brown-John M, Chan S, Chand S, Cloutier A, Girn N, Lee D, Masson A, Mayo M, Olson T, Pandoh P, Prabhu AL, Schoenmakers E, Tsai M, Albertson D, Lam W, Choy CO, Osoegawa K, Zhao S, de Jong PJ, Schein J, Jones S, Marra MA. A set of BAC clones spanning the human genome. Nucleic Acids Res 2004; 32:3651-60. [PMID: 15247347 PMCID: PMC484185 DOI: 10.1093/nar/gkh700] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.3] [Received: 05/21/2004] [Revised: 06/22/2004] [Accepted: 06/22/2004] [Indexed: 11/15/2022]
Abstract
Using the human bacterial artificial chromosome (BAC) fingerprint-based physical map, genome sequence assembly and BAC end sequences, we have generated a fingerprint-validated set of 32 855 BAC clones spanning the human genome. The clone set provides coverage for at least 98% of the human fingerprint map, 99% of the current assembled sequence and has an effective resolving power of 79 kb. We have made the clone set publicly available, anticipating that it will generally facilitate FISH or array-CGH-based identification and characterization of chromosomal alterations relevant to disease.
Affiliation(s)
- Martin Krzywinski
- BC Cancer Agency Genome Sciences Center and BC Cancer Agency, Vancouver, BC V5Z 4E6, Canada
14
Hong CP, Lee SJ, Park JY, Plaha P, Park YS, Lee YK, Choi JE, Kim KY, Lee JH, Lee J, Jin H, Choi SR, Lim YP. Construction of a BAC library of Korean ginseng and initial analysis of BAC-end sequences. Mol Genet Genomics 2004; 271:709-16. [PMID: 15197578 DOI: 10.1007/s00438-004-1021-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Received: 10/13/2003] [Accepted: 04/30/2004] [Indexed: 10/26/2022]
Abstract
We estimated the genome size of Korean ginseng (Panax ginseng C.A. Meyer), a medicinal herb, constructed a HindIII BAC library, and analyzed BAC-end sequences to provide an initial characterization of the library. The 1C nuclear DNA content of Korean ginseng was estimated to be 3.33 pg (3.12 × 10^3 Mb). The BAC library consists of 106,368 clones with an average size of 98.61 kb, amounting to 3.34 genome equivalents. Sequencing of 2167 BAC clones generated 2492 BAC-end sequences with an average length of 400 bp. Analysis using BLAST and motif searches revealed that 10.2%, 20.9% and 3.8% of the BAC-end sequences contained protein-coding regions, transposable elements and microsatellites, respectively. A comparison of the functional categories represented by the protein-coding regions found in BAC-end sequences with those of Arabidopsis revealed that proteins pertaining to energy metabolism, subcellular localization, cofactor requirement and transport facilitation were more highly represented in the P. ginseng sample. In addition, a sequence encoding a glucosyltransferase-like protein implicated in the ginsenoside biosynthesis pathway was also found. The majority of the transposable element sequences found belonged to the gypsy type (67.6%), followed by copia (11.7%) and LINE (8.0%) retrotransposons, whereas DNA transposons accounted for only 2.1% of the total in our sequence sample. Higher levels of transposable elements than protein-coding regions suggest that mobile elements have played an important role in the evolution of the genome of Korean ginseng, and contributed significantly to its complexity. We also identified 103 microsatellites with 3-38 repeats in their motifs. The BAC library and BAC-end sequences will serve as a useful resource for physical mapping, positional cloning and genome sequencing of P. ginseng.
Affiliation(s)
- C P Hong
- Department of Horticulture, and Genome Research Center, Chungnam National University, 305-764, Daejeon, Korea
15
Istrail S, Sutton GG, Florea L, Halpern AL, Mobarry CM, Lippert R, Walenz B, Shatkay H, Dew I, Miller JR, Flanigan MJ, Edwards NJ, Bolanos R, Fasulo D, Halldorsson BV, Hannenhalli S, Turner R, Yooseph S, Lu F, Nusskern DR, Shue BC, Zheng XH, Zhong F, Delcher AL, Huson DH, Kravitz SA, Mouchard L, Reinert K, Remington KA, Clark AG, Waterman MS, Eichler EE, Adams MD, Hunkapiller MW, Myers EW, Venter JC. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci U S A 2004; 101:1916-21. [PMID: 14769938 PMCID: PMC357027 DOI: 10.1073/pnas.0307971100] [Citation(s) in RCA: 136] [Impact Index Per Article: 6.8] [Indexed: 01/08/2023]
Abstract
We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304-1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860-921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.
Affiliation(s)
- Sorin Istrail
- Applied Biosystems, 45 West Gude Drive, Rockville, MD 20850, USA
16
Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, Repping S, Pyntikova T, Ali J, Bieri T, Chinwalla A, Delehaunty A, Delehaunty K, Du H, Fewell G, Fulton L, Fulton R, Graves T, Hou SF, Latrielle P, Leonard S, Mardis E, Maupin R, McPherson J, Miner T, Nash W, Nguyen C, Ozersky P, Pepin K, Rock S, Rohlfing T, Scott K, Schultz B, Strong C, Tin-Wollam A, Yang SP, Waterston RH, Wilson RK, Rozen S, Page DC. The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature 2003; 423:825-37. [PMID: 12815422 DOI: 10.1038/nature01722] [Citation(s) in RCA: 1395] [Impact Index Per Article: 66.4] [Received: 03/07/2003] [Accepted: 04/08/2003] [Indexed: 01/06/2023]
Abstract
The male-specific region of the Y chromosome, the MSY, differentiates the sexes and comprises 95% of the chromosome's length. Here, we report that the MSY is a mosaic of heterochromatic sequences and three classes of euchromatic sequences: X-transposed, X-degenerate and ampliconic. These classes contain all 156 known transcription units, which include 78 protein-coding genes that collectively encode 27 distinct proteins. The X-transposed sequences exhibit 99% identity to the X chromosome. The X-degenerate sequences are remnants of ancient autosomes from which the modern X and Y chromosomes evolved. The ampliconic class includes large regions (about 30% of the MSY euchromatin) where sequence pairs show greater than 99.9% identity, which is maintained by frequent gene conversion (non-reciprocal transfer). The most prominent features here are eight massive palindromes, at least six of which contain testis genes.
MESH Headings
- Chromosomes, Human, X/genetics
- Chromosomes, Human, Y/genetics
- Crossing Over, Genetic/genetics
- DNA Transposable Elements/genetics
- Euchromatin/genetics
- Evolution, Molecular
- Female
- Gene Amplification/genetics
- Gene Conversion/genetics
- Genes/genetics
- Heterochromatin/genetics
- Humans
- In Situ Hybridization, Fluorescence
- Male
- Models, Genetic
- Multigene Family/genetics
- Organ Specificity
- Pseudogenes/genetics
- Sequence Homology, Nucleic Acid
- Sex Characteristics
- Sex Determination Processes
- Species Specificity
- Testis/metabolism
- Transcription, Genetic/genetics
- Transducin
Affiliation(s)
- Helen Skaletsky
- Howard Hughes Medical Institute, Whitehead Institute, and Department of Biology, Massachusetts Institute of Technology, 9 Cambridge Center, Cambridge, Massachusetts 02142, USA
17
Liu G, Zhao S, Bailey JA, Sahinalp SC, Alkan C, Tuzun E, Green ED, Eichler EE. Analysis of primate genomic variation reveals a repeat-driven expansion of the human genome. Genome Res 2003; 13:358-68. [PMID: 12618366 PMCID: PMC430288 DOI: 10.1101/gr.923303] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.0] [Received: 10/22/2002] [Accepted: 01/02/2003] [Indexed: 01/04/2023]
Abstract
We performed a detailed analysis of both single-nucleotide and large insertion/deletion events based on large-scale comparison of 10.6 Mb of genomic sequence from lemur, baboon, and chimpanzee to human. Using a human genomic reference, optimal global alignments were constructed from large (>50-kb) genomic sequence clones. These alignments were examined for the pattern, frequency, and nature of mutational events. Whereas rates of single-nucleotide substitution remain relatively constant (1-2 × 10⁻⁹ substitutions/site/year), rates of retrotransposition vary radically among different primate lineages. These differences have led to a 15%-20% expansion of human genome size over the last 50 million years of primate evolution, 90% of it due to new retroposon insertions. Orthologous comparisons with the chimpanzee suggest that the human genome continues to significantly expand due to shifts in retrotransposition activity. Assuming that the primate genome sequence we have sampled is representative, we estimate that human euchromatin has expanded 30 Mb and 550 Mb compared to the primate genomes of chimpanzee and lemur, respectively.
Affiliation(s)
- Ge Liu
- Department of Genetics, Case Western Reserve University School of Medicine and University Hospitals of Cleveland, Cleveland, Ohio 44106, USA
18
Poulsen TS, Silahtaroglu AN, Gisselø CG, Tommerup N, Johnsen HE. Detection of illegitimate rearrangements within the immunoglobulin light chain loci in B cell malignancies using end sequenced probes. Leukemia 2002; 16:2148-55. [PMID: 12357370 DOI: 10.1038/sj.leu.2402648] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Received: 03/11/2002] [Accepted: 05/17/2002] [Indexed: 11/09/2022]
Abstract
Translocations involving the immunoglobulin loci are recurring events of B cell oncogenesis. The majority of translocations involve the immunoglobulin heavy chain (IGH) locus, while a minor part involves the immunoglobulin light chain loci consisting of the kappa light chain (IGK) located at 2p11.2 and the lambda light chain (IGL) located at 22q11.2. We characterised BAC clones, spanning the IGK and IGL loci, for detection of illegitimate rearrangements by fluorescence in situ hybridisation (FISH). Within the IGL region we have identified six end sequenced probes (22M5, 1152K19, 2036J16, 3188M21, 3115E23 and 274M7) covering the variable (IGLV) cluster and two probes (165G5 and 31L9) covering the constant (IGLC) cluster. Within the IGK region four probes (969D7, 316G9, 122B6 and 2575M21) have been identified covering the variable (IGKV) cluster, and one probe (1021F11) covering the IGK constant (IGKC) cluster. A series of 24 cell lines of different origin have been analysed for the presence of translocations involving the immunoglobulin light chain loci by dual-colour FISH where the split of the variable cluster and the constant cluster indicated a translocation. Probes established in this study can be used for universal screening of illegitimate rearrangements within the immunoglobulin light chain loci in B cell malignancies.
Affiliation(s)
- T S Poulsen
- The Research Laboratory, Department of Haematology L, Herlev Hospital, University of Copenhagen, Denmark
19
Weber JL, David D, Heil J, Fan Y, Zhao C, Marth G. Human diallelic insertion/deletion polymorphisms. Am J Hum Genet 2002; 71:854-62. [PMID: 12205564 PMCID: PMC378541 DOI: 10.1086/342727] [Citation(s) in RCA: 245] [Impact Index Per Article: 11.1] [Received: 05/13/2002] [Accepted: 07/09/2002] [Indexed: 12/20/2022]
Abstract
We report the identification and characterization of 2,000 human diallelic insertion/deletion polymorphisms (indels) distributed throughout the human genome. Candidate indels were identified by comparison of overlapping genomic or cDNA sequences. The average confirmation rate for indels with a ≥2-nt allele-length difference was 58%, but the confirmation rate for indels with a 1-nt length difference was only 14%. The vast majority of the human diallelic indels were monomorphic in chimpanzees and gorillas. The ratio of deletion:insertion mutations was 4.1. Allele frequencies for the indels were measured in Europeans, Africans, Japanese, and Native Americans. New alleles were generally lower in frequency than old alleles. This tendency was most pronounced for the Africans, who are likely to be closest among the four groups to the original modern human population. Diallelic indels comprise approximately 8% of all human polymorphisms. Their abundance and ease of analysis make them useful for many applications.
Affiliation(s)
- James L Weber
- Center for Medical Genetics, Marshfield Medical Research Foundation, Marshfield, WI 54449, USA.
20
Christian SL, McDonough J, Liu Cy CY, Shaikh S, Vlamakis V, Badner JA, Chakravarti A, Gershon ES. An evaluation of the assembly of an approximately 15-Mb region on human chromosome 13q32-q33 linked to bipolar disorder and schizophrenia. Genomics 2002; 79:635-56. [PMID: 11991713 DOI: 10.1006/geno.2002.6765] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
The human 13q32-q33 region has been linked to both bipolar disorder and schizophrenia. Before completion of the draft sequences, we developed an approximately 15-Mb comprehensive map for the region extending from D13S1300 to ATA35H12. This map was assembled using publicly available mapping data and sequence-tagged site (STS)-based PCR confirmation. We then compared this map with the NCBI, Celera Genomics, and UCSC Golden Path data in February, June, and September 2001. All data sets showed gaps, misassignment of STSs, and errors in orientation and marker order. Surprisingly, the completed sequences of many bacterial artificial chromosomes (BACs) had been truncated. Of 21 gaps that were detected, 4 were present in both the NCBI and Celera databases. All gaps could be filled using 1-2 BAC clones. A total of 39 loci mapped to additional sites within the human genome, providing evidence of segmental duplications. Additionally, 61 unique cDNA clones were sequenced to increase available transcribed sequence, and 11,353 reference single-nucleotide polymorphisms (SNPs) with an average density of 1 SNP/3720 bases were identified. Overall, integration of the data from multiple sources is still needed for complete assembly of the 13q32-q33 region.
Affiliation(s)
- Susan L Christian
- Department of Psychiatry, The University of Chicago, Chicago, Illinois 60637, USA.
21
Poulsen TS, Silahtaroglu AN, Gisselø CG, Gaarsdal E, Rasmussen T, Tommerup N, Johnsen HE. Detection of illegitimate rearrangement within the immunoglobulin locus on 14q32.3 in B-cell malignancies using end-sequenced probes. Genes Chromosomes Cancer 2001; 32:265-74. [PMID: 11579466 DOI: 10.1002/gcc.1193] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
Translocation involving the immunoglobulin heavy chain (IGH) locus is a recurring event in B-cell oncogenesis. The aim of this study was to characterize clones from bacterial artificial chromosome (BAC) libraries and/or bacteriophage P1 artificial chromosome libraries spanning the IGH locus for detection of illegitimate rearrangement within the region by fluorescence in situ hybridization (FISH). In silico analysis of the IGH variable (IGHV) DNA sequence (NT_001716.v1) was performed to identify BAC probes located within the IGHV cluster. Clones of the constant (IGHC) cluster were found in the literature or at http://www.biologia.uniba.it/rmc/. Validation, orientation, and overlap of these probes were confirmed using interphase-, metaphase-, and fiber-FISH. We have identified seven BAC end-sequenced probes (3087C18, 47P23, 76N15, 12F16, 101G24, 112H5, and 151B17) covering 612 kb of the distal IGHV cluster, which, together with probes covering the IGHC cluster (11771 and 998D24), could be used in interphase nuclei and metaphase chromosome analysis. A visual split of the IGHV and IGHC clusters indicating a translocation was analyzed by dual-color FISH in a series of 21 cell lines of different origins. Translocations were found, as expected, in eight of eight myelomas, four of four lymphomas, none of five leukemias, and none of four Epstein-Barr virus-transformed B-lymphoblastoid cell lines. To summarize, we have established a set of IGHV and IGHC probes that can be used for universal screening of illegitimate rearrangement within the IGH locus in B-cell malignancies. These probes allow for routine FISH analysis to detect this early central oncogenic event.
MESH Headings
- Chromosome Banding
- Chromosomes, Artificial, Bacterial/genetics
- Chromosomes, Human, Pair 15/genetics
- Chromosomes, Human, Pair 16/genetics
- DNA Probes/genetics
- Gene Rearrangement, B-Lymphocyte, Heavy Chain
- Genetic Markers/genetics
- Humans
- Immunoglobulin Heavy Chains/genetics
- In Situ Hybridization, Fluorescence/methods
- Lymphoma, B-Cell/genetics
- Molecular Sequence Data
- Nucleic Acid Hybridization/methods
- Translocation, Genetic/genetics
- Tumor Cells, Cultured
Affiliation(s)
- T S Poulsen
- Research Laboratory, Department of Haematology L, Herlev Hospital, University of Copenhagen, Denmark.
22
Zhao S, Shatsman S, Ayodeji B, Geer K, Tsegaye G, Krol M, Gebregeorgis E, Shvartsbeyn A, Russell D, Overton L, Jiang L, Dimitrov G, Tran K, Shetty J, Malek JA, Feldblyum T, Nierman WC, Fraser CM. Mouse BAC ends quality assessment and sequence analyses. Genome Res 2001; 11:1736-45. [PMID: 11591651 PMCID: PMC311142 DOI: 10.1101/gr.179201] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.9] [Indexed: 12/28/2022]
Abstract
A large-scale BAC end-sequencing project at The Institute for Genomic Research (TIGR) has generated one of the most extensive sets of sequence markers for the mouse genome to date. Using ABI3700 capillary sequencers, with a sequencing success rate of >80% and an average read length of 485 bp, we have generated 449,234 nonredundant mouse BAC end sequences (mBESs) totaling 218 Mb from 257,318 clones from libraries RPCI-23 and RPCI-24, representing 15x clone coverage, 7% sequence coverage, and a marker every 7 kb across the genome. A total of 191,916 BACs have sequences from both ends, providing 12x genome coverage. The average Q20 length is 406 bp, and 84% of the bases have phred quality scores ≥ 20. RPCI-24 mBESs have more Q20 bases and longer reads on average than RPCI-23 sequences. ABI3700 sequencers and the sample tracking system ensure that > 95% of mBESs are associated with the right clone identifiers. We have found that a significant fraction of mBESs contain L1 repeats and that approximately 48% of the clones have both ends with ≥ 100 bp of contiguous unique Q20 bases. About 3% of mBESs match ESTs, and > 70% of the matches were conserved between the mouse and the human or the rat. Approximately 0.1% of mBESs contain STSs. About 0.2% of mBESs match human finished sequences, and > 70% of these sequences have EST hits. The analyses indicate that our high-quality mouse BAC end sequences will be a valuable resource to the community.
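The headline figures in this abstract are related by simple division. A minimal consistency check, assuming a hypothetical mouse genome size of 3.0 Gb (the abstract does not state the value it used):

```python
# Hypothetical mouse genome size; the abstract does not state the value it used.
GENOME_BP = 3.0e9

mbes_count = 449_234          # nonredundant mouse BAC end sequences
mbes_total_bp = 218e6         # total end sequence (218 Mb)
paired_clones = 191_916       # clones with sequence from both ends
paired_clone_coverage = 12    # stated genome coverage from end-paired BACs

sequence_coverage = mbes_total_bp / GENOME_BP     # fraction of genome present in end reads
marker_spacing_bp = GENOME_BP / mbes_count        # mean distance between end-sequence markers
mean_end_length_bp = mbes_total_bp / mbes_count   # mean length of one end sequence
# Mean insert size that would make 191,916 paired clones span the genome 12 times.
implied_insert_bp = paired_clone_coverage * GENOME_BP / paired_clones

print(f"sequence coverage ~ {sequence_coverage:.1%}")
print(f"one marker every ~ {marker_spacing_bp / 1000:.1f} kb")
print(f"mean end length ~ {mean_end_length_bp:.0f} bp")
print(f"implied mean insert ~ {implied_insert_bp / 1000:.0f} kb")
```

The mean end length (about 485 bp) matches the quoted average read length independently of the genome-size assumption; the coverage, marker spacing, and implied insert size all scale with the assumed `GENOME_BP`.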
Affiliation(s)
- S Zhao
- The Institute for Genomic Research, Rockville, Maryland 20850, USA.
23
Abstract
The data for the public working draft of the human genome contains roughly 400,000 initial sequence contigs in approximately 30,000 large insert clones. Many of these initial sequence contigs overlap. A program, GigAssembler, was built to merge them and to order and orient the resulting larger sequence contigs based on mRNA, paired plasmid ends, EST, BAC end pairs, and other information. This program produced the first publicly available assembly of the human genome, a working draft containing roughly 2.7 billion base pairs and covering an estimated 88% of the genome that has been used for several recent studies of the genome. Here we describe the algorithm used by GigAssembler.
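The abstract says GigAssembler orders and orients contigs using paired plasmid ends, BAC end pairs, and other evidence, but does not give the algorithm. A generic first step in this kind of scaffolding is to count how many independent pairs support each contig-to-contig relation and discard weakly supported ones. The sketch below illustrates only that idea; the `(contig_a, contig_b, same_strand)` link format and the `min_support` threshold are hypothetical, not GigAssembler's:

```python
from collections import Counter

def bundle_links(links, min_support=2):
    """Group pair links by (contig pair, relative orientation); keep well-supported bundles.

    Each link is (contig_a, contig_b, same_strand), where same_strand says whether
    the two contigs appear in the same relative orientation. Relative orientation is
    symmetric, so a link from c2 to c1 counts toward the same bundle as c1 to c2.
    """
    counts = Counter()
    for a, b, same_strand in links:
        if a == b:
            continue  # both ends fall in one contig: no ordering information
        key = (min(a, b), max(a, b), same_strand)
        counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_support}

links = [("c1", "c2", True), ("c2", "c1", True), ("c1", "c3", False), ("c2", "c2", True)]
print(bundle_links(links))  # {('c1', 'c2', True): 2}
```

Requiring at least two independent pairs is a common guard against chimeric clones and mis-mapped ends; real assemblers also check that the implied distance fits the clone's insert size.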
Affiliation(s)
- W J Kent
- Department of Biology, University of California at Santa Cruz, Santa Cruz, California 95064, USA.
24
Osoegawa K, Mammoser AG, Wu C, Frengen E, Zeng C, Catanese JJ, de Jong PJ. A bacterial artificial chromosome library for sequencing the complete human genome. Genome Res 2001; 11:483-96. [PMID: 11230172 PMCID: PMC311044 DOI: 10.1101/gr.169601] [Citation(s) in RCA: 196] [Impact Index Per Article: 8.5] [Received: 10/31/2000] [Accepted: 01/09/2001] [Indexed: 01/20/2023]
Abstract
A 30-fold redundant human bacterial artificial chromosome (BAC) library with a large average insert size (178 kb) has been constructed to provide the intermediate substrate for the international genome sequencing effort. The DNA was obtained from a single anonymous volunteer, whose identity was protected through a double-blind donor selection protocol. DNA fragments were generated by partial digestion with EcoRI (library segments 1-4: 24-fold) and MboI (segment 5: sixfold) and cloned into the pBACe3.6 and pTARBAC1 vectors, respectively. The quality of the library was assessed by extensive analysis of 169 clones for rearrangements and artifacts. Eighteen BACs (11%) revealed minor insert rearrangements, and none was chimeric. This BAC library, designated as "RPCI-11," has been used widely as the central resource for insert-end sequencing, clone fingerprinting, high-throughput sequence analysis and as a source of mapped clones for diagnostic and functional studies.
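Fold redundancy, insert size, and clone count are tied together by clones ≈ fold × genome size / mean insert size. A minimal sketch of that arithmetic for the two library segments, assuming a hypothetical haploid genome size of 3.0 Gb (the abstract does not give one), plus the rearrangement rate from the 169-clone quality check:

```python
GENOME_BP = 3.0e9         # hypothetical haploid genome size
MEAN_INSERT_BP = 178_000  # average insert size reported for RPCI-11

segment_fold = {"EcoRI segments 1-4": 24, "MboI segment 5": 6}
total_fold = sum(segment_fold.values())

for name, fold in segment_fold.items():
    # clones needed = fold coverage x genome size / mean insert size
    clones = fold * GENOME_BP / MEAN_INSERT_BP
    print(f"{name}: {fold}x ~ {clones:,.0f} clones")

total_clones = total_fold * GENOME_BP / MEAN_INSERT_BP
print(f"total: {total_fold}x ~ {total_clones:,.0f} clones")

# Quality check quoted in the abstract: 18 of 169 clones showed minor rearrangements.
rearranged_fraction = 18 / 169
print(f"rearranged: {rearranged_fraction:.0%}")
```

These clone counts are estimates under the assumed genome size, not the library's actual plate count.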
Affiliation(s)
- K Osoegawa
- Department of Cancer Genetics, Roswell Park Cancer Institute, Buffalo, New York 14263, USA
25
Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigó R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, 
Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Deslattes Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A, Zhu X. The sequence of the human genome. Science 2001; 291:1304-51. [PMID: 11181995 DOI: 10.1126/science.1058040] [Citation(s) in RCA: 7685] [Impact Index Per Article: 334.1] [Indexed: 12/13/2022]
Abstract
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
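The sequencing totals in this abstract can be cross-checked against each other. A minimal sketch using only the quoted numbers; treating the 5.11x Celera coverage and the 2.9x shredded public coverage as simply additive is an approximation, since the public data covered only the regions already sequenced:

```python
total_bp = 14.8e9             # sequence generated
reads = 27_271_853            # high-quality reads
celera_fold = 5.11            # Celera read coverage of the genome
shredded_public_fold = 2.9    # coverage contributed by shredded public data

mean_read_length_bp = total_bp / reads               # average bases per read
implied_genome_bp = total_bp / celera_fold           # genome size implied by 5.11x
effective_fold = celera_fold + shredded_public_fold  # approximate combined coverage

print(f"mean read length ~ {mean_read_length_bp:.0f} bp")
print(f"implied genome size ~ {implied_genome_bp / 1e9:.2f} Gb")
print(f"effective coverage ~ {effective_fold:.1f}x")
```

The implied genome size of about 2.90 Gb is close to the 2.91-billion bp consensus, and the combined coverage rounds to the eightfold effective coverage stated in the abstract.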
Affiliation(s)
- J C Venter
- Celera Genomics, 45 West Gude Drive, Rockville, MD 20850, USA.
26
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann Y, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, 
Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ, Szustakowki J. Initial sequencing and analysis of the human genome. Nature 2001; 409:860-921. [PMID: 11237011 DOI: 10.1038/35057062] [Citation(s) in RCA: 14536] [Impact Index Per Article: 632.0] [Indexed: 12/11/2022]
Abstract
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
Affiliation(s)
- E S Lander
- Whitehead Institute for Biomedical Research, Center for Genome Research, Cambridge, MA 02142, USA.
27
McPherson JD, Marra M, Hillier L, Waterston RH, Chinwalla A, Wallis J, Sekhon M, Wylie K, Mardis ER, Wilson RK, Fulton R, Kucaba TA, Wagner-McPherson C, Barbazuk WB, Gregory SG, Humphray SJ, French L, Evans RS, Bethel G, Whittaker A, Holden JL, McCann OT, Dunham A, Soderlund C, Scott CE, Bentley DR, Schuler G, Chen HC, Jang W, Green ED, Idol JR, Maduro VV, Montgomery KT, Lee E, Miller A, Emerling S, Gibbs R, Scherer S, Gorrell JH, Sodergren E, Clerc-Blankenburg K, Tabor P, Naylor S, Garcia D, de Jong PJ, Catanese JJ, Nowak N, Osoegawa K, Qin S, Rowen L, Madan A, Dors M, Hood L, Trask B, Friedman C, Massa H, Cheung VG, Kirsch IR, Reid T, Yonescu R, Weissenbach J, Bruls T, Heilig R, Branscomb E, Olsen A, Doggett N, Cheng JF, Hawkins T, Myers RM, Shang J, Ramirez L, Schmutz J, Velasquez O, Dixon K, Stone NE, Cox DR, Haussler D, Kent WJ, Furey T, Rogic S, Kennedy S, Jones S, Rosenthal A, Wen G, Schilhabel M, Gloeckner G, Nyakatura G, Siebert R, Schlegelberger B, Korenberg J, Chen XN, Fujiyama A, Hattori M, Toyoda A, Yada T, Park HS, Sakaki Y, Shimizu N, Asakawa S, Kawasaki K, Sasaki T, Shintani A, Shimizu A, Shibuya K, Kudoh J, Minoshima S, Ramser J, Seranski P, Hoff C, Poustka A, Reinhardt R, Lehrach H. A physical map of the human genome. Nature 2001; 409:934-41. [PMID: 11237014 DOI: 10.1038/35057157] [Citation(s) in RCA: 549] [Impact Index Per Article: 23.9] [Indexed: 12/13/2022]
Abstract
The human genome is by far the largest genome to be sequenced, and its size and complexity present many challenges for sequence assembly. The International Human Genome Sequencing Consortium constructed a map of the whole genome to enable the selection of clones for sequencing and for the accurate assembly of the genome sequence. Here we report the construction of the whole-genome bacterial artificial chromosome (BAC) map and its integration with previous landmark maps and information from mapping efforts focused on specific chromosomal regions. We also describe the integration of sequence data with the map.
Affiliation(s)
- J D McPherson
- Washington University School of Medicine, Genome Sequencing Center, Department of Genetics, St. Louis, Missouri 63108, USA.
28
Abstract
The Human Genome Project has generated extensive map and sequence data for a large number of Bacterial Artificial Chromosome (BAC) clones. To maximize efficient use of these data and to minimize redundant work for the research community, The Institute for Genomic Research (TIGR) comprehensive BAC resource (cBACr) (http://www.tigr.org/tdb/BacResource/BAC_resource_intro.html) was built as an expansion of the TIGR human BAC ends database. This resource collects, integrates and reports information on the library, maps, sequence, annotation and function of each human and mouse BAC. The current database contains 635 016 human BACs and 265 617 mouse BACs characterized by various approaches; of these, 22 705 human clones and 1000 mouse clones have sequence and annotation data.
Collapse
Affiliation(s)
- S Zhao
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA.
29
Semple C. Bases and spaces: resources on the web for accessing the draft human genome. Genome Biol 2000; 1:REVIEWS2001. [PMID: 11178254 PMCID: PMC138875 DOI: 10.1186/gb-2000-1-4-reviews2001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Much is expected of the draft human genome sequence, yet there is no central resource to host the plethora of sequence and mapping information available. Consequently, finding the most useful and reliable human genome data and resources currently available on the web can be challenging, but it is not impossible.
Collapse
Affiliation(s)
- C Semple
- Medical Genetics Section, Department of Medical Sciences, The University of Edinburgh, Molecular Medicine Centre, Western General Hospital, Edinburgh, EH4 2XU, UK.