1
Zhang MQ. A personal journey on cracking the genomic codes. Quantitative Biology 2021. [DOI: 10.15302/j-qb-021-0245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
2
Xu Z, Lance B, Vargas C, Arpinar B, Bhandarkar S, Kraemer E, Kochut KJ, Miller JA, Wagner JR, Weise MJ, Wunderlich JK, Stringer J, Smulian G, Cushion MT, Arnold J. Mapping by sequencing the Pneumocystis genome using the ordering DNA sequences V3 tool. Genetics 2003; 163:1299-313. [PMID: 12702676 PMCID: PMC1462508 DOI: 10.1093/genetics/163.4.1299] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
A bioinformatics tool called ODS3 has been created for mapping by sequencing. The tool allows the creation of integrated genomic maps from genetic, physical mapping, and sequencing data and permits an integrated genome map to be stored, retrieved, viewed, and queried in a stand-alone capacity, in a client/server relationship with the Fungal Genome Database (FGDB), and as a web-browsing tool for the FGDB. In that ODS3 is programmed in Java, the tool promotes platform independence and supports export of integrated genome-mapping data in the extensible markup language (XML) for data interchange with other genome information systems. The tool ODS3 is used to create an initial integrated genome map of the AIDS-related fungal pathogen, Pneumocystis carinii. Contig dynamics would indicate that this physical map is approximately 50% complete with approximately 200 contigs. A total of 10 putative multigene families were found. Two of these putative families were previously characterized in P. carinii, namely the major surface glycoproteins (MSGs) and HSP70 proteins; three of these putative families (not previously characterized in P. carinii) were found to be similar to families encoding the HSP60 in Schizosaccharomyces pombe, the heat-shock psi protein in S. pombe, and the RNA synthetase family (i.e., MES1) in Saccharomyces cerevisiae. Physical mapping data are consistent with the 16S, 5.8S, and 26S rDNA genes being single copy in P. carinii. No other fungus outside this genus is known to have the rDNA genes in single copy.
Affiliation(s)
- Zheng Xu
- Department of Genetics, University of Georgia, Athens, Georgia 30602, USA
3
Bhandarkar SM, Machaka SA, Shete SS, Kota RN. Parallel computation of a maximum-likelihood estimator of a physical map. Genetics 2001; 157:1021-43. [PMID: 11238392 PMCID: PMC1461556 DOI: 10.1093/genetics/157.3.1021] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
Reconstructing a physical map of a chromosome from a genomic library presents a central computational problem in genetics. Physical map reconstruction in the presence of errors is a problem of high computational complexity that provides the motivation for parallel computing. Parallelization strategies for a maximum-likelihood estimation-based approach to physical map reconstruction are presented. The estimation procedure entails a gradient descent search for determining the optimal spacings between probes for a given probe ordering. The optimal probe ordering is determined using a stochastic optimization algorithm such as simulated annealing or microcanonical annealing. A two-level parallelization strategy is proposed wherein the gradient descent search is parallelized at the lower level and the stochastic optimization algorithm is simultaneously parallelized at the higher level. Implementation and experimental results on a distributed-memory multiprocessor cluster running the parallel virtual machine (PVM) environment are presented using simulated and real hybridization data.
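As a rough illustration of the higher-level search described in this abstract, the sketch below runs simulated annealing over probe orderings. It is not the authors' maximum-likelihood estimator: the cost function (the number of broken runs of hybridizing probes per clone), the toy hybridization matrix, and the names `ordering_cost`, `anneal_order`, and `make_toy_matrix` are all assumptions made for illustration. The gradient-descent step for probe spacings and the two-level parallelization are omitted.

```python
import math
import random


def ordering_cost(matrix, order):
    """Sum, over clones, of the number of extra runs of 1s under `order`.

    A clone whose positive probes are contiguous in `order` contributes 0.
    This is an assumed stand-in for a likelihood-based objective.
    """
    cost = 0
    for row in matrix:
        runs = 0
        prev = 0
        for probe in order:
            if row[probe] == 1 and prev == 0:
                runs += 1
            prev = row[probe]
        cost += max(runs - 1, 0)
    return cost


def anneal_order(matrix, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over probe permutations using segment reversals."""
    rng = random.Random(seed)
    n = len(matrix[0])
    order = list(range(n))
    rng.shuffle(order)
    cost = ordering_cost(matrix, order)
    best_order, best_cost = order[:], cost
    for step in range(steps):
        # Geometric cooling schedule from t_start toward t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = sorted(rng.sample(range(n), 2))
        candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        cand_cost = ordering_cost(matrix, candidate)
        delta = cand_cost - cost
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            order, cost = candidate, cand_cost
            if cost < best_cost:
                best_order, best_cost = order[:], cost
    return best_order, best_cost


def make_toy_matrix(n_probes=8, span=3):
    """Hypothetical error-free data: clone k hits probes k .. k+span-1."""
    return [[1 if k <= p < k + span else 0 for p in range(n_probes)]
            for k in range(n_probes - span + 1)]
```

Calling `anneal_order(make_toy_matrix())` returns the best ordering found and its cost. On real data with false positives and negatives, a cost of zero is generally not attainable, which is one reason a likelihood-based objective is preferred over a simple run count.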
Affiliation(s)
- S M Bhandarkar
- Department of Computer Science, The University of Georgia, Athens, Georgia 30602-7404, USA.
4
Wendl MC, Marra MA, Hillier LW, Chinwalla AT, Wilson RK, Waterston RH. Theories and Applications for Sequencing Randomly Selected Clones. Genome Res 2001. [DOI: 10.1101/gr.133901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Theory is developed for the process of sequencing randomly selected large-insert clones. Genome size, library depth, clone size, and clone distribution are considered relevant properties and perfect overlap detection for contig assembly is assumed. Genome-specific and nonrandom effects are neglected. Order of magnitude analysis indicates library depth is of secondary importance compared to the other variables, especially as clone size diminishes. In such cases, the well-known Poisson coverage law is a good approximation. Parameters derived from these models are used to examine performance for the specific case of sequencing random human BAC clones. We compare coverage and redundancy rates for libraries possessing uniform and nonuniform clone distributions. Results are measured against data from map-based human-chromosome-2 sequencing. We conclude that the map-based approach outperforms random clone sequencing, except early in a project. However, simultaneous use of both strategies can be beneficial if a performance-based estimate for halting random clone sequencing is made. Results further show that the random approach yields maximum effectiveness using nonbiased rather than biased libraries.
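The Poisson coverage law mentioned in this abstract can be written in its standard form as: expected fraction covered = 1 - exp(-c), where c = (number of clones × clone length) / genome length. The sketch below evaluates that approximation only; it does not reproduce the paper's finite-library theory. The function names and the example figures in the usage note (a 3 Gb genome, 150 kb clones) are assumptions chosen for illustration.

```python
import math


def poisson_coverage(n_clones, clone_len, genome_len):
    """Expected fraction of the genome covered under the Poisson approximation."""
    depth = n_clones * clone_len / genome_len  # average redundancy c
    return 1.0 - math.exp(-depth)


def clones_for_coverage(target, clone_len, genome_len):
    """Smallest number of random clones whose expected coverage reaches `target`."""
    depth = -math.log(1.0 - target)  # invert 1 - exp(-c) = target
    return math.ceil(depth * genome_len / clone_len)
```

For example, `poisson_coverage(20000, 150_000, 3e9)` corresponds to a redundancy of c = 1, and `clones_for_coverage(0.95, 150_000, 3e9)` gives the clone count at which the approximation predicts 95% expected coverage. The diminishing return per additional clone at high c is consistent with the abstract's conclusion that random selection is most useful early in a project.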
5
Wendl MC, Marra MA, Hillier LW, Chinwalla AT, Wilson RK, Waterston RH. Theories and applications for sequencing randomly selected clones. Genome Res 2001; 11:274-80. [PMID: 11157790 PMCID: PMC311021 DOI: 10.1101/gr.gr-1339r] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022]
Affiliation(s)
- M C Wendl
- Genome Sequencing Center, Washington University, St. Louis, Missouri 63108, USA.
6
Enkerli J, Reed H, Briley A, Bhatt G, Covert SF. Physical map of a conditionally dispensable chromosome in Nectria haematococca mating population VI and location of chromosome breakpoints. Genetics 2000; 155:1083-94. [PMID: 10880471 PMCID: PMC1461165 DOI: 10.1093/genetics/155.3.1083] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Certain isolates of the plant pathogenic fungus Nectria haematococca mating population (MP) VI contain a 1.6-Mb conditionally dispensable (CD) chromosome carrying the phytoalexin detoxification genes MAK1 and PDA6-1. This chromosome is structurally unstable during sexual reproduction. As a first step in our analysis of the mechanisms underlying this chromosomal instability, hybridization between overlapping cosmid clones was used to construct a map of the MAK1 PDA6-1 chromosome. The map consists of 33 probes that are linked by 199 cosmid clones. The polymerase chain reaction and Southern analysis of N. haematococca MP VI DNA digested with infrequently cutting restriction enzymes were used to close gaps and order the hybridization-derived contigs. Hybridization to a probe extended from telomeric repeats was used to anchor the ends of the map to the actual chromosome ends. The resulting map is estimated to cover 95% of the MAK1 PDA6-1 chromosome and is composed of two ordered contigs. Thirty-eight percent of the clones in the minimal map are known to contain repeated DNA sequences. Three dispersed repeats were cloned during map construction; each is present in five to seven copies on the chromosome. The cosmid clones representing the map were probed with deleted forms of the CD chromosome and the results were integrated into the map. This allowed the identification of chromosome breakpoints and deletions.
Affiliation(s)
- J Enkerli
- Department of Botany, University of Georgia, Athens, Georgia 30602, USA
7
Abstract
The parking strategy is an iterative approach to DNA sequencing. Each iteration consists of sequencing a novel portion of target DNA that does not overlap any previously sequenced region. Subject to the constraint of no overlap, each new region is chosen randomly. A parking strategy is often ideal in the early stages of a project for rapidly generating unique data. As a project progresses, parking becomes progressively more expensive and eventually prohibitive. We present a mathematical model with a generalization to allow for overlaps. This model predicts multiple parameters, including progress, costs, and the distribution of gap sizes left by a parking strategy. The highly fragmented nature of the gaps left after an initial parking strategy may make it difficult to finish a project efficiently. Therefore, in addition to our parking model, we model gap closing by walking. Our gap-closing model is generalizable to many other strategies. Our discussion includes modified parking strategies and hybrids with other strategies. A hybrid parking strategy has been employed for portions of the Human Genome Project.
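The basic no-overlap case described in this abstract can be simulated as a one-dimensional random parking process. The sketch below is a minimal illustration under simplifying assumptions (fixed read length, a linear target, no overlaps allowed, no costs); it is not the authors' model, and the function names are hypothetical. As a shortcut, each gap is filled independently rather than choosing reads one at a time across the whole target.

```python
import random


def park_until_jammed(target_len, read_len=1.0, seed=0):
    """Place non-overlapping reads at random until no gap can hold another."""
    rng = random.Random(seed)
    starts = []
    gaps = [(0.0, target_len)]  # open intervals still to be examined
    while gaps:
        a, b = gaps.pop()
        if b - a < read_len:
            continue  # too small to hold a read: stays as a final gap
        x = rng.uniform(a, b - read_len)  # random start within this gap
        starts.append(x)
        gaps.append((a, x))
        gaps.append((x + read_len, b))
    return sorted(starts)


def gap_sizes(starts, target_len, read_len=1.0):
    """Lengths of the unsequenced gaps left between parked reads."""
    edges = [0.0]
    for s in starts:
        edges.extend((s, s + read_len))
    edges.append(target_len)
    return [edges[i + 1] - edges[i] for i in range(0, len(edges), 2)]
```

For a long target, pure parking of this kind stalls at roughly three quarters of the target covered (Rényi's parking constant, about 0.7476), leaving many gaps shorter than one read. That fragmentation is the situation the abstract's gap-closing-by-walking model is meant to address.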
Affiliation(s)
- J C Roach
- The Institute for Systems Biology, Seattle, Washington 98105 USA.
8
Prade RA, Griffith J, Kochut K, Arnold J, Timberlake WE. In vitro reconstruction of the Aspergillus (= Emericella) nidulans genome. Proc Natl Acad Sci U S A 1997; 94:14564-9. [PMID: 9405653 PMCID: PMC25056 DOI: 10.1073/pnas.94.26.14564] [Citation(s) in RCA: 49] [Impact Index Per Article: 1.8] [Received: 05/15/1997] [Accepted: 10/29/1997] [Indexed: 02/05/2023]
Abstract
A physical map of the 31-megabase Aspergillus nidulans genome is reported, in which 94% of 5,134 cosmids are assigned to 49 contiguous segments. The physical map is the result of a two-way ordering process, in which clones and probes were ordered simultaneously on a binary DNA/DNA hybridization matrix. Compression by elimination of redundant clones resulted in a minimal map, which is a chromosome walk. Repetitive DNA is nonrandomly dispersed in the A. nidulans genome, reminiscent of heterochromatic banding patterns of higher eukaryotes. We hypothesize gene clusters may arise by horizontal transfer and spread by transposition to explain the nonrandom pattern of repeats along chromosomes.
Affiliation(s)
- R A Prade
- Department of Microbiology and Molecular Genetics, Oklahoma State University, Stillwater, OK 74078-0289, USA
9
Abstract
The aim of this paper is to provide general results for predicting progress in a physical mapping project by anchoring random clones, when clones and anchors are not homogeneously distributed along the genome. A complete physical map of the DNA of an organism consists of overlapping clones spanning the entire genome. Several schemes can be used to construct such a map, depending on the way that clones overlap. We focus here on the approach consisting of assembling clones sharing a common random short sequence called an anchor. Some mathematical analyses providing statistical properties of anchored clones have been developed in the stationary case. Modeling the clone and anchor processes as nonhomogeneous Poisson processes provides such an analysis in a general nonstationary framework. We apply our results to two natural nonhomogeneous models to illustrate the effect of inhomogeneity. This study reveals that using homogeneous processes for clones and anchors provides an overly optimistic assessment of the progress of the mapping project.
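The direction of the effect reported in this abstract can be seen in a much simpler setting than the paper's anchoring model. The sketch below considers clone coverage only (no anchors): clone left ends follow a Poisson process with a position-dependent density, and a position is covered if at least one clone starts within one clone length to its left. The circular genome, the bin discretization, the specific densities, and the function name are all assumptions made for illustration.

```python
import math


def mean_coverage(density, clone_len_bins):
    """Average probability that a bin is covered by at least one clone.

    density[i] is the expected number of clone left ends in bin i.
    The genome is treated as circular to avoid edge effects.
    """
    n = len(density)
    total = 0.0
    for t in range(n):
        # Expected number of clones whose left end lies close enough to cover t.
        expected_hits = sum(density[(t - k) % n] for k in range(clone_len_bins))
        total += 1.0 - math.exp(-expected_hits)
    return total / n


n_bins, clone_len_bins, n_clones = 1000, 10, 300
avg = n_clones / n_bins

# Homogeneous process: the same density everywhere.
uniform = [avg] * n_bins
# Hypothetical nonhomogeneous process with the same total number of clones:
# one half of the genome is sampled at 1.8x the average rate, the other at 0.2x.
skewed = [1.8 * avg if i < n_bins // 2 else 0.2 * avg for i in range(n_bins)]

uniform_cov = mean_coverage(uniform, clone_len_bins)
skewed_cov = mean_coverage(skewed, clone_len_bins)
```

Because 1 - exp(-c) is concave in the local depth c, the nonhomogeneous library covers less of the genome on average than a homogeneous one with the same number of clones, so `skewed_cov` is smaller than `uniform_cov`. This matches the abstract's point that homogeneous assumptions give an overly optimistic picture of progress.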
Affiliation(s)
- S Schbath
- I.N.R.A., Unité de Biométrie, Jouy-en-Josas, France.
10
López-Nieto CE, Nigam SK. Selective amplification of protein-coding regions of large sets of genes using statistically designed primer sets. Nat Biotechnol 1996; 14:857-61. [PMID: 9631010 DOI: 10.1038/nbt0796-857] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.1] [Indexed: 02/07/2023]
Abstract
We describe a novel approach to design a set of primers selective for large groups of genes. This method is based on the distribution frequency of all nucleotide combinations (octa- to decanucleotides), and the combined ability of primer pairs, based on these oligonucleotides, to detect genes. By analyzing 1000 human mRNAs, we found that a surprisingly small subset of octanucleotides is shared by a high proportion of human protein-coding region sense strands. By computer simulation of polymerase chain reactions, a set based on only 30 primers was able to detect approximately 75% of known (and presumably unknown) human protein-coding regions. To validate the method and provide experimental support for the feasibility of the more ambitious goal of targeting human protein-coding regions, we sought to apply the technique to a large protein family: G-protein coupled receptors (GPCRs). Our results indicate that there is sufficient low level homology among human coding regions to allow design of a limited set of primer pairs that can selectively target coding regions in general, as well as genomic subsets (e.g., GPCRs). The approach should be generally applicable to human coding regions, and thus provide an efficient method for analyzing much of the transcriptionally active human genome.
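The first step this abstract describes, finding oligonucleotides shared by a high proportion of coding sequences, can be sketched as a k-mer sharing count. This is a minimal illustration, not the authors' procedure: it covers only the frequency tabulation (not primer pairing or PCR simulation), and the function names are hypothetical.

```python
from collections import Counter


def kmer_sharing(sequences, k=8):
    """Map each k-mer to the fraction of sequences that contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Use a set so a k-mer repeated within one sequence is counted once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    n = len(sequences)
    return {kmer: c / n for kmer, c in counts.items()}


def top_shared(sequences, k=8, top=5):
    """The `top` most widely shared k-mers, ties broken alphabetically."""
    shares = kmer_sharing(sequences, k)
    return sorted(shares.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

Given a list of sense-strand coding sequences as plain strings, `top_shared(seqs, k=8)` lists the octanucleotides present in the largest fraction of them, which would be the candidates from which a small primer set is then chosen.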
Affiliation(s)
- C E López-Nieto
- Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA
11
Xiong M, Chen HJ, Prade RA, Wang Y, Griffith J, Timberlake WE, Arnold J. On the consistency of a physical mapping method to reconstruct a chromosome in vitro. Genetics 1996; 142:267-84. [PMID: 8770604 PMCID: PMC1206956 DOI: 10.1093/genetics/142.1.267] [Citation(s) in RCA: 22] [Impact Index Per Article: 0.8] [Indexed: 02/02/2023]
Abstract
During recent years considerable effort has been invested in creating physical maps for a variety of organisms as part of the Human Genome Project and in creating various methods for physical mapping. The statistical consistency of a physical mapping method to reconstruct a chromosome, however, has not been investigated. In this paper, we first establish that a model of physical mapping by binary fingerprinting of DNA fragments is identifiable using the key assumption that, for a large randomly generated recombinant DNA library, there exists a staircase of DNA fragments across the chromosomal region of interest. Then we briefly introduce epi-convergence theory of variational analysis and transform the physical mapping problem into a constrained stochastic optimization problem. By doing so, we prove epi-convergence of the physical mapping model and epi-convergence of the physical mapping method. Combining the identifiability of our physical mapping model and the epi-convergence of a physical mapping method, finally we establish strong consistency of a physical mapping method.
Affiliation(s)
- M Xiong
- Department of Mathematics and Molecular Biology, University of Southern California, Los Angeles 90089, USA
12
Balding DJ. Design and analysis of chromosome physical mapping experiments. Philos Trans R Soc Lond B Biol Sci 1994; 344:329-35. [PMID: 7800702 DOI: 10.1098/rstb.1994.0071] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/27/2023]
Abstract
Mathematical and statistical aspects of constructing ordered-clone physical maps of chromosomes are reviewed. Three broad problems are addressed: analysis of fingerprint data to identify configurations of overlapping clones, prediction of the rate of progress of a mapping strategy and optimal design of pooling schemes for screening large clone libraries.
Affiliation(s)
- D J Balding
- School of Mathematical Sciences, Queen Mary & Westfield College, University of London, U.K.