1
Asymptotics of pooling design performance. J Appl Probab 2016. [DOI: 10.1017/s0021900200017770]
Abstract
We analyse the expected performance of various group testing, or pooling, designs. The context is that of identifying characterized clones in a large collection of clones. Here we choose as performance criterion the expected number of unresolved ‘negative’ clones, and we aim to minimize this quantity. Technically, long inclusion–exclusion summations are encountered which, aside from being computationally demanding, give little inkling of the qualitative effect of parametric control on the pooling strategy. We show that readily interpreted re-summation can be performed, leading to asymptotic forms and systematic corrections. We apply our results to randomized designs, illustrating how they might be implemented for approximating combinatorial formulae.
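The abstract gives no formulas, so the following is only a sketch under an assumed design: a Bernoulli random pooling design in which each of t pools contains each of n clones independently with probability p, d clones are positive, and tests are error-free. For that design the expected number of unresolved negatives has the closed form (n - d)(1 - p(1 - p)^d)^t, which the code checks against a simulation. The function names are hypothetical.

```python
import random


def expected_unresolved_negatives(n, d, t, p):
    """Closed form for an assumed Bernoulli random pooling design:
    each of t pools contains each of n clones independently with
    probability p, and d of the clones are positive. A negative clone is
    resolved only if it lies in at least one pool that tests negative."""
    # Probability that one given pool contains the clone and no positive.
    q = p * (1.0 - p) ** d
    # Pools are independent, so the clone stays unresolved with prob (1 - q)^t.
    return (n - d) * (1.0 - q) ** t


def simulate_unresolved_negatives(n, d, t, p, trials=500, seed=0):
    """Monte Carlo estimate of the same quantity, assuming error-free tests."""
    rng = random.Random(seed)
    positives = set(range(d))  # by symmetry, clones 0..d-1 are the positives
    total = 0
    for _ in range(trials):
        resolved = set()
        for _ in range(t):
            pool = [c for c in range(n) if rng.random() < p]
            if positives.isdisjoint(pool):
                resolved.update(pool)  # a negative pool clears every member
        total += (n - d) - len(resolved)
    return total / trials


if __name__ == "__main__":
    n, d, t = 100, 2, 20
    p_opt = 1.0 / (d + 1)  # maximizes p * (1 - p)^d
    print(expected_unresolved_negatives(n, d, t, p_opt))
    print(simulate_unresolved_negatives(n, d, t, p_opt))
```

Maximizing p(1 - p)^d gives p = 1/(d + 1), a simple example of how a design parameter controls the expected number of unresolved negatives in this assumed model; the designs treated in the paper may differ.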
3
Wendl MC, Waterston RH. Generalized gap model for bacterial artificial chromosome clone fingerprint mapping and shotgun sequencing. Genome Res 2002; 12:1943-9. [PMID: 12466299 PMCID: PMC187573 DOI: 10.1101/gr.655102]
Abstract
We develop an extension to the Lander-Waterman theory for characterizing gaps in bacterial artificial chromosome fingerprint mapping and shotgun sequencing projects. It supports a larger set of descriptive statistics and is applicable to a wider range of project parameters. We show that previous assertions regarding inconsistency of the Lander-Waterman theory at higher coverages are incorrect and that another well-known but ostensibly different model is in fact the same. The apparent paradox of infinite island lengths is resolved. Several applications are shown, including evolution of the probability density function, calculation of closure probabilities, and development of a probabilistic method for computing stopping points in bacterial artificial chromosome shotgun sequencing.
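For orientation, the classical Lander-Waterman expectations that this paper extends can be computed directly. This sketch covers only the original 1988 formulas (number of islands, clones per island, island length) with redundancy c = NL/G and detectable-overlap fraction theta; it does not reproduce Wendl and Waterston's generalized model. The function name `lander_waterman` is hypothetical.

```python
import math


def lander_waterman(G, L, N, theta=0.0):
    """Classical Lander-Waterman (1988) expectations for random clone mapping.

    G: genome length; L: clone length (same units); N: number of clones;
    theta: fraction of L two clones must share for their overlap to be detected.
    """
    c = N * L / G            # redundancy: average number of clones covering a base
    sigma = 1.0 - theta
    e = math.exp(c * sigma)
    return {
        "redundancy": c,
        "islands": N / e,                        # N * exp(-c * sigma)
        "clones_per_island": e,                  # exp(c * sigma)
        "island_length": L * ((e - 1.0) / c + 1.0 - sigma),
        "fraction_covered": 1.0 - math.exp(-c),  # bases covered by >= 1 clone
    }


if __name__ == "__main__":
    for c_target in (1, 3, 5, 8):
        r = lander_waterman(G=1_000_000, L=10_000, N=c_target * 100, theta=0.2)
        print(c_target, round(r["islands"], 2), round(r["island_length"]))
```

With theta = 0 every overlap is detected and the expected number of islands reduces to N e^(-c).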
Affiliation(s)
- Michael C Wendl
- Washington University School of Medicine, St. Louis, Missouri 63108, USA.
4
Abstract
The aim of this paper is to provide general results for predicting progress in a physical mapping project by anchoring random clones, when clones and anchors are not homogeneously distributed along the genome. A complete physical map of the DNA of an organism consists of overlapping clones spanning the entire genome. Several schemes can be used to construct such a map, depending on the way that clones overlap. We focus here on the approach consisting of assembling clones sharing a common random short sequence called an anchor. Some mathematical analyses providing statistical properties of anchored clones have been developed in the stationary case. Modeling the clone and anchor processes as nonhomogeneous Poisson processes provides such an analysis in a general nonstationary framework. We apply our results to two natural nonhomogeneous models to illustrate the effect of inhomogeneity. This study reveals that using homogeneous processes for clones and anchors provides an overly optimistic assessment of the progress of the mapping project.
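One way to explore the effect of inhomogeneity numerically is to simulate clone left ends and anchors as nonhomogeneous Poisson processes, generated here by Lewis-Shedler thinning, and to measure the fraction of the genome covered by anchored clones. This is a Monte Carlo sketch under simplifying assumptions (fixed clone length, point anchors, error-free detection), not the analytical treatment described in the abstract; all function names and intensity profiles are hypothetical.

```python
import bisect
import math
import random


def thinned_poisson(rate, rate_max, length, rng):
    """Lewis-Shedler thinning: points of a Poisson process on [0, length)
    with intensity rate(x), assuming rate(x) <= rate_max everywhere."""
    points, x = [], 0.0
    while True:
        x += rng.expovariate(rate_max)
        if x >= length:
            return points
        if rng.random() < rate(x) / rate_max:
            points.append(x)


def union_length(intervals):
    """Total length of a union of (start, end) intervals."""
    total, cur_s, cur_e = 0.0, None, None
    for s, e in sorted(intervals):
        if cur_e is None or s > cur_e:
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total


def anchored_fraction(clone_rate, clone_max, anchor_rate, anchor_max,
                      genome=1.0, clone_len=0.01, seed=1):
    """Fraction of the genome covered by clones containing at least one anchor.
    Clone left ends and anchors (treated as points) are simulated as
    independent nonhomogeneous Poisson processes."""
    rng = random.Random(seed)
    starts = thinned_poisson(clone_rate, clone_max, genome, rng)
    anchors = sorted(thinned_poisson(anchor_rate, anchor_max, genome, rng))
    anchored = []
    for s in starts:
        e = min(s + clone_len, genome)
        i = bisect.bisect_left(anchors, s)  # first anchor at or after s
        if i < len(anchors) and anchors[i] <= e:
            anchored.append((s, e))
    return union_length(anchored) / genome


if __name__ == "__main__":
    flat = lambda x: 300.0
    # Same mean clone intensity (300 per unit), but oscillating along the genome.
    wavy = lambda x: 300.0 * (1.0 + 0.9 * math.sin(20.0 * math.pi * x))
    anchors = lambda x: 50.0
    print(anchored_fraction(flat, 300.0, anchors, 50.0))
    print(anchored_fraction(wavy, 570.0, anchors, 50.0))
```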
Affiliation(s)
- S Schbath
- I.N.R.A., Unité de Biométrie, Jouy-en-Josas, France.
5
Abstract
Arabidopsis thaliana is a small flowering plant that is a member of the family Cruciferae. It has many characteristics (diploid genetics, a rapid growth cycle, relatively low repetitive DNA content, and a small genome size) that recommend it as the model for a plant genome project. The current status of the genetic and physical maps and of efforts to sequence the genome is presented. Examples are given of genes isolated by map-based cloning. The importance of the Arabidopsis project for plant biology in general is discussed.
Affiliation(s)
- H M Goodman
- Department of Genetics, Harvard Medical School, Massachusetts General Hospital, Boston, MA 02114, USA
6
Port E, Sun F, Martin D, Waterman MS. Genomic mapping by end-characterized random clones: a mathematical analysis. Genomics 1995; 26:84-100. [PMID: 7782090 DOI: 10.1016/0888-7543(95)80086-2]
Abstract
Physical maps can be constructed by "fingerprinting" a large number of random clones and inferring overlap between clones when the fingerprints are sufficiently similar. E. Lander and M. Waterman (Genomics 2: 231-239, 1988) gave a mathematical analysis of such mapping strategies. The analysis is useful for comparing various fingerprinting methods. Recently it has been proposed that ends of clones rather than the entire clone be fingerprinted or characterized. Such fingerprints, which include sequenced clone ends, require a mathematical analysis deeper than that of Lander-Waterman. This paper studies clone islands, which can include uncharacterized regions, and also the islands that are formed entirely from the ends of clones.
Affiliation(s)
- E Port
- Department of Mathematics, University of Southern California, Los Angeles 90089-1113, USA
7
Greenberg D, Istrail S. The chimeric mapping problem: algorithmic strategies and performance evaluation on synthetic genomic data. Computers & Chemistry 1994; 18:207-20. [PMID: 7952891 DOI: 10.1016/0097-8485(94)85015-1]
Abstract
The Human Genome Project requires better software for the creation of physical maps of chromosomes. Current mapping techniques involve breaking large segments of DNA into smaller, more manageable pieces, gathering information on all the small pieces, and then constructing a map of the original large piece from the information about the small pieces. Unfortunately, in the process of breaking up the DNA some information is lost and noise of various types is introduced; in particular, the order of the pieces is not preserved. Thus, the map maker must solve a combinatorial problem in order to reconstruct the map. Good software is indispensable for quick, accurate reconstruction. The reconstruction is complicated by various experimental errors. A major source of difficulty, which seems to be inherent to the recombination technology, is the presence of chimeric DNA clones. It is fairly common for two disjoint DNA pieces to form a chimera, i.e., a fusion of two pieces that appears as a single piece. Attempts to order chimeras will fail unless they are algorithmically divided into their constituent pieces. Despite consensus within the genomic mapping community on the critical importance of correcting chimerism, algorithms for solving the chimeric clone problem have received only passing attention in the literature. Based on a model proposed by Lander (1992a, b), this paper presents the first algorithms for analyzing chimerism. We construct physical maps in the presence of chimerism by defining optimization functions whose minima correlate with map quality. Although minimizing each of these functions is NP-complete, our algorithms are guaranteed to produce solutions that are close to the optimum. The practical value of these algorithms depends on how strongly each function correlates with map quality and on the accuracy of the approximations.
We employ two fundamentally different optimization functions as a means of avoiding biases likely to decorrelate the solutions from the desired map. Experiments on simulated data show that both our algorithm that minimizes the number of chimeric fragments in a solution and our algorithm that minimizes the maximum number of fragments per clone in a solution do, in fact, correlate with high-quality solutions. Furthermore, tests on simulated data using parameters set to mimic real experiments show that the algorithms have the potential to find high-quality solutions with real data. We plan to test our software against real data from the Whitehead Institute and from Los Alamos Genomic Research Center in the near future.
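The two objective functions can be illustrated on a toy instance. As an assumption for illustration, each clone is represented by the set of probes it contains, and a clone's fragments under a candidate probe order are its maximal runs of consecutive positive probes (a clone with more than one run looks chimeric). Exhaustive search over orders stands in for the authors' approximation algorithms, which are not reproduced here; it is feasible only for a handful of probes. All names are hypothetical.

```python
from itertools import permutations


def fragments(hits, order):
    """Number of maximal runs of consecutive positive probes for one clone
    when probes are laid out in `order` (1 = consistent, >1 = fragmented)."""
    runs, inside = 0, False
    for probe in order:
        if probe in hits:
            if not inside:
                runs += 1
            inside = True
        else:
            inside = False
    return runs


def total_extra_fragments(clones, order):
    """Objective 1: fragments beyond the first, summed over all clones."""
    return sum(max(fragments(h, order) - 1, 0) for h in clones)


def max_fragments(clones, order):
    """Objective 2: the largest number of fragments any single clone needs."""
    return max(fragments(h, order) for h in clones)


def best_order(clones, probes, objective):
    """Exhaustive search over probe orders; only viable for a few probes."""
    return min(permutations(probes), key=lambda order: objective(clones, order))


if __name__ == "__main__":
    probes = ["A", "B", "C", "D", "E"]
    # Four ordinary clones plus one chimera fusing the A-B region with E.
    clones = [{"A", "B"}, {"B", "C"}, {"C", "D"}, {"D", "E"}, {"A", "B", "E"}]
    order = best_order(clones, probes, total_extra_fragments)
    print(order, total_extra_fragments(clones, order), max_fragments(clones, order))
```

In this example no order makes every clone contiguous, so the best order under either objective leaves the chimera split into two fragments.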
Affiliation(s)
- D Greenberg
- Sandia National Laboratories, Algorithms and Discrete Mathematics Department, Albuquerque, NM
8
Zhang MQ, Marr TG. Genome mapping by nonrandom anchoring: a discrete theoretical analysis. Proc Natl Acad Sci U S A 1993; 90:600-4. [PMID: 8421694 PMCID: PMC45711 DOI: 10.1073/pnas.90.2.600]
Abstract
As part of our effort to construct a physical map of the genome of the fission yeast Schizosaccharomyces pombe, we have made theoretical predictions for the progress expected, as measured by the expected length fraction of island coverage and by the expected properties of the anchored islands such as the number and the size of islands. Our experimental strategy is to construct a random clone library and screen the library for clones having unique sequence at both ends. This scheme is essentially the same as the clone-limited double sequence-tagged-site selection scheme which was used in a computer simulation by Palazzolo et al. [Palazzolo, M. J., Sawyer, S. A., Martin, C. H., Smoller, D. A. & Hartl, D. L. (1991) Proc. Natl. Acad. Sci. USA 88, 8034-8038]. Both simulation and ongoing experiments in our laboratory have shown that the nonrandom anchoring method is far superior to random anchoring. In this paper, we propose a theoretical model to explain the simulated data and the experimental data.
Affiliation(s)
- M Q Zhang
- Cold Spring Harbor Laboratory, NY 11724
9
Thaler DS, Noordewier MO. MEPS parameters and graph analysis for the use of recombination to construct ordered sets of overlapping clones. Genomics 1992; 13:1065-74. [PMID: 1505944 DOI: 10.1016/0888-7543(92)90020-s]
Abstract
Homologous recombination can provide a basis for the construction of an ordered set of overlapping clones. The principle is to make two libraries, each in a vector that has a different selectable marker flanking the insert site. Recombination between the flanking markers, leading to a selectable phenotype, can occur only as a consequence of crossing over between inserts. The two libraries are crossed in a matrix, allowing the construction of an ordered set. The logic, akin to that used by S. Benzer (1961, Genetics 47:403-415) to arrange deletion and point mutations, has a graph-theoretic formulation that helps to cope with the complex and noisy data inherent in the physical mapping of genomes rich in repeated sequences. The minimum length of identity required for homologous recombination is called the MEPS (minimum efficient processing segment) and is a property of each recombination pathway. The amount and type of sequence similarity required for two sequences to recombine differ from those implied by either the conservation of restriction sites or most hybridization procedures.
Affiliation(s)
- D S Thaler
- Laboratory of Molecular Genetics and Informatics, Rockefeller University, New York, New York 10021
10
Marr TG, Yan X, Yu Q. Genomic mapping by single copy landmark detection: a predictive model with a discrete mathematical approach. Mamm Genome 1992; 3:644-9. [PMID: 1450514 DOI: 10.1007/bf00352482]
Abstract
One of the goals of the Human Genome Project is to produce libraries of largely contiguous, ordered sets of molecular clones for use in sequencing and gene mapping projects. This is planned for the human genome and many model organisms. Theory and practice have shown that long-range contiguity and the degree to which the entire genome is covered by ordered clones can be affected by many biological variables. Many laboratories are currently testing different experimental strategies and theoretical models to help plan long-range molecular mapping of genomes. Here we describe a new mathematical model and formulas for helping to plan genome mapping projects, using various single-copy landmark (SCL) detection, or "anchoring", strategies. We derive formulas that allow us to examine the effects of interactions among the following variables: average insert size of the cloning vector, average size of SCL, the number of SCL, and the redundancy in coverage of the clone library. We also examine and compare three different ways in which anchoring can be implemented: (1) anchors are selected independently of the library to be ordered (random anchoring); (2) anchors are made from end probes from both ends of clones in the library to be ordered (nonrandom anchoring); and (3) anchors are made from one end or the other, randomly, from clones in the library to be ordered (nonrandom anchoring). Our results show that, for biologically realistic conditions, nonrandom anchoring is always more effective than random anchoring for contig building, and there is little to be gained from making SCL from both ends of clones vs. only one end of clones. (ABSTRACT TRUNCATED AT 250 WORDS)
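A small simulation can make the comparison between random anchoring (strategy 1) and anchoring from one randomly chosen clone end (strategy 3) concrete. This is a toy Monte Carlo model, not the discrete formulas of the paper: clones have a fixed length and uniform positions, anchors are points, detection is error-free, and an anchored island is a group of anchors chained together by shared clones. `simulate_anchoring` and its parameters are hypothetical.

```python
import bisect
import random


def simulate_anchoring(G, L, N, K, strategy, seed=0):
    """Toy model: N clones of length L placed uniformly on a genome of length G,
    plus K point anchors. Returns (anchored islands, fraction of genome
    covered by anchored clones).

    strategy "random":  anchors placed uniformly, independent of the library.
    strategy "one_end": anchors taken from one randomly chosen end of K
                        randomly chosen clones in the library.
    """
    rng = random.Random(seed)
    starts = sorted(rng.uniform(0.0, G - L) for _ in range(N))
    if strategy == "random":
        anchors = [rng.uniform(0.0, G) for _ in range(K)]
    elif strategy == "one_end":
        anchors = [s if rng.random() < 0.5 else s + L for s in rng.sample(starts, K)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    anchors.sort()

    used = [False] * K     # anchor lies in at least one clone
    linked = [False] * K   # some clone contains both anchor i and anchor i + 1
    covered = []
    for s in starts:
        lo = bisect.bisect_left(anchors, s)
        hi = bisect.bisect_right(anchors, s + L)
        if lo < hi:
            covered.append((s, s + L))
            for i in range(lo, hi):
                used[i] = True
            for i in range(lo, hi - 1):
                linked[i] = True

    # Each clone hits a contiguous run of sorted anchors, so islands are chains:
    # every link between neighbouring used anchors merges two islands into one.
    islands = sum(used) - sum(linked)

    total, end = 0.0, float("-inf")
    for s, e in covered:  # already sorted by start
        total += max(0.0, e - max(s, end))
        end = max(end, e)
    return islands, total / G


if __name__ == "__main__":
    for strategy in ("random", "one_end"):
        print(strategy, simulate_anchoring(G=1e7, L=4e4, N=1000, K=300, strategy=strategy))
```

Running both strategies with the same G, L, N, and K shows how the two schemes compare under these assumptions; results vary with the seed, so averaging over several seeds is advisable.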
Affiliation(s)
- T G Marr
- Cold Spring Harbor Laboratory, New York 11724
11
Arratia R, Lander ES, Tavaré S, Waterman MS. Genomic mapping by anchoring random clones: a mathematical analysis. Genomics 1991; 11:806-27. [PMID: 1783390 DOI: 10.1016/0888-7543(91)90004-x]
Abstract
A complete physical map of the DNA of an organism, consisting of overlapping clones spanning the genome, is an extremely useful tool for genomic analysis. Various methods for the construction of such physical maps are available. One approach is to assemble the physical map by "fingerprinting" a large number of random clones and inferring overlap between clones with sufficiently similar fingerprints. E.S. Lander and M.S. Waterman (1988, Genomics 2:231-239) have recently provided a mathematical analysis of such physical mapping schemes, useful for planning such a project. Another approach is to assemble the physical map by "anchoring" a large number of random clones--that is, by taking random short regions called anchors and identifying the clones containing each anchor. Here, we provide a mathematical analysis of such a physical mapping scheme.
Affiliation(s)
- R Arratia
- Department of Mathematics, University of Southern California, Los Angeles 90089
12
Ewens WJ, Bell CJ, Donnelly PJ, Dunn P, Matallana E, Ecker JR. Genome mapping with anchored clones: theoretical aspects. Genomics 1991; 11:799-805. [PMID: 1686019 DOI: 10.1016/0888-7543(91)90003-w]
Abstract
As part of our effort to construct a physical map of the genome of Arabidopsis thaliana we have made a mathematical analysis of our experimental approach of anchoring yeast artificial chromosome clones with genetically mapped RFLPs and RAPDs. The details of this analysis are presented and their implications for mapping the Arabidopsis genome are discussed.
Affiliation(s)
- W J Ewens
- Department of Biology, University of Pennsylvania, Philadelphia 19104
13
Barillot E, Lacroix B, Cohen D. Theoretical analysis of library screening using a N-dimensional pooling strategy. Nucleic Acids Res 1991; 19:6241-7. [PMID: 1956784 PMCID: PMC329134 DOI: 10.1093/nar/19.22.6241]
Abstract
A solution to the problem of library screening is analysed. We examine how to retrieve, from a whole library, those clones that are positive for a single-copy landmark while performing only a minimum number of laboratory tests: the clones are arranged in a matrix (i.e., in two dimensions) and pooled according to rows and columns. A fingerprint is determined for each pool, and an analysis allows selection of a list containing all the positive clones plus a few false positives. These false positives are eliminated by using one or more additional matrices, each configured as differently as possible from the previous one. We examine the use of cubes (three dimensions) or hypercubes of any dimension instead of matrices and analyse how to reconfigure them in order to eliminate the false positives as efficiently as possible. The advantage of the proposed method is the low number of tests required and the low number of pools that need to be prepared [only 258 pools and 282 tests (258 + 24 verifications) are needed to screen the 72,000 clones of the CEPH YAC library (1) with a sequence-tagged site]. Furthermore, this method allows easy and systematic screening and can be applied to a large physical mapping project, leading to an interesting map with a low, precisely known error rate: when fingerprinting a 150 Mb chromosome with the CEPH YAC library and 1750 sequence-tagged sites, 903,000 tests would be necessary to obtain about 20 contigs with an average length of 6.7 Mb, while only about one false positive would be expected in the resulting map. Finally, STSs can be ordered by dividing a clone library into sublibraries (corresponding, for example, to groups of microplates) and testing each STS on pooled clones from each sublibrary. This assigns each STS a fingerprint consisting of the list of its positive pools. In many cases these fingerprints will be enough to order the STSs.
Indeed, if large YACs (greater than 1 Mb) can be obtained, combined screening of DNA families and YAC DNA pools would allow integrated construction of both genetic and physical maps of the human genome, which would also reduce the optimal number of meioses needed for a 1-centimorgan linkage map.
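The row-and-column scheme and its hypercube generalization can be sketched as follows, assuming error-free pool tests and a row-major layout. Each pool is a hyperplane of a k-dimensional array (so an a x b matrix needs a + b pools), decoding keeps every clone whose pools are all positive, and a second arrangement is intersected with the first to remove false positives. A random shuffle stands in for the paper's deliberately reconfigured arrangements; all function names are hypothetical.

```python
import random
from itertools import product


def coords(index, shape):
    """Row-major position of clone `index` in an array of the given shape."""
    out = []
    for size in reversed(shape):
        out.append(index % size)
        index //= size
    return tuple(reversed(out))


def positive_pools(positive_positions, shape):
    """Each pool is a hyperplane (axis, value): every clone whose coordinate on
    `axis` equals `value`. A pool is positive if it holds a positive clone."""
    return {(axis, v) for p in positive_positions for axis, v in enumerate(coords(p, shape))}


def decode(pools, shape):
    """Candidate positions: every position whose hyperplanes are all positive."""
    per_axis = [sorted(v for a, v in pools if a == axis) for axis in range(len(shape))]
    candidates = []
    for c in product(*per_axis):
        index = 0
        for size, v in zip(shape, c):
            index = index * size + v
        candidates.append(index)
    return sorted(candidates)


def screen(positives, shape, layout):
    """One screening round; layout[clone] gives the clone's array position."""
    where = {pos: clone for clone, pos in enumerate(layout)}
    pools = positive_pools({layout[c] for c in positives}, shape)
    return {where[pos] for pos in decode(pools, shape)}


if __name__ == "__main__":
    shape = (10, 10)                      # 100 clones, 10 + 10 = 20 pools
    positives = {23, 47}
    first = screen(positives, shape, list(range(100)))
    print(sorted(first))                  # true positives plus row/column crossings
    rng = random.Random(0)
    layout = list(range(100))
    rng.shuffle(layout)                   # second, reconfigured arrangement
    print(sorted(first & screen(positives, shape, layout)))
```

With two positives in a 10 x 10 matrix, the first round returns four candidates (the two positives plus the two crossings of their rows and columns); intersecting with a second, shuffled round usually removes the false ones, and it can never remove a true positive when tests are error-free.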
Affiliation(s)
- E Barillot
- Centre d'Etude du Polymorphisme Humain (CEPH), Paris, France
14
Green ED, Green P. Sequence-tagged site (STS) content mapping of human chromosomes: theoretical considerations and early experiences. PCR Methods and Applications 1991; 1:77-90. [PMID: 1842934 DOI: 10.1101/gr.1.2.77]
Abstract
The magnitude of the effort required to complete the human genome project will require constant refinements of the tools available for the large-scale study of DNA. Such improvements must include both the development of more powerful technologies and the reformulation of the theoretical strategies that account for the changing experimental capabilities. The two technological advances described here, PCR and YAC cloning, have rapidly become incorporated into the standard armamentarium of genome analysis and represent key examples of how technological developments continue to drive experimental strategies in molecular biology. Because of its high sensitivity, specificity, and potential for automation, PCR is transforming many aspects of DNA mapping. Similarly, by providing the means to isolate and study larger pieces of DNA, YAC cloning has made practical the achievement of megabase-level continuity in physical maps. Taken together, these two technologies can be envisioned as providing a powerful strategy for constructing physical maps of whole chromosomes. Undoubtedly, future technological developments will promote even more effective mapping strategies. Nonetheless, the theoretical projections and practical experience described here suggest that constructing YAC-based STS-content maps of whole human chromosomes is now possible. Random STSs can be efficiently generated and used to screen collections of YAC clones, and contiguous YAC coverage of regions exceeding 2 Mb can be readily obtained. While the predicted laboratory effort required for mapping whole human chromosomes remains daunting, it is clearly feasible.
Affiliation(s)
- E D Green
- Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110