1
Abstract
Once a biochemical method has been devised to sample RNA or DNA of interest, sequencing can be used to identify the sampled molecules with high fidelity and low bias. High-throughput sequencing has therefore become the primary data acquisition method for many genomics studies and is increasingly used to address molecular biology questions. By applying principles of statistical experimental design, sequencing experiments can be made more sensitive to the effects under study and more biologically sound, and hence more replicable.
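One principle of statistical experimental design that bears directly on sequencing is blocking: spreading each experimental group evenly over a known nuisance factor so that the nuisance effect cannot be confused with the effect under study. A minimal Python sketch follows. It assumes, for illustration only, that sequencing lanes are the nuisance factor and that every group's size is a multiple of the number of lanes; the function name and layout are hypothetical and are not taken from the abstract.

```python
import random
from typing import Dict, List


def blocked_lane_layout(samples_by_group: Dict[str, List[str]], n_lanes: int,
                        seed: int = 0) -> Dict[int, List[str]]:
    """Assign samples to lanes so that every lane (block) receives the same
    number of samples from each group; which sample lands in which lane is
    randomized. Assumes each group size is a multiple of n_lanes."""
    rng = random.Random(seed)
    lanes: Dict[int, List[str]] = {lane: [] for lane in range(n_lanes)}
    for group, samples in samples_by_group.items():
        if len(samples) % n_lanes:
            raise ValueError(f"group {group!r} cannot be split evenly over {n_lanes} lanes")
        shuffled = list(samples)
        rng.shuffle(shuffled)                      # random assignment within the group
        for i, sample in enumerate(shuffled):
            lanes[i % n_lanes].append(sample)      # round-robin keeps each lane balanced
    return lanes


design = {"control": ["c1", "c2", "c3", "c4"], "treated": ["t1", "t2", "t3", "t4"]}
layout = blocked_lane_layout(design, n_lanes=2)
```

With four control and four treated samples over two lanes, each lane receives two samples from each group, so a lane effect shifts both groups equally instead of masquerading as a treatment effect.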
Affiliation(s)
- Loren A Honaas
- Department of Biology, The Pennsylvania State University, USDA ARS, Tree Fruit Res Lab, 1104 N Western Ave, Wenatchee, WA, 98801, USA
- Naomi S Altman
- Department of Statistics and Huck Institutes of Life Sciences, The Pennsylvania State University, 312 Thomas Building, University Park, PA, 16802-2111, USA.
- Martin Krzywinski
- Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada
2
Frenkel Z, Paux E, Mester D, Feuillet C, Korol A. LTC: a novel algorithm to improve the efficiency of contig assembly for physical mapping in complex genomes. BMC Bioinformatics 2010; 11:584. [PMID: 21118513 PMCID: PMC3098104 DOI: 10.1186/1471-2105-11-584] [Received: 09/07/2010] [Accepted: 11/30/2010]
Abstract
Background Physical maps are the substrate of genome sequencing and map-based cloning, and their construction relies on the accurate assembly of BAC clones into large contigs that are then anchored to genetic maps with molecular markers. High Information Content Fingerprinting has become the method of choice for large and repetitive genomes such as those of maize, barley, and wheat. However, the high level of repeated DNA present in these genomes requires very stringent criteria to ensure a reliable assembly with the FingerPrinted Contig (FPC) software, which often results in short contigs (3-5 clones before merging) as well as unreliable assembly in some difficult regions. Difficulties can originate from a non-linear topological structure of clone overlaps, the low power of clone ordering algorithms, and the absence of tools to identify sources of gaps in Minimal Tiling Paths (MTPs).
Results To address these problems, we propose a novel approach that: (i) reduces the rate of false connections and Q-clones by using a new cutoff calculation method; (ii) obtains reliable clusters robust to the exclusion of a single clone or clone overlap; (iii) explores the topological contig structure by considering contigs as networks of clones connected by significant overlaps; (iv) performs iterative clone clustering combined with ordering and order verification using re-sampling methods; and (v) uses global optimization methods for clone ordering and Band Map construction. The elements of this new analytical framework, called Linear Topological Contig (LTC), were applied to datasets used previously for the construction of the physical map of wheat chromosome 3B with FPC. The performance of LTC and FPC was also compared on simulated BAC libraries based on the known genome sequences of rice chromosome 1 and maize chromosome 1.
Conclusions The results show that compared to other methods, LTC enables the construction of highly reliable and longer contigs (5-12 clones before merging), the detection of "weak" connections in contigs and their "repair", and the elongation of contigs obtained by other assembly methods.
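The network view in point (iii) and the robustness criterion in point (ii) can be illustrated with a small Python sketch. It assumes each clone pair already carries a chance-overlap probability (for example a Sulston-type score, where smaller means more significant), keeps pairs at or below a cutoff as edges, takes connected components as provisional contigs, and reports articulation points: single clones whose exclusion would split a contig. This is a generic graph computation, not the LTC implementation; all names and values are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def overlap_graph(pairs: Iterable[Tuple[str, str, float]], cutoff: float) -> Graph:
    """Undirected clone graph: an edge is kept when the pair's chance-overlap
    probability is at or below `cutoff` (smaller = more significant)."""
    adj: Graph = {}
    for a, b, prob in pairs:
        adj.setdefault(a, set())   # register both clones even if the edge is rejected
        adj.setdefault(b, set())
        if prob <= cutoff:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def contigs(adj: Graph) -> List[Set[str]]:
    """Connected components of the clone graph (provisional contigs)."""
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def articulation_points(adj: Graph) -> Set[str]:
    """Clones whose removal disconnects their contig (Tarjan's DFS low-link method)."""
    disc: Dict[str, int] = {}
    low: Dict[str, int] = {}
    cut: Set[str] = set()

    def dfs(u: str, parent: Optional[str]) -> None:
        disc[u] = low[u] = len(disc)            # discovery time = visit order
        children = 0
        for v in adj[u]:
            if v not in disc:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # a non-root u is a cut vertex if v's subtree cannot reach above u
                if parent is not None and low[v] >= disc[u]:
                    cut.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])   # back edge
        if parent is None and children > 1:    # root with several DFS subtrees
            cut.add(u)

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return cut
```

For example, with edges a-b, b-c, and a triangle c-d-e, clones b and c are the single points of failure; flagging such clones is one simple way to locate weakly supported joins in a contig.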
Affiliation(s)
- Zeev Frenkel
- University of Haifa, Institute of Evolution, Haifa 31905, Israel.
3
Wendl MC. Algebraic correction methods for computational assessment of clone overlaps in DNA fingerprint mapping. BMC Bioinformatics 2007; 8:127. [PMID: 17442113 PMCID: PMC1868038 DOI: 10.1186/1471-2105-8-127] [Received: 03/02/2007] [Accepted: 04/18/2007]
Abstract
BACKGROUND The Sulston score is a well-established, though approximate, metric for probabilistically evaluating postulated clone overlaps in DNA fingerprint mapping. It is known to systematically over-predict match probabilities by various orders of magnitude, depending upon project-specific parameters. Although the exact probability distribution is also available for the comparison problem, it is rather difficult to compute and cannot be used directly in most cases. A methodology providing both improved accuracy and computational economy is required. RESULTS We propose a straightforward algebraic correction procedure, which takes the Sulston score as a provisional value and applies a power-law equation to obtain an improved result. Numerical comparisons indicate dramatically increased accuracy over the range of parameters typical of traditional agarose fingerprint mapping. Issues with extrapolating the method into parameter ranges characteristic of newer capillary electrophoresis-based projects are also discussed. CONCLUSION Although only marginally more expensive to compute than the raw Sulston score, the correction provides a vastly improved probabilistic description of hypothesized clone overlaps. This will clearly be important in overlap assessment and perhaps for other tasks as well, for example in using the ranking of overlap probabilities to assist in clone ordering.
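For reference, the raw Sulston score that the correction takes as its provisional value is commonly stated as a binomial tail: with n_L ≤ n_H bands in the two clones, a band tolerance t and a gel length l, a band in the smaller clone is taken to match one of the n_H bands with probability 1 − (1 − 2t/l)^{n_H}, and the score is the chance of at least M such matches. The Python sketch below computes only this provisional value under that stated form; the power-law coefficients of the correction itself are given in the paper and are not reproduced here.

```python
from math import comb


def sulston_score(n_a: int, n_b: int, m: int, tol: float, gel_len: float) -> float:
    """Raw Sulston score: chance that two clones with n_a and n_b bands share
    at least m bands by coincidence, under the binomial form stated above."""
    n_low, n_high = sorted((n_a, n_b))
    m = max(m, 0)
    b = 2.0 * tol / gel_len          # chance one band lies within tolerance of another
    p_miss = (1.0 - b) ** n_high     # band in the smaller clone matches none of n_high bands
    p_hit = 1.0 - p_miss
    return sum(comb(n_low, j) * p_hit ** j * p_miss ** (n_low - j)
               for j in range(m, n_low + 1))


# Hypothetical parameters: 20 and 25 bands, 12 shared, tolerance 7 on a 3300-unit gel.
score = sulston_score(20, 25, 12, 7.0, 3300.0)
```

Because this raw value is the quantity the abstract describes as systematically over-predicting match probabilities, it is best treated as an input to the correction rather than as a final overlap probability.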
Affiliation(s)
- Michael C Wendl
- Genome Sequencing Center, Washington University, St. Louis, MO 63108, USA.
4
Wendl MC, Waterston RH. Generalized gap model for bacterial artificial chromosome clone fingerprint mapping and shotgun sequencing. Genome Res 2002; 12:1943-9. [PMID: 12466299 PMCID: PMC187573 DOI: 10.1101/gr.655102]
Abstract
We develop an extension to the Lander-Waterman theory for characterizing gaps in bacterial artificial chromosome fingerprint mapping and shotgun sequencing projects. It supports a larger set of descriptive statistics and is applicable to a wider range of project parameters. We show that previous assertions regarding inconsistency of the Lander-Waterman theory at higher coverages are incorrect and that another well-known but ostensibly different model is in fact the same. The apparent paradox of infinite island lengths is resolved. Several applications are shown, including evolution of the probability density function, calculation of closure probabilities, and development of a probabilistic method for computing stopping points in bacterial artificial chromosome shotgun sequencing.
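The Lander-Waterman quantities that this model extends fit in a few lines. The Python sketch below uses the classic expressions: coverage c = NL/G, expected number of islands N e^{-cσ} with σ = 1 − θ for a minimum detectable overlap fraction θ, expected clones per island e^{cσ}, and expected covered fraction 1 − e^{-c}. It is the baseline theory only; the generalized gap model of the paper and its additional statistics are not reproduced.

```python
import math


def lander_waterman(n_clones: int, clone_len: float, genome_len: float,
                    theta: float = 0.0) -> dict:
    """Classic Lander-Waterman expectations for randomly placed clones.
    theta: minimum detectable overlap as a fraction of clone_len (0 <= theta < 1)."""
    c = n_clones * clone_len / genome_len     # redundancy (coverage)
    sigma = 1.0 - theta
    return {
        "coverage": c,
        "expected_islands": n_clones * math.exp(-c * sigma),
        "clones_per_island": math.exp(c * sigma),
        "fraction_covered": 1.0 - math.exp(-c),
    }
```

Raising theta (requiring a larger overlap before two clones are joined) increases the expected number of islands at a fixed coverage, which is the quantitative cost of stricter overlap detection.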
Affiliation(s)
- Michael C Wendl
- Washington University School of Medicine, St. Louis, Missouri 63108, USA.
5
Selleri L, Smith MW, Holmsen AL, Romo AJ, Thomas SD, Paternotte C, Romberg LC, Wei YH, Evans GA. High-resolution physical mapping of a 250-kb region of human chromosome 11q24 by genomic sequence sampling (GSS). Genomics 1995; 26:489-501. [PMID: 7607672 DOI: 10.1016/0888-7543(95)80167-k]
Abstract
A physical map of the region of human chromosome 11q24 containing the FLI1 gene, disrupted by the t(11;22) translocation in Ewing sarcoma and primitive neuroectodermal tumors, was analyzed by genomic sequence sampling. Using a 4- to 5-fold coverage chromosome 11-specific library, 22 region-specific cosmid clones were identified by phenol emulsion reassociation hybridization, with a 245-kb yeast artificial chromosome clone containing the FLI1 gene, and by directed "walking" techniques. Cosmid contigs were constructed by individual clone fingerprinting using restriction enzyme digestion and assembly with the Genome Reconstruction and AsseMbly (GRAM) computer algorithm. The relative orientation and spacing of cosmid contigs with respect to the chromosome were determined by the structural analysis of cosmid clones and by direct visual in situ hybridization mapping. Each cosmid clone in the contig was subjected to "one-pass" end sequencing, and the resulting ordered sequence fragments represent approximately 5% of the complete DNA sequence, making the entire region accessible by PCR amplification. The sequence samples were analyzed for putative exons, repetitive DNAs, and simple sequence repeats using a variety of computer algorithms. Based upon the computer predictions, Southern and Northern blot experiments led to the independent identification and localization of the FLI1 gene as well as a previously unknown gene located in this region of chromosome 11q24. This approach to high-resolution physical analysis of human chromosomes allows the assembly of detailed sequence-based maps and provides a tool for further structural and functional analysis of the genome.
Affiliation(s)
- L Selleri
- Molecular Genetics Laboratory, Salk Institute for Biological Studies, La Jolla, California 92037, USA
6
Singh GB, Krawetz SA. CLONEPLACER: a software tool for simulating contig formation for ordered shotgun sequencing. Genomics 1995; 25:555-8. [PMID: 7789990 DOI: 10.1016/0888-7543(95)80057-s]
Abstract
This communication describes a software tool that enables one to simulate large-scale regional mapping using an ordered shotgun sequencing approach. The analysis routines that are provided yield an estimate of the depth of coverage of the physical map, the largest contig formed, and the number of gaps remaining at any given juncture in the project. A detailed listing describing the span of each contig within the physical map is also presented. This provides an a priori means of estimating the resources that will be required to undertake any megabase mapping or sequencing project. CLONEPLACER provides a much-needed guide to deriving the optimal strategy.
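A simulation of this kind can be sketched in Python by placing fixed-length clones uniformly at random, merging clones whose overlap reaches a detection threshold, and reporting depth of coverage, the largest contig, the number of gaps, and each contig's span, the quantities named above. This is a hypothetical illustration under simplifying assumptions (equal clone lengths, uniform placement, error-free overlap detection), not the CLONEPLACER code; all names are invented for the example.

```python
import random
from typing import Dict, List, Tuple


def merge_contigs(starts: List[float], clone_len: float,
                  min_overlap: float = 0.0) -> List[Tuple[float, float]]:
    """Sweep clones left to right; a clone joins the current contig when it
    overlaps it by a positive amount that is at least `min_overlap`."""
    contigs: List[Tuple[float, float]] = []
    for s in sorted(starts):
        end = s + clone_len
        if contigs:
            cur_start, cur_end = contigs[-1]
            overlap = cur_end - s
            if overlap > 0 and overlap >= min_overlap:
                contigs[-1] = (cur_start, max(cur_end, end))
                continue
        contigs.append((s, end))
    return contigs


def simulate_map(n_clones: int, clone_len: float, genome_len: float,
                 min_overlap: float = 0.0, seed: int = 0) -> Dict[str, object]:
    """One random realization of clone placement on a region of length genome_len."""
    if n_clones < 1 or clone_len > genome_len:
        raise ValueError("need at least one clone no longer than the region")
    rng = random.Random(seed)
    starts = [rng.uniform(0.0, genome_len - clone_len) for _ in range(n_clones)]
    contigs = merge_contigs(starts, clone_len, min_overlap)
    return {
        "depth": n_clones * clone_len / genome_len,     # mean coverage
        "n_contigs": len(contigs),
        "n_gaps": len(contigs) - 1,                     # gaps between adjacent contigs
        "largest_contig": max(e - s for s, e in contigs),
        "contigs": contigs,                             # span of each contig
    }
```

Averaging the outputs of `simulate_map` over many seeds gives the kind of a priori resource estimate the abstract describes, for example how many clones are needed before the expected number of gaps falls below a target.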
Affiliation(s)
- G B Singh
- College of Engineering, Wayne State University, Detroit, Michigan 48202, USA
7
Singh GB, Nelson JE, Mcalinden TP, Krawetz SA. ISWAC: proposed system for the integrated assembly of chromosomes. DNA Sequence: The Journal of DNA Sequencing and Mapping 1994; 5:67-76. [PMID: 7703507 DOI: 10.3109/10425179409039707]
Abstract
The generation of a physical map as an integral part of sequence project management is a problem that present computer systems do not address. Primarily, the analysis performed is based solely on the information available from a single knowledge level. Management systems that are currently available do not adequately model the multi-layer top-down strategy that is most often used to manage large-scale sequencing projects. Single-layered approaches reflect an algorithmic inadequacy, since interacting data sets are required to provide a good solution. The analysis tool currently under development, termed ISWAC (the Integrated System for Wholistic Assembly of Chromosomes), overcomes these limitations by integrating information available from five layers of knowledge. These knowledge layers use information from the linkage map, physical map, restriction map, clone strategy map, and the DNA sequence itself. The approach we are implementing reviews current project status and continually refines the experimental strategy necessary to efficiently complete the sequencing task. To facilitate project completion, the system is designed to interactively recommend strategies based on partial information. The utility of this tool is enhanced by implementing knowledge representation techniques that allow reasoning with approximate concepts characteristic of these data sets. In addition, the raw physical data are maintained within an integrated map database to ease data verification. This paper presents the first discussion of the design specifications for a computer system to assimilate the various forms of data that are being generated as part of the Human Genome Project. It was specifically written to stimulate discussion regarding data standardization, translation, analysis and, most important, an understandable user interface for the molecular biologist.
We would hope that interested readers would respond by assisting in the definition of a set of universal data standards and adopting them in their laboratories.
Affiliation(s)
- G B Singh
- College of Engineering, Wayne State University, Detroit, MI 48202
8
Bellanné-Chantelot C, Lacroix B, Ougen P, Billault A, Beaufils S, Bertrand S, Georges I, Glibert F, Gros I, Lucotte G. Mapping the whole human genome by fingerprinting yeast artificial chromosomes. Cell 1992; 70:1059-68. [PMID: 1525822 DOI: 10.1016/0092-8674(92)90254-a]
Abstract
Physical mapping of the human genome has until now been envisioned through single-chromosome strategies. We demonstrate that by using large-insert yeast artificial chromosomes (YACs), a whole-genome approach becomes feasible. A total of 22,000 YACs with a mean size of 810 kb (5 genome equivalents) have been fingerprinted to obtain individual patterns of restriction fragments detected by a LINE-1 (L1) probe. More than 1000 contigs were assembled. Ten randomly chosen contigs were validated by metaphase chromosome fluorescence in situ hybridization, as well as by analyzing the inter-Alu PCR patterns of their constituent YACs. We estimate that 15% to 20% of the human genome, mainly the L1-rich regions, is already covered with contigs larger than 3 Mb.
9
Abstract
In this paper we describe a method for the statistical reconstruction of a large DNA sequence from a set of sequenced fragments. We assume that the fragments have been assembled and address the problem of determining the degree to which the reconstructed sequence is free from errors, i.e., its accuracy. A consensus distribution is derived from the assembled fragment configuration based upon the rates of sequencing errors in the individual fragments. The consensus distribution can be used to find a minimally redundant consensus sequence that meets a prespecified confidence level, either base by base or across any region of the sequence. A likelihood-based procedure for the estimation of the sequencing error rates, which utilizes an iterative EM algorithm, is described. Prior knowledge of the error rates is easily incorporated into the estimation procedure. The methods are applied to a set of assembled sequence fragments from the human G6PD locus. We close the paper with a brief discussion of the relevance and practical implications of this work.
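The consensus-distribution and EM ideas can be illustrated with a simplified Python sketch. It assumes fragments already aligned as equal-length strings over A/C/G/T, with '-' where a fragment does not cover a column, substitution errors only, a uniform prior over the four bases, and errors spread evenly over the three other bases, with one error rate per fragment. The E-step computes each column's posterior over the true base; the M-step sets each fragment's error rate to its expected mismatch fraction. This is a model in the spirit of the abstract, not the paper's exact formulation; all names are hypothetical.

```python
from typing import Dict, List, Sequence

BASES = "ACGT"


def column_posterior(calls: Sequence[str], err: Sequence[float]) -> Dict[str, float]:
    """Posterior over the true base in one column from the covering fragments'
    calls and error rates (uniform prior; an error yields each other base w.p. e/3)."""
    like = {}
    for b in BASES:
        p = 1.0
        for call, e in zip(calls, err):
            p *= (1.0 - e) if call == b else e / 3.0
        like[b] = p
    total = sum(like.values())
    return {b: p / total for b, p in like.items()}


def _covering(frags: List[str], j: int) -> List[int]:
    """Indices of fragments that cover column j ('-' marks no coverage)."""
    return [i for i, f in enumerate(frags) if f[j] != "-"]


def estimate_error_rates(frags: List[str], n_iter: int = 25,
                         init: float = 0.1) -> List[float]:
    """EM estimate of one substitution error rate per fragment."""
    err = [init] * len(frags)
    for _ in range(n_iter):
        mism = [0.0] * len(frags)
        cov = [0] * len(frags)
        for j in range(len(frags[0])):
            idx = _covering(frags, j)
            if not idx:
                continue
            # E-step: posterior over the true base in column j
            post = column_posterior([frags[i][j] for i in idx], [err[i] for i in idx])
            for i in idx:
                mism[i] += 1.0 - post[frags[i][j]]   # expected miscall indicator
                cov[i] += 1
        # M-step: expected mismatch fraction, clamped to [1e-6, 0.5]
        err = [min(max(m / c, 1e-6), 0.5) if c else init for m, c in zip(mism, cov)]
    return err


def consensus(frags: List[str], err: Sequence[float]) -> str:
    """Most probable base per column ('N' where no fragment covers it)."""
    out = []
    for j in range(len(frags[0])):
        idx = _covering(frags, j)
        if not idx:
            out.append("N")
            continue
        post = column_posterior([frags[i][j] for i in idx], [err[i] for i in idx])
        out.append(max(post, key=post.get))
    return "".join(out)
```

The per-column posteriors returned by `column_posterior` are also what a confidence threshold would be applied to, base by base, when deciding whether the available coverage already meets a prespecified confidence level.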
Affiliation(s)
- G A Churchill
- Biometrics Unit, Cornell University, Ithaca, New York 14853
10
Stallings RL, Doggett NA, Callen D, Apostolou S, Chen LZ, Nancarrow JK, Whitmore SA, Harris P, Michison H, Breuning M. Evaluation of a cosmid contig physical map of human chromosome 16. Genomics 1992; 13:1031-9. [PMID: 1505942 DOI: 10.1016/0888-7543(92)90016-l]
Abstract
A cosmid contig physical map of human chromosome 16 has been developed by repetitive sequence fingerprinting of approximately 4000 cosmid clones obtained from a chromosome 16-specific cosmid library. The arrangement of clones in contigs is determined by (1) estimating cosmid length and determining the likelihoods for all possible pairwise clone overlaps, using the fingerprint data, and (2) using an optimization technique to fit contig maps to these estimates. Two important questions concerning this contig map are how much of chromosome 16 is covered and how accurate the assembled contigs are. Both questions can be addressed by hybridization of single-copy sequence probes to gridded arrays of the cosmids. All of the fingerprinted clones have been arrayed on nylon membranes so that any region of interest can be identified by hybridization. The hybridization experiments indicate that approximately 84% of the euchromatic arms of chromosome 16 are covered by contigs and singleton cosmids. Both grid hybridization (26 contigs) and pulsed-field gel electrophoresis experiments (11 contigs) confirmed the assembled contigs, indicating that false positive overlaps occur infrequently in the present map. Furthermore, regional localization of 93 contigs and singleton cosmids to a somatic cell hybrid mapping panel indicates that there is no bias in the coverage of the euchromatic arms.
Affiliation(s)
- R L Stallings
- Life Sciences Division, Los Alamos National Laboratory, New Mexico 87545