1
Bui TV, Echizen I, Kuribayashi M, Kojima T, Nguyen TD. Group Testing with Blocks of Positives and Inhibitors. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1562. [PMID: 36359652 PMCID: PMC9689211 DOI: 10.3390/e24111562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 10/15/2022] [Accepted: 10/27/2022] [Indexed: 06/16/2023]
Abstract
The main goal of group testing is to identify a small number of specific items among a large population of items. In this paper, we consider specific items as positives and inhibitors and non-specific items as negatives. In particular, we consider a novel model called group testing with blocks of positives and inhibitors. A test on a subset of items is positive if the subset contains at least one positive and does not contain any inhibitors, and it is negative otherwise. In this model, the input items are linearly ordered, and the positives and inhibitors are subsets of small blocks (at unknown locations) of consecutive items over that order. We also consider two specific instantiations of this model. The first instantiation is the model that contains a single block of consecutive items consisting of exactly known numbers of positives and inhibitors. The second instantiation is the model that contains a single block of consecutive items containing known numbers of positives and inhibitors. Our contribution is to propose efficient encoding and decoding schemes such that the numbers of tests used to identify only positives or both positives and inhibitors are less than the ones in the state-of-the-art schemes. Moreover, the decoding times mostly scale with the numbers of tests and are significantly smaller than the state-of-the-art ones, which scale with both the number of tests and the number of items.
Affiliation(s)
- Thach V. Bui
  - Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
- Isao Echizen
  - National Institute of Informatics, Tokyo 101-8430, Japan
  - Department of Information and Communication Engineering, University of Tokyo, Tokyo 113-8654, Japan
- Minoru Kuribayashi
  - Graduate School of Natural Science and Technology, Okayama University, Okayama 700-8530, Japan
- Tetsuya Kojima
  - National Institute of Technology, Tokyo College, Hachioji, Tokyo 193-0997, Japan
- Thuc D. Nguyen
  - Faculty of Information Technology, University of Science, VNU-HCMC, Ho Chi Minh City 72711, Vietnam
  - Faculty of Information Technology, Vietnam National University, Ho Chi Minh City 720300, Vietnam
2
Escobar M, Jeanneret G, Bravo-Sánchez L, Castillo A, Gómez C, Valderrama D, Roa M, Martínez J, Madrid-Wolff J, Cepeda M, Guevara-Suarez M, Sarmiento OL, Medaglia AL, Forero-Shelton M, Velasco M, Pedraza JM, Laajaj R, Restrepo S, Arbelaez P. Smart pooling: AI-powered COVID-19 informative group testing. Sci Rep 2022; 12:6519. [PMID: 35444162 PMCID: PMC9020431 DOI: 10.1038/s41598-022-10128-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2021] [Accepted: 03/15/2022] [Indexed: 11/09/2022]
Abstract
Massive molecular testing for COVID-19 has been pointed out as fundamental to moderate the spread of the pandemic. Pooling methods can enhance testing efficiency, but they are viable only at low incidences of the disease. We propose Smart Pooling, a machine learning method that uses clinical and sociodemographic data from patients to increase the efficiency of informed Dorfman testing for COVID-19 by arranging samples into all-negative pools. To do this, we ran an automated method to train numerous machine learning models on a retrospective dataset from more than 8000 patients tested for SARS-CoV-2 from April to July 2020 in Bogotá, Colombia. We estimated the efficiency gains of using the predictor to support Dorfman testing by simulating the outcome of tests. We also computed the attainable efficiency gains of non-adaptive pooling schemes mathematically. Moreover, we measured the false-negative error rates in detecting the ORF1ab and N genes of the virus in RT-qPCR dilutions. Finally, we presented the efficiency gains of using our proposed pooling scheme on proof-of-concept pooled tests. We believe Smart Pooling will be efficient for optimizing massive testing of SARS-CoV-2.
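To illustrate why pooling is viable only at low incidence, the sketch below uses the classical expected cost of two-stage Dorfman testing: with pool size n and prevalence p, the expected number of tests per sample is 1/n + 1 - (1 - p)^n. It assumes error-free tests and independent infections, both of which are simplifications introduced here and not claims from the paper; the function name is hypothetical, and this is not the Smart Pooling method itself.

```python
def dorfman_tests_per_sample(p: float, n: int) -> float:
    """Expected tests per sample for two-stage Dorfman pooling.

    Assumes error-free tests and independent infections with prevalence p.
    One pooled test is shared by n samples; if the pool is positive
    (probability 1 - (1 - p)**n), each of the n samples is retested.
    """
    if n < 2:
        raise ValueError("pool size must be at least 2")
    return 1.0 / n + 1.0 - (1.0 - p) ** n


# Search pool sizes 2..32 for the cheapest one at a few prevalence levels.
for p in (0.01, 0.10, 0.50):
    best = min(range(2, 33), key=lambda n: dorfman_tests_per_sample(p, n))
    cost = dorfman_tests_per_sample(p, best)
    print(f"prevalence={p:.2f} best pool size={best} tests/sample={cost:.3f}")
```

A value below 1 means pooling beats individual testing. At a prevalence of 0.5, every pool size in this range costs more than one test per sample, which is why grouping likely negatives together, as the abstract describes, matters.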
Affiliation(s)
- María Escobar
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
- Guillaume Jeanneret
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
- Laura Bravo-Sánchez
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
- Angela Castillo
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
- Catalina Gómez
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
  - Department of Computer Science, Johns Hopkins University, Baltimore, USA
- Diego Valderrama
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
- Mafe Roa
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
- Julián Martínez
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
- Jorge Madrid-Wolff
  - Laboratory of Applied Photonics Devices, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Martha Cepeda
  - School of Science, Universidad de los Andes, Bogotá, Colombia
- Marcela Guevara-Suarez
  - Applied Genomics Research Group, Vice Presidency for Research and Creation, Universidad de los Andes, Bogotá, Colombia
- Andrés L Medaglia
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
  - Department of Industrial Engineering, Universidad de los Andes, Bogotá, Colombia
- Mauricio Velasco
  - Department of Mathematics, Universidad de los Andes, Bogotá, Colombia
- Juan M Pedraza
  - Department of Physics, Universidad de los Andes, Bogotá, Colombia
- Rachid Laajaj
  - School of Economics, Universidad de los Andes, Bogotá, Colombia
- Silvia Restrepo
  - Applied Genomics Research Group, Vice Presidency for Research and Creation, Universidad de los Andes, Bogotá, Colombia
- Pablo Arbelaez
  - Center for Research and Formation in Artificial Intelligence, Universidad de los Andes, Bogotá, Colombia
  - Department of Biomedical Engineering, Universidad de los Andes, Bogotá, Colombia
3
Sobczyk J, Pyne MT, Barker A, Mayer J, Hanson KE, Samore MH, Noriega R. Efficient and effective single-step screening of individual samples for SARS-CoV-2 RNA using multi-dimensional pooling and Bayesian inference. J R Soc Interface 2021; 18:20210155. [PMID: 34129787 PMCID: PMC8205536 DOI: 10.1098/rsif.2021.0155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Rapid and widespread implementation of infectious disease surveillance is a critical component in the response to novel health threats. Molecular assays are the preferred method to detect a broad range of viral pathogens with high sensitivity and specificity. The implementation of molecular assay testing in a rapidly evolving public health emergency, such as the ongoing COVID-19 pandemic, can be hindered by resource availability or technical constraints. We present a screening strategy that is easily scaled up to support a sustained large volume of testing over long periods of time. This non-adaptive pooled-sample screening protocol employs Bayesian inference to yield a reportable outcome for each individual sample in a single testing step (no confirmation of positive results required). The proposed method is validated using clinical specimens tested using a real-time reverse transcription polymerase chain reaction test for SARS-CoV-2. This screening protocol has substantial advantages for its implementation, including higher sample throughput, faster time to results, no need to retrieve previously screened samples from storage to undergo retesting, and excellent performance of the algorithm's sensitivity and specificity compared with the individual test's metrics.
Affiliation(s)
- Juliana Sobczyk
  - Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Michael T Pyne
  - ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA
- Adam Barker
  - ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA
- Jeanmarie Mayer
  - Division of Epidemiology, University of Utah Health Sciences Center, Salt Lake City, UT, USA
  - Division of Infectious Diseases, University of Utah Health Sciences Center, Salt Lake City, UT, USA
- Kimberly E Hanson
  - ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA
  - Division of Infectious Disease, University of Utah School of Medicine, Salt Lake City, UT, USA
- Matthew H Samore
  - Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA
  - Informatics, Decision Enhancement, and Analytic Science (IDEAS) Center of Innovation, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT, USA
- Rodrigo Noriega
  - Department of Chemistry, University of Utah, Salt Lake City, UT, USA
4
Nakamoto M, Takeuchi Y, Akita K, Kumagai R, Suzuki J, Koyama T, Noda T, Yoshida K, Ozaki A, Araki K, Sakamoto T. A novel C-type lectin gene is a strong candidate gene for Benedenia disease resistance in Japanese yellowtail, Seriola quinqueradiata. DEVELOPMENTAL AND COMPARATIVE IMMUNOLOGY 2017; 76:361-369. [PMID: 28705457 DOI: 10.1016/j.dci.2017.07.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/31/2017] [Revised: 07/08/2017] [Accepted: 07/08/2017] [Indexed: 06/07/2023]
Abstract
Little is known about mechanisms of resistance to parasitic diseases in marine finfish. Benedenia disease is caused by infection by the monogenean parasite Benedenia seriolae. Previous quantitative trait locus (QTL) analyses have identified a major QTL associated with resistance to Benedenia disease in linkage group Squ2 of the Japanese yellowtail/amberjack Seriola quinqueradiata. To uncover the bioregulatory mechanism of Benedenia disease resistance, complete Illumina sequencing of BAC clones carrying genomic DNA for the QTL region in linkage group Squ2 was performed to reveal a novel C-type lectin in this region. Expression of the mRNA of this C-type lectin was detected in skin tissue parasitized by B. seriolae. Scanning for single nucleotide polymorphisms (SNPs) uncovered a SNP in the C-type lectin/C-type lectin-like domain that was significantly associated with B. seriolae infection levels. These results strongly suggest that the novel C-type lectin gene controls resistance to Benedenia disease in Japanese yellowtails.
Affiliation(s)
- Masatoshi Nakamoto
  - Department of Aquatic Marine Biosciences, Tokyo University of Marine Science and Technology, Tokyo 108-8477, Japan
- Yusuke Takeuchi
  - Department of Aquatic Marine Biosciences, Tokyo University of Marine Science and Technology, Tokyo 108-8477, Japan
- Kazuki Akita
  - Department of Aquatic Marine Biosciences, Tokyo University of Marine Science and Technology, Tokyo 108-8477, Japan
- Ryo Kumagai
  - Department of Aquatic Marine Biosciences, Tokyo University of Marine Science and Technology, Tokyo 108-8477, Japan
- Junpei Suzuki
  - Department of Aquatic Marine Biosciences, Tokyo University of Marine Science and Technology, Tokyo 108-8477, Japan
- Takashi Koyama
  - Department of Aquatic Marine Biosciences, Tokyo University of Marine Science and Technology, Tokyo 108-8477, Japan
- Tsutomu Noda
  - Goto Laboratory of the Seikai National Fisheries Research Institute, Japan Fisheries Research and Education Agency, Nagasaki 853-0508, Japan
- Kazunori Yoshida
  - Goto Laboratory of the Seikai National Fisheries Research Institute, Japan Fisheries Research and Education Agency, Nagasaki 853-0508, Japan
- Akiyuki Ozaki
  - National Research Institute of Aquaculture, Japan Fisheries Research and Education Agency, Mie 516-0193, Japan
- Kazuo Araki
  - National Research Institute of Aquaculture, Japan Fisheries Research and Education Agency, Mie 516-0193, Japan
- Takashi Sakamoto
  - Department of Aquatic Marine Biosciences, Tokyo University of Marine Science and Technology, Tokyo 108-8477, Japan
5
Aprahamian H, Bish DR, Bish EK. Residual risk and waste in donated blood with pooled nucleic acid testing. Stat Med 2016; 35:5283-5301. [PMID: 27488928 DOI: 10.1002/sim.7066] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/16/2015] [Revised: 07/07/2016] [Accepted: 07/08/2016] [Indexed: 11/09/2022]
Abstract
An accurate estimation of the residual risk of transfusion-transmittable infections (TTIs), which includes the human immunodeficiency virus (HIV), hepatitis B and C viruses (HBV, HCV), among others, is essential, as it provides the basis for blood screening assay selection. While the highly sensitive nucleic acid testing (NAT) technology has recently become available, it is highly costly. As a result, in most countries, including the United States, the current practice for human immunodeficiency virus, hepatitis B virus, hepatitis C virus screening in donated blood is to use pooled NAT. Pooling substantially reduces the number of tests required, especially for TTIs with low prevalence rates. However, pooling also reduces the test's sensitivity, because the viral load of an infected sample might be diluted by the other samples in the pool to the point that it is not detectable by NAT, leading to potential TTIs. Infection-free blood may also be falsely discarded, resulting in wasted blood. We derive expressions for the residual risk, expected number of tests, and expected amount of blood wasted for various two-stage pooled testing schemes, including Dorfman-type and array-based testing, considering infection progression, infectivity of the blood unit, and imperfect tests under the dilution effect and measurement errors. We then calibrate our model using published data and perform a case study. Our study offers key insights on how pooled NAT, used within different testing schemes, contributes to the safety and cost of blood. Copyright © 2016 John Wiley & Sons, Ltd.
Affiliation(s)
- Hrayer Aprahamian
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, U.S.A.
- Douglas R Bish
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, U.S.A.
- Ebru K Bish
  - Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA, U.S.A.
6
Asymptotics of pooling design performance. J Appl Probab 2016. [DOI: 10.1017/s0021900200017770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
We analyse the expected performance of various group testing, or pooling, designs. The context is that of identifying characterized clones in a large collection of clones. Here we choose as performance criterion the expected number of unresolved ‘negative’ clones, and we aim to minimize this quantity. Technically, long inclusion–exclusion summations are encountered which, aside from being computationally demanding, give little inkling of the qualitative effect of parametric control on the pooling strategy. We show that readily-interpreted re-summation can be performed, leading to asymptotic forms and systematic corrections. We apply our results to randomized designs, illustrating how they might be implemented for approximating combinatorial formulae.
8
Construction, complete sequence, and annotation of a BAC contig covering the silkworm chorion locus. Sci Data 2015; 2:150062. [PMID: 26594380 PMCID: PMC4640134 DOI: 10.1038/sdata.2015.62] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/11/2015] [Accepted: 08/11/2015] [Indexed: 11/28/2022]
Abstract
The silkmoth chorion was studied extensively by F.C. Kafatos’ group for almost 40 years. However, the complete structure of the chorion locus was not obtained in the genome sequence of Bombyx mori published in 2008 due to repetitive sequences, resulting in gaps and an incomplete view of the locus. To obtain the complete sequence of the chorion locus, expressed sequence tags (ESTs) derived from follicular epithelium cells were used as probes to screen a bacterial artificial chromosome (BAC) library. Seven BACs were selected to construct a contig which covered the whole chorion locus. By Sanger sequencing, we successfully obtained complete sequences of the chorion locus spanning 871,711 base pairs on chromosome 2, where we annotated 127 chorion genes. The dataset reported here will recruit more researchers to revisit one of the oldest model systems which has been used to study developmentally regulated gene expression. It also provides insights into egg development and fertilization mechanisms and is relevant to applications related to improvements in breeding procedures and transgenesis.
9
Cviková K, Cattonaro F, Alaux M, Stein N, Mayer KF, Doležel J, Bartoš J. High-throughput physical map anchoring via BAC-pool sequencing. BMC PLANT BIOLOGY 2015; 15:99. [PMID: 25887276 PMCID: PMC4407875 DOI: 10.1186/s12870-015-0429-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/15/2014] [Accepted: 01/20/2015] [Indexed: 05/08/2023]
Abstract
BACKGROUND: Physical maps created from large-insert DNA libraries, typically cloned in a BAC vector, are valuable resources for map-based cloning and de novo genome sequencing. The maps are most useful if contigs of overlapping DNA clones are anchored to chromosome(s) and ordered along them using molecular markers. Here we present a novel approach for anchoring physical maps, based on sequencing three-dimensional pools of BAC clones from the minimum tiling path. RESULTS: We used the physical map of wheat chromosome arm 3DS to validate the method with two different DNA sequence datasets. The first comprised 567 genes ordered along the chromosome arm based on the syntenic relationship of wheat with the sequenced genomes of Brachypodium, rice and sorghum. The second dataset consisted of 7,136 SNP-containing sequences, which were mapped genetically in Aegilops tauschii, the donor of the wheat D genome. Mapping of sequence reads from individual BAC pools to the first and the second datasets enabled unambiguous anchoring of 447 and 311 3DS-specific sequences, respectively, or 758 in total. CONCLUSIONS: We demonstrate the utility of the novel approach for BAC contig anchoring based on massively parallel sequencing of three-dimensional pools prepared from the minimum tiling path of a physical map. The existing genetic markers as well as any other DNA sequence could be mapped to BAC clones in a single in silico experiment. The approach significantly reduces the cost and time needed for anchoring and is applicable to any genomic project involving the construction of an anchored physical map.
Affiliation(s)
- Kateřina Cviková
  - Institute of Experimental Botany, Centre of Region Haná for Biotechnological and Agricultural Research, Šlechtitelů 31, 78371, Olomouc-Holice, Czech Republic
- Federica Cattonaro
  - Istituto di Genomica Applicata, Via J. Linussio 51, 33100, Udine, Italy
- Michael Alaux
  - INRA, UR1164 URGI - Research Unit in Genomics-Info, INRA de Versailles, Route de Saint-Cyr, 78026, Versailles, France
- Nils Stein
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Corrensstraße 3, 06466, Stadt Seeland, OT Gatersleben, Germany
- Klaus Fx Mayer
  - Plant Genome and Systems Biology, Helmholtz Zentrum München, 85764, Neuherberg, Germany
- Jaroslav Doležel
  - Institute of Experimental Botany, Centre of Region Haná for Biotechnological and Agricultural Research, Šlechtitelů 31, 78371, Olomouc-Holice, Czech Republic
- Jan Bartoš
  - Institute of Experimental Botany, Centre of Region Haná for Biotechnological and Agricultural Research, Šlechtitelů 31, 78371, Olomouc-Holice, Czech Republic
10
Cao CC, Li C, Sun X. Quantitative group testing-based overlapping pool sequencing to identify rare variant carriers. BMC Bioinformatics 2014; 15:195. [PMID: 24934981 PMCID: PMC4229885 DOI: 10.1186/1471-2105-15-195] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 11/06/2013] [Accepted: 06/10/2014] [Indexed: 11/23/2022]
Abstract
Background: Genome-wide association studies have revealed that rare variants are responsible for a large portion of the heritability of some complex human diseases. This highlights the increasing importance of detecting and screening for rare variants. Although the massively parallel sequencing technologies have greatly reduced the cost of DNA sequencing, the identification of rare variant carriers by large-scale re-sequencing remains prohibitively expensive because of the huge challenge of constructing libraries for thousands of samples. Recently, several studies have reported that techniques from group testing theory and compressed sensing could help identify rare variant carriers in large-scale samples with few pooled sequencing experiments and a dramatically reduced cost. Results: Based on quantitative group testing, we propose an efficient overlapping pool sequencing strategy that allows the efficient recovery of variant carriers in numerous individuals with much lower costs than conventional methods. We used random k-set pool designs to mix samples, and optimized the design parameters according to an indicative probability. Based on a mathematical model of sequencing depth distribution, an optimal threshold was selected to declare a pool positive or negative. Then, using the quantitative information contained in the sequencing results, we designed a heuristic Bayesian probability decoding algorithm to identify variant carriers. Finally, we conducted in silico experiments to find variant carriers among 200 simulated Escherichia coli strains. With the simulated pools and publicly available Illumina sequencing data, our method correctly identified the variant carriers for 91.5–97.9% variants with the variant frequency ranging from 0.5 to 1.5%. Conclusions: Using the number of reads, variant carriers could be identified precisely even though samples were randomly selected and pooled. Our method performed better than the published DNA Sudoku design and compressed sequencing, especially in reducing the required data throughput and cost.
Affiliation(s)
- Xiao Sun
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
11
Zeng Q, Yuan F, Xu X, Shi X, Nie X, Zhuang H, Chen X, Wang Z, Wang X, Huang L, Han D, Kang Z. Construction and characterization of a bacterial artificial chromosome library for the hexaploid wheat line 92R137. BIOMED RESEARCH INTERNATIONAL 2014; 2014:845806. [PMID: 24895618 PMCID: PMC4026951 DOI: 10.1155/2014/845806] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/28/2014] [Revised: 03/26/2014] [Accepted: 04/18/2014] [Indexed: 11/18/2022]
Abstract
For map-based cloning of genes conferring important traits in the hexaploid wheat line 92R137, a bacterial artificial chromosome (BAC) library, including two sublibraries, was constructed using the genomic DNA of 92R137 digested with restriction enzymes HindIII and BamHI. The BAC library was composed of a total of 765,696 clones, of which 390,144 were from the HindIII digestion and 375,552 from the BamHI digestion. Through pulsed-field gel electrophoresis (PFGE) analysis of 453 clones randomly selected from the HindIII sublibrary and 573 clones from the BamHI sublibrary, the average insert sizes were estimated as 129 and 113 kb, respectively. Thus, the HindIII sublibrary was estimated to have a 3.01-fold coverage and the BamHI sublibrary a 2.53-fold coverage based on the estimated hexaploid wheat genome size of 16,700 Mb. The 765,696 clones were arrayed in 1,994 384-well plates. All clones were also arranged into plate pools and further arranged into 5-dimensional (5D) pools. The probability of identifying a clone corresponding to any wheat DNA sequence (such as gene Yr26 for stripe rust resistance) from the library was estimated to be more than 99.6%. Through polymerase chain reaction screening of the 5D pools with Xwe173, a marker tightly linked to Yr26, six BAC clones were successfully obtained. These results demonstrate that the BAC library is a valuable genomic resource for positional cloning of Yr26 and other genes of interest.
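The coverage and plate figures in this abstract can be re-derived with simple arithmetic. The sketch below is a reader's check, not the authors' calculation: it assumes fold coverage is clones times mean insert size divided by genome size, and it assumes a Poisson approximation, 1 - exp(-coverage), for the probability of recovering a given sequence. The abstract does not say which formula the authors used, and all names are hypothetical.

```python
import math

GENOME_MB = 16_700  # estimated hexaploid wheat genome size, from the abstract


def fold_coverage(n_clones: int, mean_insert_kb: float,
                  genome_mb: float = GENOME_MB) -> float:
    """Fold coverage = total cloned DNA / genome size (both in kb)."""
    return n_clones * mean_insert_kb / (genome_mb * 1000.0)


hind = fold_coverage(390_144, 129)  # HindIII sublibrary
bam = fold_coverage(375_552, 113)   # BamHI sublibrary
total = hind + bam

# Assumption: Poisson approximation for the chance that at least one
# clone in the combined library contains a given sequence.
p_hit = 1.0 - math.exp(-total)

# 384-well plates needed to array every clone.
plates = 765_696 // 384

print(f"HindIII coverage: {hind:.2f}x")
print(f"BamHI coverage:   {bam:.2f}x")
print(f"P(sequence represented) ~ {p_hit:.4f}")
print(f"384-well plates: {plates}")
```

Under these assumptions the HindIII figure and the plate count match the abstract, and the recovery probability lands above 99.6%. The recomputed BamHI coverage may differ from the reported 2.53 in the last digit, possibly because the published insert size is rounded.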
Affiliation(s)
- Qingdong Zeng
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Fengping Yuan
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xin Xu
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xue Shi
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xiaojun Nie
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China
- Hua Zhuang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xianming Chen
  - Wheat Genetics, Quality, Physiology, and Disease Research Unit, Agricultural Research Service, United States Department of Agriculture, and Department of Plant Pathology, Washington State University, Pullman, WA 99164-6430, USA
- Zhonghua Wang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xiaojie Wang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Lili Huang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
- Dejun Han
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Agronomy, Northwest A&F University, Yangling, Shaanxi 712100, China
- Zhensheng Kang
  - State Key Laboratory of Crop Stress Biology for Arid Areas and College of Plant Protection, Northwest A&F University, Yangling, Shaanxi 712100, China
12
Cao CC, Li C, Huang Z, Ma X, Sun X. Identifying rare variants with optimal depth of coverage and cost-effective overlapping pool sequencing. Genet Epidemiol 2013; 37:820-30. [PMID: 24166758 DOI: 10.1002/gepi.21769] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/23/2013] [Revised: 09/09/2013] [Accepted: 09/27/2013] [Indexed: 01/19/2023]
Abstract
Genome-wide association studies have identified hundreds of genetic variants associated with complex diseases, although most variants identified so far explain only a small proportion of heritability, suggesting that rare variants are responsible for the missing heritability. Identification of rare variants through large-scale resequencing is becoming increasingly important but remains prohibitively expensive despite the rapid decline in sequencing costs. Nevertheless, group-testing-based overlapping pool sequencing, in which pooled rather than individual samples are sequenced, greatly reduces the effort of sample preparation as well as the cost of screening for rare variants. Here, we proposed an overlapping pool sequencing strategy to screen rare variants with optimal sequencing depth, along with a corresponding cost model. We formulated a model to compute the optimal depth for sufficient observations of variants in pooled sequencing. Using the shifted transversal design algorithm, appropriate parameters for overlapping pool sequencing could be selected to minimize cost and guarantee accuracy. Due to the mixing constraint and high depth for pooled sequencing, results showed that it was more cost-effective to divide a large population into smaller blocks, which were tested independently using optimized strategies. Finally, we conducted an experiment to screen variant carriers at a frequency of 1%. With simulated pools and publicly available human exome sequencing data, the experiment achieved 99.93% accuracy. Utilizing overlapping pool sequencing, the cost for screening variant carriers at a frequency of 1% in 200 diploid individuals dropped to at least 66% when the target sequencing region was set to 30 Mb.
Affiliation(s)
- Chang-Chang Cao
  - State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
13
Utilization of super BAC pools and Fluidigm access array platform for high-throughput BAC clone identification: proof of concept. J Biomed Biotechnol 2012; 2012:405940. [PMID: 22910714 PMCID: PMC3403795 DOI: 10.1155/2012/405940] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/09/2012] [Accepted: 05/20/2012] [Indexed: 11/17/2022]
Abstract
Bacterial artificial chromosome (BAC) libraries are critical for identifying full-length genomic sequences, correlating genetic and physical maps, and comparative genomics. Here we describe the utilization of the Fluidigm access array genotyping system in conjunction with KASPar genotyping technology to identify individual BAC clones corresponding to specific single-nucleotide polymorphisms (SNPs) from an Amplicon Express seven-plate super pooled Amaranthus hypochondriacus BAC library. Ninety-six SNP loci, spanning the length of A. hypochondriacus linkage groups 1, 2, and 15, were simultaneously tested for clone identification from four BAC super pools, corresponding to 28 384-well plates, using a single Fluidigm integrated fluidic chip (IFC). Forty-six percent of the SNPs were associated with a single, unambiguously identified BAC clone. PCR amplification and next-generation sequencing of individual BAC clones confirmed the IFC clone identification. Utilization of the Fluidigm Dynamic array platform allowed for the simultaneous PCR screening of 10,752 BAC pools for 96 SNP tag sites in less than three hours at a cost of ~$0.05 per reaction.
|
14
|
Kanamori T, Uehara H, Jimbo M. Pooling Design and Bias Correction in DNA Library Screening. JOURNAL OF STATISTICAL THEORY AND PRACTICE 2012. [DOI: 10.1080/15598608.2012.647585] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
|
15
|
Advances in BAC-based physical mapping and map integration strategies in plants. J Biomed Biotechnol 2012; 2012:184854. [PMID: 22500080 PMCID: PMC3303678 DOI: 10.1155/2012/184854] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 07/31/2011] [Revised: 10/26/2011] [Accepted: 11/11/2011] [Indexed: 12/29/2022] Open
Abstract
With the advent of next-generation sequencing (NGS) platforms, the map-based sequencing strategy has recently been set aside as too expensive and laborious. Detailed studies of NGS drafts alone indicate that these assemblies remain far from gold-standard reference quality, especially when applied to complex genomes. In this context, conventional BAC-based physical mapping has been identified as an important intermediate layer in the current hybrid sequencing strategy. BAC-based physical map construction and its integration with high-density genetic maps have benefited from NGS and high-throughput array platforms. This paper addresses the current advancements of BAC-based physical mapping and high-throughput map integration strategies to obtain densely anchored, well-ordered physical maps. The resulting maps are of immediate utility while providing a template to harness the maximum benefits of the current NGS platforms.
|
16
|
de Boer JM, Borm TJA, Jesse T, Brugmans B, Wiggers-Perebolte L, de Leeuw L, Tang X, Bryan GJ, Bakker J, van Eck HJ, Visser RGF. A hybrid BAC physical map of potato: a framework for sequencing a heterozygous genome. BMC Genomics 2011; 12:594. [PMID: 22142254 PMCID: PMC3261212 DOI: 10.1186/1471-2164-12-594] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/18/2011] [Accepted: 12/05/2011] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Potato is the world's third most important food crop, yet cultivar improvement and genomic research in general remain difficult because of the heterozygous and tetraploid nature of its genome. The development of physical map resources that can facilitate genomic analyses in potato has so far been very limited. Here we present the methods of construction and the general statistics of the first two genome-wide BAC physical maps of potato, which were made from the heterozygous diploid clone RH89-039-16 (RH). RESULTS First, a gel electrophoresis-based physical map was made by AFLP fingerprinting of 64478 BAC clones, which were aligned into 4150 contigs with an estimated total length of 1361 Mb. Screening of BAC pools, followed by the KeyMaps in silico anchoring procedure, identified 1725 AFLP markers in the physical map, and 1252 BAC contigs were anchored to the ultradense potato genetic map. A second, sequence-tag-based physical map was constructed from 65919 whole genome profiling (WGP) BAC fingerprints and these were aligned into 3601 BAC contigs spanning 1396 Mb. The 39733 BAC clones that overlap between both physical maps provided anchors to 1127 contigs in the WGP physical map, and reduced the number of contigs to around 2800 in each map separately. Both physical maps were 1.64 times longer than the 850 Mb potato genome. Genome heterozygosity and incomplete merging of BAC contigs are two factors that can explain this map inflation. The contig information of both physical maps was united in a single table that describes the hybrid potato physical map. CONCLUSIONS The AFLP physical map has already been used by the Potato Genome Sequencing Consortium for sequencing 10% of the heterozygous genome of clone RH on a BAC-by-BAC basis. By layering a new WGP physical map on top of the AFLP physical map, a genetically anchored genome-wide framework of 322434 sequence tags has been created. This reference framework can be used for anchoring and ordering of genomic sequences of clone RH (and other potato genotypes), and opens the possibility to finish sequencing of the RH genome in a more efficient way via high-throughput next-generation approaches.
Affiliation(s)
- Jan M de Boer
- Wageningen UR Plant Breeding, Wageningen University and Research Centre, Droevendaalstesteeg 1, 6708 PD Wageningen, The Netherlands.
|
17
|
Hudgens MG, Kim HY. Optimal Configuration of a Square Array Group Testing Algorithm. COMMUN STAT-THEOR M 2011; 40:436-448. [PMID: 21218195 DOI: 10.1080/03610920903391303] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 10/18/2022]
Abstract
We consider the optimal configuration of a square array group testing algorithm (denoted A2) to minimize the expected number of tests per specimen. For prevalence greater than 0.2498, individual testing is shown to be more efficient than A2. For prevalence less than 0.2498, closed form lower and upper bounds on the optimal group sizes for A2 are given. Arrays of dimension 2 × 2, 3 × 3, and 4 × 4 are shown to never be optimal. The results are illustrated by considering the design of a specimen pooling algorithm for detection of recent HIV infections in Malawi.
Affiliation(s)
- Michael G Hudgens
- Department of Biostatistics, University of North Carolina at Chapel Hill
|
18
|
Chen HB, De Bonis A. An almost optimal algorithm for generalized threshold group testing with inhibitors. J Comput Biol 2011; 18:851-64. [PMID: 21210744 DOI: 10.1089/cmb.2010.0030] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
Group testing is a search paradigm where one is given a population S of n elements and an unknown subset P ⊆ S of defective elements, and the goal is to determine P by performing tests on subsets of S. In classical group testing, a test on a subset Q ⊆ S receives a YES response if |Q ∩ P| ≥ 1, and a NO response otherwise. In group testing with inhibitors (GTI), identifying the defective items is more difficult due to the presence of elements called inhibitors that interfere with the queries, so that the answer to a query is YES if and only if the queried group contains at least one defective item and no inhibitor. In the present article, we consider a new generalization of the GTI model in which there are two unknown thresholds h and g, and the response to a test is YES both in the case when the queried subset contains at least one defective item and fewer than h inhibitors, and in the case when the queried subset contains at least g defective items. Moreover, our search model assumes that no knowledge of the number |P| of defective items is given. We derive lower bounds on the minimum number of tests required to determine the defective items under this model and present an algorithm that uses an almost optimal number of tests.
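The three response rules described in this abstract can be written down directly. The following is a minimal sketch of the test oracles only (not the paper's search algorithm); the function names and the use of Python sets are illustrative assumptions.

```python
def classical_response(pool: set, defectives: set) -> bool:
    """Classical group testing: YES iff the pool holds at least one defective."""
    return len(pool & defectives) >= 1


def gti_response(pool: set, defectives: set, inhibitors: set) -> bool:
    """GTI: YES iff the pool holds at least one defective and no inhibitor."""
    return len(pool & defectives) >= 1 and len(pool & inhibitors) == 0


def threshold_gti_response(pool: set, defectives: set, inhibitors: set,
                           h: int, g: int) -> bool:
    """Generalized model: YES iff (>= 1 defective and < h inhibitors)
    or (>= g defectives), following the wording of the abstract."""
    d = len(pool & defectives)
    i = len(pool & inhibitors)
    return (d >= 1 and i < h) or d >= g
```

With h = 1 and g larger than the number of defectives, the generalized rule reduces to the plain GTI rule, which is a quick sanity check on the definitions.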
Affiliation(s)
- Hong-Bin Chen
- Department of Applied Mathematics, National Chiao Tung University, Hsinchu, Taiwan
|
19
|
You FM, Luo MC, Xu K, Deal KR, Anderson OD, Dvorak J. A new implementation of high-throughput five-dimensional clone pooling strategy for BAC library screening. BMC Genomics 2010; 11:692. [PMID: 21129228 PMCID: PMC3016418 DOI: 10.1186/1471-2164-11-692] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/21/2010] [Accepted: 12/06/2010] [Indexed: 11/29/2022] Open
Abstract
Background A five-dimensional (5-D) clone pooling strategy for screening of bacterial artificial chromosome (BAC) clones with molecular markers utilizing highly-parallel Illumina GoldenGate assays and PCR facilitates high-throughput BAC clone and BAC contig anchoring on a genetic map. However, this strategy occasionally needs manual PCR to deconvolute pools and identify truly positive clones. Results A new implementation is reported here for our previously reported clone pooling strategy. Row and column pools of BAC clones are divided into sub-pools with 1~2× genome coverage. All BAC pools are screened with Illumina's GoldenGate assay and the BAC pools are deconvoluted to identify individual positive clones. Putative positive BAC clones are then further analyzed to find positive clones on the basis of them being neighbours in a contig. An exhaustive search or brute force algorithm was designed for this deconvolution and integrated into a newly developed software tool, FPCBrowser, for analyzing clone pooling data. This algorithm was used with empirical data for 55 Illumina GoldenGate SNP assays detecting SNP markers mapped on Aegilops tauschii chromosome 2D and Ae. tauschii contig maps. Clones in single contigs were successfully assigned to 48 (87%) specific SNP markers on the map with 91% precision. Conclusion A new implementation of 5-D BAC clone pooling strategy employing both GoldenGate assay screening and assembled BAC contigs is shown here to be a high-throughput, low cost, rapid, and feasible approach to screening BAC libraries and anchoring BAC clones and contigs on genetic maps. The software FPCBrowser with the integrated clone deconvolution algorithm has been developed and is downloadable at http://avena.pw.usda.gov/wheatD/fpcbrowser.shtml.
Affiliation(s)
- Frank M You
- Department of Plant Sciences, University of California, Davis, CA 95516, USA
|
20
|
Liu SY, Yu K, Huffner M, Park SJ, Banik M, Pauls KP, Crosby W. Construction of a BAC library and a physical map of a major QTL for CBB resistance of common bean (Phaseolus vulgaris L.). Genetica 2010; 138:709-16. [PMID: 20419470 DOI: 10.1007/s10709-010-9450-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 09/05/2009] [Accepted: 02/25/2010] [Indexed: 11/29/2022]
Abstract
A major quantitative trait locus (QTL) conditioning common bacterial blight (CBB) resistance in common bean (Phaseolus vulgaris L.) lines HR45 and HR67 was derived from XAN159, a resistant line obtained from an interspecific cross between common bean lines and the tepary bean (P. acutifolius L.) line PI319443. This source of CBB resistance is widely used in bean breeding. Several other CBB resistance QTL have been identified, but none of them has been physically mapped. Four molecular markers tightly linked to this QTL have been identified that are suitable for marker-assisted selection and physical mapping of the resistance gene. A bacterial artificial chromosome (BAC) library was constructed from high molecular weight DNA of HR45 and is composed of 33,024 clones. The size of individual BAC clone inserts ranges from 30 kb to 280 kb with an average size of 107 kb. The library is estimated to represent approximately sixfold genome coverage. The BAC library was screened as BAC pools using four PCR-based molecular markers. Two to seven BAC clones were identified by each marker. Two clones were found to have both markers PV-tttc001 and STS183. One preliminary contig was assembled based on DNA fingerprinting of those positive BAC clones. The minimum tiling path of the contig contains 6 BAC clones spanning an estimated size of 750 kb covering the QTL region.
Affiliation(s)
- S Y Liu
- Agriculture Agri-Food Canada, Greenhouse and Processing Crops Research Center, Harrow, ON, N0R 1G0, Canada
|
21
|
Febrer M, Wilhelm E, Al-Kaff N, Wright J, Powell W, Bevan MW, Boulton MI. Rapid identification of the three homoeologues of the wheat dwarfing gene Rht using a novel PCR-based screen of three-dimensional BAC pools. Genome 2010; 52:993-1000. [PMID: 19953127 DOI: 10.1139/g09-073] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/22/2022]
Abstract
A high-throughput two-step PCR strategy for the identification of selected genes from a BAC library derived from hexaploid wheat (16,974 Mbp) is described. The screen is based on the pooling of DNA from BAC clones into 675 "superpools" arrayed in a three-dimensional configuration. Each BAC clone is represented in three superpools to allow the identification of candidate 384-well plates of clones after the first round of PCR; identification is facilitated by an associated Perl script. A second round of PCR detects the specific BAC clone within the candidate plate that corresponds to the gene of interest. Thus, a single copy of the target gene can be identified from the library of over 700,000 clones (approximately 5 genome equivalents) by assaying only three 384-well plates. The pooling strategy was validated by screening the library with primers specific for the reduced height (Rht-1a) gene. Using relatively stringent selection criteria, 13 Rht-containing clones were identified from 17 candidate plates, and sequence analysis of the amplified products showed that all three Rht homoeologues were represented. Furthermore, the method confirmed the estimated coverage of the BAC library. Thus, this methodology allows the rapid and cost-effective identification of genes, and their homoeologues, from large-insert libraries of complex genomes such as hexaploid wheat.
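The first-round decoding idea (each plate sits in three superpools, and positive superpools point back to candidate plates) can be sketched as follows. This is a generic three-axis layout chosen for illustration; the abstract does not describe how the 675 superpools are actually arranged, and it refers to a Perl script whose logic is not reproduced here. All names and the coordinate scheme are hypothetical.

```python
from itertools import product


def superpools_for_plate(x: int, y: int, z: int) -> tuple:
    """Each plate at grid position (x, y, z) contributes to one superpool per axis."""
    return (("X", x), ("Y", y), ("Z", z))


def simulate_first_round(positive_plates) -> set:
    """Superpools that would test positive, assuming error-free PCR."""
    hits = set()
    for plate in positive_plates:
        hits.update(superpools_for_plate(*plate))
    return hits


def candidate_plates(positive_superpools) -> list:
    """Plates whose three superpools are all positive (candidates for round two)."""
    xs = sorted(i for axis, i in positive_superpools if axis == "X")
    ys = sorted(i for axis, i in positive_superpools if axis == "Y")
    zs = sorted(i for axis, i in positive_superpools if axis == "Z")
    return list(product(xs, ys, zs))
```

In this toy layout a single positive plate decodes uniquely, while several positive plates generate extra candidate intersections, which is one reason a confirmatory second round of PCR on the candidate plates is useful.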
Affiliation(s)
- Melanie Febrer
- John Innes Centre, Norwich Research Park, Colney, Norwich, NR4 7UH, UK
|
22
|
Erlich Y, Gordon A, Brand M, Hannon GJ, Mitra PP. Compressed Genotyping. IEEE TRANSACTIONS ON INFORMATION THEORY 2010; 56:706-723. [PMID: 21451737 PMCID: PMC3065185 DOI: 10.1109/tit.2009.2037043] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 05/08/2023]
Abstract
Over the past three decades we have steadily increased our knowledge on the genetic basis of many severe disorders. Nevertheless, there are still great challenges in applying this knowledge routinely in the clinic, mainly due to the relatively tedious and expensive process of genotyping. Since the genetic variations that underlie the disorders are relatively rare in the population, they can be thought of as a sparse signal. Using methods and ideas from compressed sensing and group testing, we have developed a cost-effective genotyping protocol to detect carriers for severe genetic disorders. In particular, we have adapted our scheme to a recently developed class of high throughput DNA sequencing technologies. The mathematical framework presented here has some important distinctions from the 'traditional' compressed sensing and group testing frameworks in order to address biological and technical constraints of our setting.
Affiliation(s)
- Yaniv Erlich
- Watson School of Biological Science, Cold Spring Harbor Laboratory, NY, 11724 USA
- Assaf Gordon
- Watson School of Biological Science, Cold Spring Harbor Laboratory, NY, 11724 USA
- Gregory J. Hannon
- Watson School of Biological Science, Cold Spring Harbor Laboratory, NY, 11724 USA
- Partha P. Mitra
- Watson School of Biological Science, Cold Spring Harbor Laboratory, NY, 11724 USA
|
23
|
Uehara H, Jimbo M. A positive detecting code and its decoding algorithm for DNA library screening. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2009; 6:652-666. [PMID: 19875863 DOI: 10.1109/tcbb.2007.70266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
The study of gene functions requires high-quality DNA libraries. However, a large number of tests and screenings are necessary for compiling such libraries. We describe an algorithm for extracting as much information as possible from pooling experiments for library screening. Collections of clones are called pools, and a pooling experiment is a group test for detecting all positive clones. The probability of positiveness for each clone is estimated according to the outcomes of the pooling experiments. Clones with high chance of positiveness are subjected to confirmatory testing. In this paper, we introduce a new positive clone detecting algorithm, called the Bayesian network pool result decoder (BNPD). The performance of BNPD is compared, by simulation, with that of the Markov chain pool result decoder (MCPD) proposed by Knill et al. in 1996. Moreover, the combinatorial properties of pooling designs suitable for the proposed algorithm are discussed in conjunction with combinatorial designs and d-disjunct matrices. We also show the advantage of utilizing packing designs or BIB designs for the BNPD algorithm.
Affiliation(s)
- Hiroaki Uehara
- Department of Mathematics, Keio University, 3-14-1 Hiyoshi, Kouhoku-ku, Yokohama 223-8522, Japan.
|
24
|
Xin X, Rual JF, Hirozane-Kishikawa T, Hill DE, Vidal M, Boone C, Thierry-Mieg N. Shifted Transversal Design smart-pooling for high coverage interactome mapping. Genome Res 2009; 19:1262-9. [PMID: 19447967 DOI: 10.1101/gr.090019.108] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Indexed: 01/23/2023]
Abstract
"Smart-pooling," in which test reagents are multiplexed in a highly redundant manner, is a promising strategy for achieving high efficiency, sensitivity, and specificity in systems-level projects. However, previous applications relied on low redundancy designs that do not leverage the full potential of smart-pooling, and more powerful theoretical constructions, such as the Shifted Transversal Design (STD), lack experimental validation. Here we evaluate STD smart-pooling in yeast two-hybrid (Y2H) interactome mapping. We employed two STD designs and two established methods to perform ORFeome-wide Y2H screens with 12 baits. We found that STD pooling achieves similar levels of sensitivity and specificity as one-on-one array-based Y2H, while the costs and workloads are divided by three. The screening-sequencing approach is the most cost- and labor-efficient, yet STD identifies about twofold more interactions. Screening-sequencing remains an appropriate method for quickly producing low-coverage interactomes, while STD pooling appears as the method of choice for obtaining maps with higher coverage.
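The abstract above does not spell out how a highly redundant "smart-pooling" design is built. As a hedged illustration, the sketch below constructs a layered design by polynomial evaluation modulo a prime q: each item's base-q digits are treated as polynomial coefficients, and in layer j the item goes to the pool indexed by the polynomial's value at j. This shares the key property usually attributed to STD (any two items meet in only a few pools), but it is an assumption-laden stand-in, not necessarily the exact construction used in the paper; all names are hypothetical.

```python
def digits(i: int, q: int, length: int) -> list:
    """Base-q digits of i, least significant first, padded to `length`."""
    out = []
    for _ in range(length):
        out.append(i % q)
        i //= q
    return out


def pool_index(i: int, layer: int, q: int, gamma: int) -> int:
    """Evaluate the digit polynomial of item i at `layer`, modulo the prime q."""
    coeffs = digits(i, q, gamma + 1)
    return sum(c * pow(layer, k, q) for k, c in enumerate(coeffs)) % q


def build_design(n: int, q: int, layers: int):
    """Return (design, gamma); design[j][p] is the set of items in pool p of layer j.

    Assumes q is prime and layers <= q, so that two distinct items can share
    a pool in at most gamma layers (a nonzero polynomial of degree <= gamma
    has at most gamma roots modulo q).
    """
    if layers > q:
        raise ValueError("this sketch supports at most q layers")
    gamma = 0
    while q ** (gamma + 1) < n:
        gamma += 1
    design = [[set() for _ in range(q)] for _ in range(layers)]
    for i in range(n):
        for j in range(layers):
            design[j][pool_index(i, j, q, gamma)].add(i)
    return design, gamma
```

Each layer partitions the items into q pools, so every item is tested once per layer; adding layers increases redundancy and therefore tolerance to noisy pool outcomes, at the cost of more tests.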
Affiliation(s)
- Xiaofeng Xin
- Banting and Best Department of Medical Research and Department of Molecular Genetics, Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario M5S 3E1, Canada
|
25
|
Erlich Y, Chang K, Gordon A, Ronen R, Navon O, Rooks M, Hannon GJ. DNA Sudoku--harnessing high-throughput sequencing for multiplexed specimen analysis. Genome Res 2009; 19:1243-53. [PMID: 19447965 DOI: 10.1101/gr.092957.109] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Indexed: 11/24/2022]
Abstract
Next-generation sequencers have sufficient power to analyze simultaneously DNAs from many different specimens, a practice known as multiplexing. Such schemes rely on the ability to associate each sequence read with the specimen from which it was derived. The current practice of appending molecular barcodes prior to pooling is practical for parallel analysis of up to many dozen samples. Here, we report a strategy that permits simultaneous analysis of tens of thousands of specimens. Our approach relies on the use of combinatorial pooling strategies in which pools rather than individual specimens are assigned barcodes. Thus, the identity of each specimen is encoded within the pooling pattern rather than by its association with a particular sequence tag. Decoding the pattern allows the sequence of an original specimen to be inferred with high confidence. We verified the ability of our encoding and decoding strategies to accurately report the sequence of individual samples within a large number of mixed specimens in two ways. First, we simulated data both from a clone library and from a human population in which a sequence variant associated with cystic fibrosis was present. Second, we actually pooled, sequenced, and decoded identities within two sets of 40,000 bacterial clones comprising approximately 20,000 different artificial microRNAs targeting Arabidopsis or human genes. We achieved greater than 97% accuracy in these trials. The strategies reported here can be applied to a wide variety of biological problems, including the determination of genotypic variation within large populations of individuals.
Affiliation(s)
- Yaniv Erlich
- Watson School of Biological Sciences, Howard Hughes Medical Institute, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
|
26
|
Coates BS, Sumerford DV, Hellmich RL, Lewis LC. Repetitive genome elements in a European corn borer, Ostrinia nubilalis, bacterial artificial chromosome library were indicated by bacterial artificial chromosome end sequencing and development of sequence tag site markers: implications for lepidopteran genomic research. Genome 2009; 52:57-67. [PMID: 19132072 DOI: 10.1139/g08-104] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022]
Abstract
The European corn borer, Ostrinia nubilalis, is a serious pest of food, fiber, and biofuel crops in Europe, North America, and Asia and a model system for insect olfaction and speciation. A bacterial artificial chromosome library constructed for O. nubilalis contains 36 864 clones with an estimated average insert size of ≥120 kb and genome coverage of 8.8-fold. Screening OnB1 clones comprising approximately 2.76 genome equivalents determined the physical position of 24 sequence tag site markers, including markers linked to ecologically important and Bacillus thuringiensis toxin resistance traits. OnB1 bacterial artificial chromosome end sequence reads (GenBank dbGSS accessions ET217010 to ET217273) showed homology to annotated genes or expressed sequence tags and identified repetitive genome elements, O. nubilalis miniature subterminal inverted repeat transposable elements (OnMITE01 and OnMITE02), and ezi-like long interspersed nuclear elements. Mobility of OnMITE01 was demonstrated by the presence or absence in O. nubilalis of introns at two different loci. A (GTCT)n tetranucleotide repeat at the 5' ends of OnMITE01 and OnMITE02 is evidence for transposon-mediated movement of lepidopteran microsatellite loci. The number of repetitive elements in lepidopteran genomes will affect genome assembly and marker development. Single-locus sequence tag site markers described here have downstream application for integration within linkage maps and comparative genomic studies.
Affiliation(s)
- Brad S Coates
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, Genetics Laboratory, Iowa State University, Ames, IA 50011, USA.
|
27
|
Luo MC, Xu K, Ma Y, Deal KR, Nicolet CM, Dvorak J. A high-throughput strategy for screening of bacterial artificial chromosome libraries and anchoring of clones on a genetic map constructed with single nucleotide polymorphisms. BMC Genomics 2009; 10:28. [PMID: 19149906 PMCID: PMC2647554 DOI: 10.1186/1471-2164-10-28] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 07/11/2008] [Accepted: 01/18/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Current techniques of screening bacterial artificial chromosome (BAC) libraries for molecular markers during the construction of physical maps are slow, laborious and often assign multiple BAC contigs to a single locus on a genetic map. These limitations are the principal impediment in the construction of physical maps of large eukaryotic genomes. It is hypothesized that this impediment can be overcome by screening multidimensional pools of BAC clones using the highly parallel Illumina GoldenGate assay. RESULTS To test the efficacy of the Golden Gate assay in BAC library screening, multidimensional pools involving 302976 Aegilops tauschii BAC clones were genotyped for the presence/absence of specific gene sequences with multiplexed Illumina GoldenGate oligonucleotide assays previously used to place single nucleotide polymorphisms on an Ae. tauschii genetic map. Of 1384 allele-informative oligonucleotide assays, 87.6% successfully clustered BAC pools into those positive for a BAC clone harboring a specific gene locus and those negative for it. The location of the positive BAC clones within contigs assembled from 199190 fingerprinted Ae. tauschii BAC clones was used to evaluate the precision of anchoring of BAC clones and contigs on the Ae. tauschii genetic map. For 41 (95%) assays, positive BAC clones were neighbors in single contigs. Those contigs could be unequivocally assigned to loci on the genetic map. For two (5%) assays, positive clones were in two different contigs and the relationships of these contigs to loci on the Ae. tauschii genetic map were equivocal. Screening of BAC libraries with a simple five-dimensional BAC pooling strategy was evaluated and shown to allow direct detection of positive BAC clones without the need for manual deconvolution of BAC clone pools. 
CONCLUSION The highly parallel Illumina oligonucleotide assay is shown here to be an efficient tool for screening BAC libraries and a strategy for high-throughput anchoring of BAC contigs on genetic maps during the construction of physical maps of eukaryotic genomes. In most cases, screening of BAC libraries with Illumina oligonucleotide assays results in the unequivocal relationship of BAC clones with loci on the genetic map.
Affiliation(s)
- Ming-Cheng Luo
- Department of Plant Sciences, University of California, Davis, CA 95616,
|
28
|
Wu X, Zhong G, Findley SD, Cregan P, Stacey G, Nguyen HT. Genetic marker anchoring by six-dimensional pools for development of a soybean physical map. BMC Genomics 2008; 9:28. [PMID: 18211698 PMCID: PMC2259328 DOI: 10.1186/1471-2164-9-28] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Received: 08/27/2007] [Accepted: 01/22/2008] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Integrated genetic and physical maps are extremely valuable for genomic studies and as important references for assembling whole genome shotgun sequences. Screening of a BAC library using molecular markers is an indispensable procedure for integration of both physical and genetic maps of a genome. Molecular markers provide anchor points for integration of genetic and physical maps and also validate BAC contigs assembled based solely on BAC fingerprints. We employed a six-dimensional BAC pooling strategy and an in silico approach to anchor molecular markers onto the soybean physical map. RESULTS A total of 1,470 markers (580 SSRs and 890 STSs) were anchored by PCR on a subset of a Williams 82 BstY I BAC library pooled into 208 pools in six dimensions. This resulted in 7,463 clones (approximately 1x genome equivalent) associated with 1470 markers, of which the majority of clones (6,157, 82.5%) were anchored by one marker and 1106 (17.5%) individual clones contained two or more markers. This contributed to 1184 contigs having anchor points through this 6-D pool screening effort. In parallel, the 21,700 soybean Unigene set from NCBI was used to perform in silico mapping on 80,700 Williams 82 BAC end sequences (BES). This in silico analysis yielded 9,835 positive results anchored by 4152 unigenes that contributed to 1305 contigs and 1624 singletons. Among the 1305 contigs, 305 have not been previously anchored by PCR. Therefore, 1489 (78.8%) of 1893 contigs are anchored with molecular markers. These results are being integrated with BAC fingerprints to assemble the BAC contigs. Ultimately, these efforts will lead to an integrated physical and genetic map resource. CONCLUSION We demonstrated that the six-dimensional soybean BAC pools can be efficiently used to anchor markers to soybean BACs despite the complexity of the soybean genome. 
In addition to anchoring markers, the 6-D pooling method was also effective for targeting BAC clones for investigating gene families and duplicated regions in the genome, as well as for extending physical map contigs.
Affiliation(s)
- Xiaolei Wu
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri-Columbia, Columbia, MO 65211, USA
- Guohua Zhong
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri-Columbia, Columbia, MO 65211, USA
- Seth D Findley
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri-Columbia, Columbia, MO 65211, USA
- Perry Cregan
- Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD 20705, USA
- Gary Stacey
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri-Columbia, Columbia, MO 65211, USA
- Department of Biochemistry; Department of Molecular Microbiology and Immunology, University of Missouri-Columbia, Columbia, MO 65211, USA
- Henry T Nguyen
- Division of Plant Sciences and National Center for Soybean Biotechnology, University of Missouri-Columbia, Columbia, MO 65211, USA
|
29
|
Abstract
MOTIVATION In high-throughput projects aiming to identify rare positives using a binary assay, smart-pooling constitutes an appealing strategy capable of significantly reducing the number of tests while correcting for experimental noise. In order to perform simulations for choosing an appropriate set of pools, and later to interpret the experimental results, the pool outcomes must be 'decoded'. The intuitive aim is clearly to identify the positives that gave rise to an observation, whether real or simulated. However, this goal is not well formalized and has been the focus of very few studies. RESULTS We first provide a clear combinatorial formalization of the 'decoding problem'. We then present interpool, an exact algorithm to solve this problem. An efficient implementation is freely available. Its usefulness is illustrated in the context of yeast two-hybrid interactome mapping with the Shifted Transversal Design. AVAILABILITY The implementation, licensed under the GNU GPL, can be downloaded from http://www-timc.imag.fr/Nicolas.Thierry-Mieg/.
Collapse
|
30
|
Explicit Non-adaptive Combinatorial Group Testing Schemes. Automata, Languages and Programming 2008. [DOI: 10.1007/978-3-540-70575-8_61] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 01/09/2023]
|
31
|
Kim HY, Hudgens MG, Dreyfuss JM, Westreich DJ, Pilcher CD. Comparison of Group Testing Algorithms for Case Identification in the Presence of Test Error. Biometrics 2007; 63:1152-63. [PMID: 17501946 DOI: 10.1111/j.1541-0420.2007.00817.x] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Indexed: 01/02/2023]
Abstract
We derive and compare the operating characteristics of hierarchical and square array-based testing algorithms for case identification in the presence of testing error. The operating characteristics investigated include efficiency (i.e., expected number of tests per specimen) and error rates (i.e., sensitivity, specificity, positive and negative predictive values, per-family error rate, and per-comparison error rate). The methodology is illustrated by comparing different pooling algorithms for the detection of individuals recently infected with HIV in North Carolina and Malawi.
Affiliation(s)
- Hae-Young Kim
- Department of Biostatistics, School of Public Health, University of North Carolina at Chapel Hill, 3107-E McGavran-Greenberg Hall, Chapel Hill, North Carolina 27599, USA
|
32
|
A BAC pooling strategy combined with PCR-based screenings in a large, highly repetitive genome enables integration of the maize genetic and physical maps. BMC Genomics 2007; 8:47. [PMID: 17291341 PMCID: PMC1821331 DOI: 10.1186/1471-2164-8-47] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 09/19/2005] [Accepted: 02/09/2007] [Indexed: 11/24/2022]
Abstract
Background Molecular markers serve three important functions in physical map assembly. First, they provide anchor points to genetic maps facilitating functional genomic studies. Second, they reduce the overlap required for BAC contig assembly from 80 to 50 percent. Finally, they validate assemblies based solely on BAC fingerprints. We employed a six-dimensional BAC pooling strategy in combination with a high-throughput PCR-based screening method to anchor the maize genetic and physical maps. Results A total of 110,592 maize BAC clones (~ 6x haploid genome equivalents) were pooled into six different matrices, each containing 48 pools of BAC DNA. The quality of the BAC DNA pools and their utility for identifying BACs containing target genomic sequences was tested using 254 PCR-based STS markers. Five types of PCR-based STS markers were screened to assess potential uses for the BAC pools. An average of 4.68 BAC clones were identified per marker analyzed. These results were integrated with BAC fingerprint data generated by the Arizona Genomics Institute (AGI) and the Arizona Genomics Computational Laboratory (AGCoL) to assemble the BAC contigs using the FingerPrinted Contigs (FPC) software and contribute to the construction and anchoring of the physical map. A total of 234 markers (92.5%) anchored BAC contigs to their genetic map positions. The results can be viewed on the integrated map of maize [1,2]. Conclusion This BAC pooling strategy is a rapid, cost effective method for genome assembly and anchoring. The requirement for six replicate positive amplifications makes this a robust method for use in large genomes with high amounts of repetitive DNA such as maize. This strategy can be used to physically map duplicate loci, provide order information for loci in a small genetic interval or with no genetic recombination, and loci with conflicting hybridization-based information.
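The pool counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that every matrix distributes all 110,592 clones evenly over its 48 pools (so each clone appears once per matrix), which the abstract implies but does not state outright. The function name `pooling_summary` is hypothetical.

```python
def pooling_summary(n_clones, n_matrices, pools_per_matrix):
    """Summarise a multi-matrix pooling layout.

    Assumption (not stated in the abstract): each matrix splits all clones
    evenly across its pools, so every clone sits in one pool per matrix.
    """
    total_pools = n_matrices * pools_per_matrix
    clones_per_pool = n_clones / pools_per_matrix
    # Ratio of one-test-per-clone screening to one-test-per-pool screening.
    tests_saved_factor = n_clones / total_pools
    return total_pools, clones_per_pool, tests_saved_factor


if __name__ == "__main__":
    total, per_pool, factor = pooling_summary(110_592, 6, 48)
    print(total, per_pool, factor)
```

Under that assumption a marker is screened against 288 pools instead of 110,592 individual clones, and a true positive clone should light up one pool in each of the six matrices, which is consistent with the six replicate amplifications mentioned in the conclusion.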
|
33
|
Muñoz-Zanzi C, Thurmond M, Hietala S, Johnson W. Factors affecting sensitivity and specificity of pooled-sample testing for diagnosis of low prevalence infections. Prev Vet Med 2006; 74:309-22. [PMID: 16427711 DOI: 10.1016/j.prevetmed.2005.12.006] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Received: 12/16/2005] [Revised: 12/16/2005] [Accepted: 12/19/2005] [Indexed: 11/18/2022]
Abstract
Testing of pooled samples has been proposed as a low-cost alternative for diagnostic screening and surveillance for infectious agents in situations where the prevalence of infection is low and most samples can be expected to test negative. The present study extends our previous work in pooled-sample testing (PST) to evaluate effects of the following factors on the overall PST sensitivity (SE(k)) and specificity (SP(k)): dilution (pool size), cross-contamination, and cross-reaction. A probabilistic model, in conjunction with Monte Carlo simulations, was used to calculate SE(k) and SP(k), as applied to detection of bovine viral diarrhea virus (BVDV) persistently infected (PI) animals using RT-PCR. For an average prevalence of BVDV PI of 0.01 and viremia in each animal between 10^2 and 10^7 virus particles/mL, the pool size associated with the lowest number of tests, and lowest cost, corresponded to eight samples/pool. However, the least-cost pool size (lowest number of tests) was associated with a SE(k) of 0.90 (0.75-1), which corresponded to a decrease of 0.04, relative to the assay sensitivity for a single sample. The SP(k) for the same pool size, considering the effect of detection of BVDV acutely infected animals and cross-contamination as source of false positive results, was 0.90 (0.85-0.95). The effect of a hypothetical cross-reacting agent was to markedly decrease SP(k), especially as the prevalence of the cross-reacting agent increased. For a pool size of eight samples and a prevalence of the cross-reacting agent of 0.3, SP(k) ranged from 0.67 to 0.86, depending on the probability that the assay would detect the cross-reacting agent. The methods presented offer a means of evaluating and understanding the various factors that can influence overall accuracy of PST procedures.
Affiliation(s)
- Claudia Muñoz-Zanzi
- Department of Veterinary Population Medicine, College of Veterinary Medicine, University of Minnesota, 1365 Gortner Avenue, St. Paul, 55108, USA
|
34
|
Thierry-Mieg N. A new pooling strategy for high-throughput screening: the Shifted Transversal Design. BMC Bioinformatics 2006; 7:28. [PMID: 16423300 PMCID: PMC1409803 DOI: 10.1186/1471-2105-7-28] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Received: 06/17/2005] [Accepted: 01/19/2006] [Indexed: 11/25/2022]
Abstract
Background In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplication of the individual experiments, thereby correcting for experimental noise. The main difficulty consists in designing the pools in a manner that is both efficient and robust: few pools should be necessary to correct the errors and identify the positives, yet the experiment should not be too vulnerable to biological shakiness. For example, some information should still be obtained even if there are slightly more positives or errors than expected. This is known as the group testing problem, or pooling problem. Results In this paper, we present a new non-adaptive combinatorial pooling design: the "shifted transversal design" (STD). It relies on arithmetics, and rests on two intuitive ideas: minimizing the co-occurrence of objects, and constructing pools of constant-sized intersections. We prove that it allows unambiguous decoding of noisy experimental observations. This design is highly flexible, and can be tailored to function robustly in a wide range of experimental settings (i.e., numbers of objects, fractions of positives, and expected error-rates). Furthermore, we show that our design compares favorably, in terms of efficiency, to the previously described non-adaptive combinatorial pooling designs. Conclusion This method is currently being validated by field-testing in the context of yeast-two-hybrid interactome mapping, in collaboration with Marc Vidal's lab at the Dana Farber Cancer Institute. Many similar projects could benefit from using the Shifted Transversal Design.
Affiliation(s)
- Nicolas Thierry-Mieg
- Laboratoire Logiciels-Systèmes-Réseaux, IMAG Institute, BP53, 38041 Grenoble Cedex 9, France.
|
35
|
D'Yachkov A, Hwang F, Macula A, Vilenkin P, Weng CW. A construction of pooling designs with some happy surprises. J Comput Biol 2005; 12:1129-36. [PMID: 16241902 DOI: 10.1089/cmb.2005.12.1129] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
The screening of data sets for "positive data objects" is essential to modern technology. A (group) test that indicates whether a positive data object is in a specific subset or pool of the dataset can greatly facilitate the identification of all the positive data objects. A collection of tested pools is called a pooling design. Pooling designs are standard experimental tools in many biotechnical applications. In this paper, we use the (linear) subspace relation coupled with the general concept of a "containment matrix" to construct pooling designs with surprisingly high degrees of error correction (detection.) Error-correcting pooling designs are important to biotechnical applications where error rates often are as high as 15%. What is also surprising is that the rank of the pooling design containment matrix is independent of the number of positive data objects in the dataset.
Affiliation(s)
- A D'Yachkov
- Department of Probability Theory, Faculty of Mechanics and Mathematics, Moscow State University, Moscow, 119992, Russia.
|
36
|
Polizzi KM, Spencer CU, Dubey A, Matsumura I, Lee JH, Realff MJ, Bommarius AS. Simulation Modeling of Pooling for Combinatorial Protein Engineering. J Biomol Screen 2005; 10:856-64. [PMID: 16234344 DOI: 10.1177/1087057105280134] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Pooling in directed-evolution experiments will greatly increase the throughput of screening systems, but important parameters such as the number of good mutants created and the activity level increase of the good mutants will depend highly on the protein being engineered. The authors developed and validated a Monte Carlo simulation model of pooling that allows the testing of various scenarios in silico before starting experimentation. Using a simplified test system of 2 enzymes, β-galactosidase (supermutant, or greatly improved enzyme) and β-glucuronidase (dud, or enzyme with ancestral level of activity), the model accurately predicted the number of supermutants detected in experiments within a factor of 2. Additional simulations using more complex activity distributions show the versatility of the model. Pooling is most suited to cases such as the directed evolution of new function in a protein, where the background level of activity is minimized, making it easier to detect small increases in activity level. Pooling is most successful when a sensitive assay is employed. Using the model will increase the throughput of screening procedures for directed-evolution experiments and thus lead to speedier engineering of proteins.
Affiliation(s)
- Karen M Polizzi
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta 30332-0100, USA
|
37
|
Csurös M, Milosavljevic A. Pooled Genomic Indexing (PGI): analysis and design of experiments. J Comput Biol 2005; 11:1001-21. [PMID: 15700414 DOI: 10.1089/cmb.2004.11.1001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
Pooled Genomic Indexing (PGI) is a novel method for physical mapping of clones onto known sequences. PGI is carried out by pooling arrayed clones and generating shotgun sequence reads from the pools. The shotgun sequences are compared to a reference sequence. In the simplest case, clones are placed on an array and are pooled by rows and columns. If a shotgun sequence from a row pool and another shotgun sequence from a column pool match the reference sequence at a close distance, they are both assigned to the clone at the intersection of the two pools. Accordingly, the clone is mapped onto the region of the reference sequence between the two matches. A probabilistic model for PGI is developed, and several pooling designs are described and analyzed, including transversal designs and designs from linear codes. The probabilistic model and the pooling schemes are validated in simulated experiments where 625 rat bacterial artificial chromosome (BAC) clones and 207 mouse BAC clones are mapped onto homologous human sequence.
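The row-and-column case described in this abstract can be sketched as a small pairing routine. This is a minimal illustration, not the authors' implementation: the function name `assign_clones`, the hit representation (pool index plus reference position) and the `max_dist` threshold are all assumptions introduced here.

```python
def assign_clones(row_hits, col_hits, max_dist):
    """Pair row-pool and column-pool read matches that lie close together.

    row_hits / col_hits: lists of (pool_index, reference_position) tuples,
    one per shotgun read that matched the reference (hypothetical format).
    Returns a list of ((row, col), (start, end)): the clone at the array
    intersection and the reference region between the two matches.
    """
    assignments = []
    for row, row_pos in row_hits:
        for col, col_pos in col_hits:
            # Two matches count as the same clone only if they are near
            # each other on the reference sequence.
            if abs(row_pos - col_pos) <= max_dist:
                region = (min(row_pos, col_pos), max(row_pos, col_pos))
                assignments.append(((row, col), region))
    return assignments


if __name__ == "__main__":
    rows = [(2, 1000), (5, 90000)]
    cols = [(7, 1400), (1, 500000)]
    print(assign_clones(rows, cols, max_dist=1000))
```

With the sample input, only the row-2 and column-7 reads fall within the distance threshold, so the clone at array position (2, 7) is mapped to the reference interval 1000 to 1400.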
Affiliation(s)
- Miklós Csurös
- Département d'informatique et de recherche opérationnelle, Université de Montréal, CP 6128 succ. Centre-Ville, Montréal, QC H3C 3J7, Canada.
|
38
|
Leveau JHJ, Gerards S, de Boer W, van Veen JA. Phylogeny-function analysis of (meta)genomic libraries: screening for expression of ribosomal RNA genes by large-insert library fluorescent in situ hybridization (LIL-FISH). Environ Microbiol 2004; 6:990-8. [PMID: 15305924 DOI: 10.1111/j.1462-2920.2004.00673.x] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 12/01/2022]
Abstract
We assessed the utility of fluorescent in situ hybridization (FISH) in the screening of clone libraries of (meta)genomic or environmental DNA for the presence and expression of bacterial ribosomal RNA (rRNA) genes. To establish proof-of-principle, we constructed a fosmid-based library in Escherichia coli of large-sized genomic DNA fragments of the mycophagous soil bacterium Collimonas fungivorans, and hybridized 768 library clones with the Collimonas-specific fluorescent probe CTE998-1015. Critical to the success of this approach (which we refer to as large-insert library FISH or LIL-FISH) was the ability to induce fosmid copy number, the exponential growth status of library clones in the FISH assay and the use of a simple pooling strategy to reduce the number of hybridizations. Twelve out of 768 E. coli clones were suspected to harbour and express Collimonas 16S rRNA genes based on their hybridization to CTE998-1015. This was confirmed by the finding that all 12 clones were also identified in an independent polymerase chain reaction-based screening of the same 768 clones using a primer set for the specific detection of Collimonas 16S ribosomal DNA (rDNA). Fosmids isolated from these clones were grouped by restriction analysis into two distinct contigs, confirming that C. fungivorans harbours at least two 16S rRNA genes. For one contig, representing 1-2% of the genome, the nucleotide sequence was determined, providing us with a narrow but informative view of Collimonas genome structure and content.
Affiliation(s)
- Johan H J Leveau
- Netherlands Institute of Ecology (NIOO-KNAW), Centre for Terrestrial Ecology, Boterhoeksestraat 48, 6666 GA Heteren, the Netherlands.
|
39
|
Gardiner J, Schroeder S, Polacco ML, Sanchez-Villeda H, Fang Z, Morgante M, Landewe T, Fengler K, Useche F, Hanafey M, Tingey S, Chou H, Wing R, Soderlund C, Coe EH. Anchoring 9,371 maize expressed sequence tagged unigenes to the bacterial artificial chromosome contig map by two-dimensional overgo hybridization. Plant Physiology 2004; 134:1317-26. [PMID: 15020742 PMCID: PMC419808 DOI: 10.1104/pp.103.034538] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.8] [Indexed: 05/17/2023]
Abstract
Our goal is to construct a robust physical map for maize (Zea mays) comprehensively integrated with the genetic map. We have used a two-dimensional 24 x 24 overgo pooling strategy to anchor maize expressed sequence tagged (EST) unigenes to 165,888 bacterial artificial chromosomes (BACs) on high-density filters. A set of 70,716 public maize ESTs seeded derivation of 10,723 EST unigene assemblies. From these assemblies, 10,642 overgo sequences of 40 bp were applied as hybridization probes. BAC addresses were obtained for 9,371 overgo probes, representing an 88% success rate. More than 96% of the successful overgo probes identified two or more BACs, while 5% identified more than 50 BACs. The majority of BACs identified (79%) were hybridized with one or two overgos. A small number of BACs hybridized with eight or more overgos, suggesting that these BACs must be gene rich. Approximately 5,670 overgos identified BACs assembled within one contig, indicating that these probes are highly locus specific. A total of 1,795 megabases (Mb; 87%) of the total 2,050 Mb in BAC contigs were associated with one or more overgos, which are serving as sequence-tagged sites for single nucleotide polymorphism development. Overgo density ranged from less than one overgo per megabase to greater than 20 overgos per megabase. The majority of contigs (52%) hit by overgos contained three to nine overgos per megabase. Analysis of approximately 1,022 Mb of genetically anchored BAC contigs indicates that 9,003 of the total 13,900 overgo-contig sites are genetically anchored. Our results indicate overgos are a powerful approach for generating gene-specific hybridization probes that are facilitating the assembly of an integrated genetic and physical map for maize.
Affiliation(s)
- Jack Gardiner
- Department of Agronomy, University of Missouri, Columbia, Missouri 65211, USA.
|
40
|
De Bonis A, Gąsieniec L, Vaccaro U. Generalized Framework for Selectors with Applications in Optimal Group Testing. Automata, Languages and Programming 2003. [DOI: 10.1007/3-540-45061-0_8] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Indexed: 01/29/2023]
|
41
|
Abstract
Genome-wide association studies may be necessary to identify genes underlying certain complex diseases. Because such studies can be extremely expensive, DNA pooling has been introduced, as it may greatly reduce the genotyping burden. Parallel to DNA pooling developments, the importance of haplotypes in genetic studies has been amply demonstrated in the literature. However, DNA pooling of a large number of samples may lose haplotype information among tightly linked genetic markers. Here, we examine the cost-effectiveness of DNA pooling in the estimation of haplotype frequencies from population data. When the maximum likelihood estimates of haplotype frequencies are obtained from pooled samples, we compare the overall cost of the study, including both DNA collection and marker genotyping, between the individual genotyping strategy and the DNA pooling strategy. We find that the DNA pooling of two individuals can be more cost-effective than individual genotypings, especially when a large number of haplotype systems are studied.
Affiliation(s)
- Shuang Wang
- Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, Connecticut 06520-8034, USA
|
42
|
Klein PE, Klein RR, Cartinhour SW, Ulanch PE, Dong J, Obert JA, Morishige DT, Schlueter SD, Childs KL, Ale M, Mullet JE. A high-throughput AFLP-based method for constructing integrated genetic and physical maps: progress toward a sorghum genome map. Genome Res 2000; 10:789-807. [PMID: 10854411 PMCID: PMC310885 DOI: 10.1101/gr.10.6.789] [Citation(s) in RCA: 164] [Impact Index Per Article: 6.8] [Indexed: 11/24/2022]
Abstract
Sorghum is an important target for plant genomic mapping because of its adaptation to harsh environments, diverse germplasm collection, and value for comparing the genomes of grass species such as corn and rice. The construction of an integrated genetic and physical map of the sorghum genome (750 Mbp) is a primary goal of our sorghum genome project. To help accomplish this task, we have developed a new high-throughput PCR-based method for building BAC contigs and locating BAC clones on the sorghum genetic map. This task involved pooling 24,576 sorghum BAC clones (approximately 4x genome equivalents) in six different matrices to create 184 pools of BAC DNA. DNA fragments from each pool were amplified using amplified fragment length polymorphism (AFLP) technology, resolved on a LI-COR dual-dye DNA sequencing system, and analyzed using Bionumerics software. On average, each set of AFLP primers amplified 28 single-copy DNA markers that were useful for identifying overlapping BAC clones. Data from 32 different AFLP primer combinations identified approximately 2400 BACs and ordered approximately 700 BAC contigs. Analysis of a sorghum RIL mapping population using the same primer pairs located approximately 200 of the BAC contigs on the sorghum genetic map. Restriction endonuclease fingerprinting of the entire collection of sorghum BAC clones was applied to test and extend the contigs constructed using this PCR-based methodology. Analysis of the fingerprint data allowed for the identification of 3366 contigs each containing an average of 5 BACs. BACs in approximately 65% of the contigs aligned by AFLP analysis had sufficient overlap to be confirmed by DNA fingerprint analysis. In addition, 30% of the overlapping BACs aligned by AFLP analysis provided information for merging contigs and singletons that could not be joined using fingerprint data alone. Thus, the combination of fingerprinting and AFLP-based contig assembly and mapping provides a reliable, high-throughput method for building an integrated genetic and physical map of the sorghum genome.
Affiliation(s)
- P E Klein
- Crop Biotechnology Center, Texas A & M University, College Station, Texas 77843 USA
|
43
|
Amos CI, Frazier ML, Wang W. DNA pooling in mutation detection with reference to sequence analysis. Am J Hum Genet 2000; 66:1689-92. [PMID: 10733464 PMCID: PMC1378002 DOI: 10.1086/302894] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.6] [Received: 04/20/1999] [Accepted: 02/09/2000] [Indexed: 11/03/2022]
Abstract
We discuss pooling methods of mutation detection for identifying rare mutations. We provide mathematical formulae for obtaining the optimal pool size as a function of the mutation frequency in the study population and the specificity of the test. The optimal pool size depends strongly on the specificity of the test. With a test that has 99% specificity, pooling can reduce the number of tests that need to be performed by 80%, whereas, with a test with 95% specificity, pooling reduces the number of samples that must be tested by only 50%. We used the software PHRED to call mutations after sequencing of pooled samples with known STK11 mutations. We found that, when the area under the curve for the less prominent peak was used to call mutations, we were able to pool pairs of samples and correctly identify mutations. Pooling of three samples did not lead to an adequately specific test for the basic automated allele-calling procedures that we used. We discuss methods by which the specificity may be improved to permit pooling of three or more samples when testing for mutations by sequencing.
Affiliation(s)
- C I Amos
- Departments of Epidemiology and Biomathematics, University of Texas M.D. Anderson Cancer Center, Houston, TX 77030, USA.
|
44
|
Han CS, Sutherland RD, Jewett PB, Campbell ML, Meincke LJ, Tesmer JG, Mundt MO, Fawcett JJ, Kim UJ, Deaven LL, Doggett NA. Construction of a BAC contig map of chromosome 16q by two-dimensional overgo hybridization. Genome Res 2000; 10:714-21. [PMID: 10810094 PMCID: PMC310869 DOI: 10.1101/gr.10.5.714] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.8] [Indexed: 11/24/2022]
Abstract
We have used sequence-based markers from an integrated YAC STS-content/somatic cell hybrid breakpoint physical map and radiation hybrid maps of human chromosome 16 to construct a new sequence-ready BAC map of the long arm of this chromosome. The integrated physical map was generated previously in our laboratory and contains 1150 STSs, providing a marker on average every 78 kb on the euchromatic arms of chromosome 16. The other two maps used for this effort were the radiation hybrid maps of chromosome 16 from Whitehead Institute and Stanford University. To create large sequenceable targets of this chromosome, we used a systematic approach to screen high-density BAC filters with probes generated from overlapping oligonucleotides (overgos). We first identified all available sequences in the three maps. These include sequences from genes, ESTs, STSs, and cosmid end sequences. We then used BLAST to identify 36-bp unique fragments of DNA for overgo probes. A total of 906 overgos were selected from the long arm of chromosome 16. Hybridizations occurred in three stages: (1) superpool hybridizations against the 12x coverage human BAC library (RPCI-11); (2) two-dimensional hybridizations against rearrayed positive BACs identified in the superpool hybridizations; and (3) pooled tertiary hybridizations for those overgos that had ambiguous positives remaining after the two-dimensional hybridization. For the superpool hybridizations, up to 236 overgos have been pooled in a single hybridization against the 12x BAC library. A total of 5187 positive BACs from chromosome 16q were identified as a result of five superpool hybridizations. These positive clones were rearrayed on membranes and hybridized with 161 two-dimensional subpools of overgos to determine which BAC clones were positive for individual overgos. An additional 46 tertiary hybridizations were required to resolve ambiguous overgo-BAC relationships. Thus, after a total of 212 hybridizations, we have constructed an initial probe-content BAC map of chromosome 16q consisting of 828 overgo markers and 3363 BACs providing >85% coverage of the long arm of this chromosome. The map has been confirmed by the fingerprinting data and BAC end PCR screening.
Affiliation(s)
- C S Han
- DOE Joint Genome Institute, Bioscience Division and Center for Human Genome Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA.
|
45
|
Millon LV, Skow LC, Honeycutt D, Murray JD, Bowling AT. Synteny and regional marker order assignment of 26 type I and microsatellite markers to the horse X- and Y-chromosomes. Chromosome Res 2000; 8:45-55. [PMID: 10730588 DOI: 10.1023/a:1009275102977] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Indexed: 11/12/2022]
Abstract
The hypothesis that the conservation of sex-chromosome-linked genes among placental mammals could be extended to the horse genome was tested using the UCDavis horse-mouse somatic cell hybrid (SCH) panel. By exploiting the fluorescence in-situ hybridization (FISH) technique to localize an anchor locus, X-inactivation-specific transcript (XIST) on the horse X chromosome, together with the fragmentation and translocation of the X- and Y-chromosome fragments in a somatic cell hybrid panel, we regionally assigned 13 type I and 13 type II (microsatellite) markers to the horse X- and Y-chromosomes. The synteny groups that correspond to horse X- and Y-chromosomes were identified by synteny mapping of sex-specific loci zinc finger protein X-linked (ZFX), zinc finger protein Y-linked (ZFY) and sex-determining region Y (SRY) on the SCH panel. A non-pseudoautosomal gene in the human steroid sulfatase (STS) was identified in both X- and Y-chromosome-containing clones. The regional order of the X-linked type I markers examined in this study, from Xp- to Xq-distal, was [STS-X, the voltage-gated chloride channel 4 (CLCN4)], [ZFX, delta-aminolevulinate synthase 2 (ALAS2)], XIST, coagulation factor IX (F9) and [biglycan (BGN), equine F18, glucose-6-phosphate dehydrogenase (G6PD)] (precise marker order could not be determined for genes within the same brackets). The order of the Y-linked type I markers was STS-Y, SRY and ZFY. These orders are the same arrangements as reported for the human X- and Y-chromosomes, supporting the conservation of genomic organization between the human and the horse sex chromosomes. Regional ordering of X-linked type I and microsatellite markers provides the first integration of type I and type II markers in the horse X chromosome.
|
46
|
Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL. Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project. Genomics 1999; 62:500-7. [PMID: 10644449 DOI: 10.1006/geno.1999.6048] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.3] [Indexed: 11/22/2022]
Abstract
A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the number of laboratory procedures required to sequence the unknown DNA that falls in between contiguous sequences. Multiplex sequencing, a novel procedure in which multiple PCR primers are used in a single sequencing reaction, is used to interpret the multiplex PCR results. Two protocols are presented, one that minimizes pipetting and another that minimizes the number of reactions. The pipette optimized multiplex PCR method has been employed in the final phases of closing the Streptococcus pneumoniae genome sequence, with excellent results.
Affiliation(s)
- H Tettelin
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA.
|
47
|
Bottoli AP, Kertesz-Chaloupková K, Boulianne RP, Granado JD, Aebi M, Kües U. Rapid isolation of genes from an indexed genomic library of C. cinereus in a novel pab1+ cosmid. J Microbiol Methods 1999; 35:129-41. [PMID: 10192045 DOI: 10.1016/s0167-7012(98)00109-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
In this study we present an indexed genomic library of homokaryon AmutBmut constructed within a novel cosmid carrying pab1+ as a selectable Coprinus marker. The average insert size per cosmid is 41 kb. We screened the library and detected copies of known (a1-2, beta-tub, cgl1, ras, trp1) and of new Coprinus genes (cac, lac1, lac2, lac3). Screening was performed either by Southern blot hybridisation or, more efficiently, by non-radioactive PCR amplification. We successfully applied PCR with specific and with degenerate primers, multiplex PCR and colony PCR in library screening. Our results suggest a new, more efficient pooling strategy for future high-throughput screenings, to be used in PCR with pooled cosmid DNAs or, in a less laborious approach, with pooled Escherichia coli colonies.
Affiliation(s)
- A P Bottoli
- Mikrobiologisches Institut, ETH Zürich, Switzerland
|
48
|
|
49
|
|
50
|
Abstract
We consider nonadaptive pooling designs for unique-sequence screening of a 1530-clone map of Aspergillus nidulans. The map has the properties that the clones are, with possibly a few exceptions, ordered and no more than 2 of them cover any point on the genome. We propose two subdesigns of the Steiner system S(3, 5, 65), one with 65 pools and approximately 118 clones per pool, the other with 54 pools and about 142 clones per pool. Each design allows 1 or 2 positive clones to be detected, even in the presence of substantial experimental error rates. More efficient designs are possible if the overlap information in the map is exploited, if there is no constraint on the number of clones in a pool, and if no error tolerance is required. An information theory lower bound requires at least 12 pools to satisfy these minimal criteria, and an "interleaved binary" design can be constructed on 20 pools, with about 380 clones per pool. However, the designs with more pools have important properties of robustness to various possible errors and general applicability to a wider class of pooling experiments.
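The figures quoted in this abstract can be checked with a little arithmetic. The sketch below is only one plausible reading, not the authors' derivation: it assumes the possible outcomes are either a single positive clone or two adjacent positive clones in the ordered map, and that each clone is placed in 5 pools (the block size of S(3, 5, 65), taking clones as blocks and pools as points). The function names are hypothetical.

```python
def min_pools_lower_bound(n_clones: int) -> int:
    """Information-theoretic lower bound on the number of binary pools.

    Assumption (not stated in the abstract): the outcomes to distinguish are
    one positive clone, or two adjacent positive clones in the ordered map.
    """
    outcomes = n_clones + (n_clones - 1)
    # ceil(log2(outcomes)) computed exactly with integer arithmetic
    return (outcomes - 1).bit_length()


def mean_pool_size(n_clones: int, pools_per_clone: int, n_pools: int) -> float:
    """Average clones per pool when every clone sits in `pools_per_clone` pools."""
    return n_clones * pools_per_clone / n_pools


if __name__ == "__main__":
    print(min_pools_lower_bound(1530))          # lower bound on pools
    print(round(mean_pool_size(1530, 5, 65)))   # 65-pool design
    print(round(mean_pool_size(1530, 5, 54)))   # 54-pool design
```

Under these assumptions the results agree with the abstract's 12 pools and its pool sizes of roughly 118 and 142 clones.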
Affiliation(s)
- D J Balding
- Department of Applied Statistics, University of Reading, England
|