1
|
Sobczyk J, Pyne MT, Barker A, Mayer J, Hanson KE, Samore MH, Noriega R. Efficient and effective single-step screening of individual samples for SARS-CoV-2 RNA using multi-dimensional pooling and Bayesian inference. J R Soc Interface 2021; 18:20210155. [PMID: 34129787 PMCID: PMC8205536 DOI: 10.1098/rsif.2021.0155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Rapid and widespread implementation of infectious disease surveillance is a critical component in the response to novel health threats. Molecular assays are the preferred method to detect a broad range of viral pathogens with high sensitivity and specificity. The implementation of molecular assay testing in a rapidly evolving public health emergency, such as the ongoing COVID-19 pandemic, can be hindered by resource availability or technical constraints. We present a screening strategy that is easily scaled up to support a sustained large volume of testing over long periods of time. This non-adaptive pooled-sample screening protocol employs Bayesian inference to yield a reportable outcome for each individual sample in a single testing step (no confirmation of positive results required). The proposed method is validated using clinical specimens tested using a real-time reverse transcription polymerase chain reaction test for SARS-CoV-2. This screening protocol has substantial advantages for its implementation, including higher sample throughput, faster time to results, no need to retrieve previously screened samples from storage to undergo retesting, and excellent performance of the algorithm's sensitivity and specificity compared with the individual test's metrics.
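As a rough illustration of how Bayesian inference can turn pool results into a per-sample call in a single step, the sketch below computes exact posterior probabilities for a small two-dimensional (row and column) pooling grid by enumerating every possible infection pattern. The grid layout, the function name `posterior_positive`, and the prevalence, sensitivity, and specificity values are assumptions made for this example; they are not taken from the paper, whose pooling design and inference procedure may differ.

```python
from itertools import product


def posterior_positive(rows, cols, row_results, col_results,
                       prevalence=0.05, sensitivity=0.95, specificity=0.99):
    """Posterior P(sample is positive | pool results) for a rows x cols grid.

    Each sample sits in one row pool and one column pool. All numeric
    defaults are hypothetical values chosen for illustration only.
    Exhaustive enumeration is only practical for very small grids.
    """
    n = rows * cols
    weights = [0.0] * n   # unnormalised posterior mass per sample
    total = 0.0           # normalising constant
    for state in product((0, 1), repeat=n):
        k = sum(state)
        # Prior: samples are independently positive with the given prevalence.
        p = prevalence ** k * (1 - prevalence) ** (n - k)
        # Likelihood of the observed row-pool results.
        for r in range(rows):
            truth = any(state[r * cols + c] for c in range(cols))
            p_pos = sensitivity if truth else 1 - specificity
            p *= p_pos if row_results[r] else 1 - p_pos
        # Likelihood of the observed column-pool results.
        for c in range(cols):
            truth = any(state[r * cols + c] for r in range(rows))
            p_pos = sensitivity if truth else 1 - specificity
            p *= p_pos if col_results[c] else 1 - p_pos
        total += p
        for i, s in enumerate(state):
            if s:
                weights[i] += p
    return [w / total for w in weights]


# Row 0 and column 1 test positive in a 3 x 3 grid.
post = posterior_positive(3, 3, [1, 0, 0], [0, 1, 0])
most_likely = max(range(len(post)), key=post.__getitem__)
print(most_likely, [round(p, 3) for p in post])
```

With these illustrative settings, the sample at the intersection of the positive row and the positive column (index 1) receives by far the highest posterior, so it can be reported without a second round of testing.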
Affiliation(s)
- Juliana Sobczyk
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Michael T Pyne
- ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA
- Adam Barker
- ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA
- Jeanmarie Mayer
- Division of Epidemiology, University of Utah Health Sciences Center, Salt Lake City, UT, USA; Division of Infectious Diseases, University of Utah Health Sciences Center, Salt Lake City, UT, USA
- Kimberly E Hanson
- ARUP Institute for Clinical and Experimental Pathology®, Salt Lake City, UT, USA; Division of Infectious Disease, University of Utah School of Medicine, Salt Lake City, UT, USA
- Matthew H Samore
- Division of Epidemiology, Department of Internal Medicine, University of Utah School of Medicine, Salt Lake City, UT, USA; Informatics, Decision Enhancement, and Analytic Science (IDEAS) Center of Innovation, Veterans Affairs Salt Lake City Health Care System, Salt Lake City, UT, USA
- Rodrigo Noriega
- Department of Chemistry, University of Utah, Salt Lake City, UT, USA
|
2
|
Seong JT. Group Testing-Based Robust Algorithm for Diagnosis of COVID-19. Diagnostics (Basel) 2020; 10:diagnostics10060396. [PMID: 32545224 PMCID: PMC7345105 DOI: 10.3390/diagnostics10060396] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/28/2020] [Revised: 06/01/2020] [Accepted: 06/08/2020] [Indexed: 11/16/2022]
Abstract
At the time of writing, the COVID-19 infection is spreading rapidly. Currently, there is no vaccine or treatment, and researchers around the world are attempting to fight the infection. In this paper, we consider a diagnosis method for COVID-19, which is characterized by a very rapid rate of infection and is widespread. A possible method for avoiding severe infections is to stop the spread of the infection in advance by the prompt and accurate diagnosis of COVID-19. To this end, we exploit a group testing (GT) scheme, which is used to find a small set of confirmed cases out of a large population. For the accurate detection of false positives and negatives, we propose a robust algorithm (RA) based on the maximum a posteriori probability (MAP). The key idea of the proposed RA is to exploit iterative detection to propagate beliefs to neighbor nodes by exchanging marginal probabilities between input and output nodes. As a result, we show that our proposed RA provides the benefit of being robust against noise in the GT schemes. In addition, we demonstrate the performance of our proposal with a number of tests and successfully find a set of infected samples in both noiseless and noisy GT schemes with different COVID-19 incidence rates.
Affiliation(s)
- Jin-Taek Seong
- Department of Convergence Software, Mokpo National University, Muan 58554, Korea
|
3
|
Asymptotics of pooling design performance. J Appl Probab 2016. [DOI: 10.1017/s0021900200017770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
We analyse the expected performance of various group testing, or pooling, designs. The context is that of identifying characterized clones in a large collection of clones. Here we choose as performance criterion the expected number of unresolved ‘negative’ clones, and we aim to minimize this quantity. Technically, long inclusion–exclusion summations are encountered which, aside from being computationally demanding, give little inkling of the qualitative effect of parametric control on the pooling strategy. We show that readily-interpreted re-summation can be performed, leading to asymptotic forms and systematic corrections. We apply our results to randomized designs, illustrating how they might be implemented for approximating combinatorial formulae.
|
5
|
Mourad R, Dawy Z, Morcos F. Designing pooling systems for noisy high-throughput protein-protein interaction experiments using boolean compressed sensing. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1478-1490. [PMID: 24407306 DOI: 10.1109/tcbb.2013.129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/03/2023]
Abstract
Group testing, also known as pooling, is a common technique used in high-throughput experiments in molecular biology to significantly reduce the number of tests required to identify rare biological interactions while correcting for experimental noise. Central to the group testing problem are 1) a pooling design that lays out how items are grouped together into pools for testing and 2) a decoder that interprets the results of the tested pools, identifying the active compounds. In this work, we take advantage of decoder guarantees from the field of compressed sensing (CS) to address the problem of efficient and reliable detection of biological interaction in noisy high-throughput experiments. We also use efficient combinatorial algorithms from group testing as well as established measurement matrices from CS to create pooling designs. First, we formulate the group testing problem in terms of a Boolean CS framework. We then propose a low-complexity l1-norm decoder to interpret pooling test results and identify active compounds. We demonstrate the robustness of the proposed l1-norm decoder in simulated experiments with false-positive and false-negative error rates typical of high-throughput experiments. When benchmarked against the current state-of-the-art methods, the proposed l1-norm decoder provides superior error correction for the majority of the cases considered while being notably faster computationally. Additionally, we test the performance of the l1-norm decoder against a real experimental data set, where 12,675 prey proteins were screened against 12 bait proteins. Lastly, we study the impact of different sparse pooling design matrices on decoder performance and show that the shifted transversal design (STD) is the most suitable among the pooling designs surveyed for biological applications of CS.
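The two ingredients named in this abstract, a pooling design and a decoder, can be sketched in a few lines. The example below is not the l1-norm decoder proposed by the authors; it swaps in a much simpler noiseless elimination decoder (often called COMP in the group-testing literature) purely to show how a design and a decoder fit together. The function names and the 3 x 3 row/column design are assumptions made for illustration.

```python
def pool_outcomes(design, positives):
    """Noiseless pool results: a pool is positive if it contains any positive item.

    design    -- list of pools, each a set of item ids
    positives -- set of item ids that are truly positive
    """
    return [bool(pool & positives) for pool in design]


def comp_decode(design, outcomes, n_items):
    """Elimination decoder: every item in a negative pool is cleared.

    Items never seen in a negative pool remain candidates. This assumes
    error-free tests; it has no tolerance for false negatives.
    """
    candidates = set(range(n_items))
    for pool, positive in zip(design, outcomes):
        if not positive:
            candidates -= pool
    return candidates


# Hypothetical design: 9 items on a 3 x 3 grid, pooled by row and by column.
rows = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
cols = [{0, 3, 6}, {1, 4, 7}, {2, 5, 8}]
design = rows + cols

single = comp_decode(design, pool_outcomes(design, {4}), 9)
double = comp_decode(design, pool_outcomes(design, {0, 4}), 9)
print(single, double)
```

With one positive the grid design resolves it uniquely. With two positives on different rows and columns, the decoder returns four candidates, which shows why designs with stronger combinatorial guarantees, and decoders that handle noise, are needed in practice.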
|
6
|
Cao CC, Li C, Huang Z, Ma X, Sun X. Identifying rare variants with optimal depth of coverage and cost-effective overlapping pool sequencing. Genet Epidemiol 2013; 37:820-30. [PMID: 24166758 DOI: 10.1002/gepi.21769] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/23/2013] [Revised: 09/09/2013] [Accepted: 09/27/2013] [Indexed: 01/19/2023]
Abstract
Genome-wide association studies have identified hundreds of genetic variants associated with complex diseases, although most variants identified so far explain only a small proportion of heritability, suggesting that rare variants are responsible for missing heritability. Identification of rare variants through large-scale resequencing is becoming increasingly important but remains prohibitively expensive despite the rapid decline in sequencing costs. Group testing-based overlapping pool sequencing, in which pooled rather than individual samples are sequenced, greatly reduces the effort of sample preparation as well as the cost of screening for rare variants. Here, we propose an overlapping pool sequencing approach to screen rare variants with optimal sequencing depth, together with a corresponding cost model. We formulated a model to compute the optimal depth for sufficient observations of variants in pooled sequencing. Using the shifted transversal design algorithm, appropriate parameters for overlapping pool sequencing can be selected to minimize cost and guarantee accuracy. Because of the mixing constraint and the high depth required for pooled sequencing, results showed that it was more cost-effective to divide a large population into smaller blocks that were tested independently using optimized strategies. Finally, we conducted an experiment to screen variant carriers with a frequency of 1%. With simulated pools and publicly available human exome sequencing data, the experiment achieved 99.93% accuracy. Using overlapping pool sequencing, the cost of screening variant carriers with a frequency of 1% in 200 diploid individuals dropped to at least 66% when the target sequencing region was set to 30 Mb.
Affiliation(s)
- Chang-Chang Cao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
|
7
|
Kanamori T, Uehara H, Jimbo M. Pooling Design and Bias Correction in DNA Library Screening. Journal of Statistical Theory and Practice 2012. [DOI: 10.1080/15598608.2012.647585] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
|
8
|
Optimal decoding and minimal length for the non-unique oligonucleotide probe selection problem. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2010.02.026] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/21/2022]
|
9
|
Uehara H, Jimbo M. A positive detecting code and its decoding algorithm for DNA library screening. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2009; 6:652-666. [PMID: 19875863 DOI: 10.1109/tcbb.2007.70266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/28/2023]
Abstract
The study of gene functions requires high-quality DNA libraries. However, a large number of tests and screenings are necessary for compiling such libraries. We describe an algorithm for extracting as much information as possible from pooling experiments for library screening. Collections of clones are called pools, and a pooling experiment is a group test for detecting all positive clones. The probability of positiveness for each clone is estimated according to the outcomes of the pooling experiments. Clones with high chance of positiveness are subjected to confirmatory testing. In this paper, we introduce a new positive clone detecting algorithm, called the Bayesian network pool result decoder (BNPD). The performance of BNPD is compared, by simulation, with that of the Markov chain pool result decoder (MCPD) proposed by Knill et al. in 1996. Moreover, the combinatorial properties of pooling designs suitable for the proposed algorithm are discussed in conjunction with combinatorial designs and d-disjunct matrices. We also show the advantage of utilizing packing designs or BIB designs for the BNPD algorithm.
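The d-disjunct property mentioned in this abstract has a compact definition: a binary pooling matrix is d-disjunct if no clone's column is covered by the union of any d other columns, which guarantees that up to d positive clones can be identified from error-free pool results. The brute-force check below is a minimal sketch of that definition, not code from the paper; the function name and the set-based representation are assumptions, and the search is only practical for small designs.

```python
from itertools import combinations


def is_d_disjunct(columns, d):
    """Check the d-disjunct property by exhaustive search.

    columns -- list of sets; columns[j] holds the pool indices containing clone j
    d       -- number of other columns whose union must never cover a column
    """
    for j, col in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        for group in combinations(others, d):
            # Union of the pools used by the d other clones.
            if col <= set().union(*group):
                return False
    return True


# Three clones, each placed in two of three pools (hypothetical toy design).
toy = [{0, 1}, {0, 2}, {1, 2}]
print(is_d_disjunct(toy, 1), is_d_disjunct(toy, 2))
```

The toy design is 1-disjunct, because no column contains another, but not 2-disjunct, because any column is covered by the union of the other two.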
Affiliation(s)
- Hiroaki Uehara
- Department of Mathematics, Keio University, 3-14-1 Hiyoshi, Kouhoku-ku, Yokohama 223-8522, Japan.
|
10
|
Abstract
MOTIVATION: In high-throughput projects aiming to identify rare positives using a binary assay, smart-pooling is an appealing strategy capable of significantly reducing the number of tests while correcting for experimental noise. To perform simulations for choosing an appropriate set of pools, and later to interpret the experimental results, the pool outcomes must be 'decoded'. The intuitive aim is to identify the positives that gave rise to an observation, whether real or simulated. However, this goal is not well formalized and has been the focus of very few studies. RESULTS: We first provide a clear combinatorial formalization of the 'decoding problem'. We then present interpool, an exact algorithm to solve this problem. An efficient implementation is freely available. Its usefulness is illustrated in the context of yeast two-hybrid interactome mapping with the Shifted Transversal Design. AVAILABILITY: The implementation, licensed under the GNU GPL, can be downloaded from http://www-timc.imag.fr/Nicolas.Thierry-Mieg/.
|
11
|
Tettelin H, Radune D, Kasif S, Khouri H, Salzberg SL. Optimized multiplex PCR: efficiently closing a whole-genome shotgun sequencing project. Genomics 1999; 62:500-7. [PMID: 10644449 DOI: 10.1006/geno.1999.6048] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.3] [Indexed: 11/22/2022]
Abstract
A new method has been developed for rapidly closing a large number of gaps in a whole-genome shotgun sequencing project. The method employs multiplex PCR and a novel pooling strategy to minimize the number of laboratory procedures required to sequence the unknown DNA that falls between contiguous sequences. Multiplex sequencing, a novel procedure in which multiple PCR primers are used in a single sequencing reaction, is used to interpret the multiplex PCR results. Two protocols are presented, one that minimizes pipetting and another that minimizes the number of reactions. The pipette-optimized multiplex PCR method has been employed in the final phases of closing the Streptococcus pneumoniae genome sequence, with excellent results.
Affiliation(s)
- H Tettelin
- The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, Maryland 20850, USA.
|
12
|
Abstract
We consider nonadaptive pooling designs for unique-sequence screening of a 1530-clone map of Aspergillus nidulans. The map has the properties that the clones are, with possibly a few exceptions, ordered and no more than 2 of them cover any point on the genome. We propose two subdesigns of the Steiner system S(3, 5, 65), one with 65 pools and approximately 118 clones per pool, the other with 54 pools and about 142 clones per pool. Each design allows 1 or 2 positive clones to be detected, even in the presence of substantial experimental error rates. More efficient designs are possible if the overlap information in the map is exploited, if there is no constraint on the number of clones in a pool, and if no error tolerance is required. An information theory lower bound requires at least 12 pools to satisfy these minimal criteria, and an "interleaved binary" design can be constructed on 20 pools, with about 380 clones per pool. However, the designs with more pools have important properties of robustness to various possible errors and general applicability to a wider class of pooling experiments.
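The figures quoted in this abstract can be reproduced with simple arithmetic under two assumptions that the abstract does not spell out. First, if the map ordering is exploited, a unique sequence is assumed to hit either a single clone or two adjacent overlapping clones, so each binary pool pattern must distinguish 1530 + 1529 outcomes. Second, each clone is assumed to be placed in exactly 5 pools, matching the block size of the Steiner system S(3, 5, 65). The variable and function names below are illustrative only.

```python
import math

N_CLONES = 1530

# Assumption: outcomes are "one clone" (1530 ways) or "two adjacent clones" (1529 ways).
n_outcomes = N_CLONES + (N_CLONES - 1)

# Each pool yields one bit, so at least ceil(log2(outcomes)) pools are needed.
min_pools = math.ceil(math.log2(n_outcomes))


def clones_per_pool(n_pools, replicates=5):
    """Average pool size if every clone is placed in `replicates` pools (assumed)."""
    return N_CLONES * replicates / n_pools


print(min_pools, round(clones_per_pool(65)), round(clones_per_pool(54)))
```

Under these assumptions the calculation gives a 12-pool lower bound and average pool sizes of about 118 and 142 clones, in line with the numbers stated in the abstract.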
Affiliation(s)
- D J Balding
- Department of Applied Statistics, University of Reading, England
|