1
Song J, Doggett N, Wren M, Burr T, Fenimore PW, Hatcher EL, Bruno WJ, Li PE, Stubben C, Wolinsky M. Development of forensic assay signatures for ebolaviruses. J Forensic Sci 2015; 60:315-25. [PMID: 25677086 DOI: 10.1111/1556-4029.12655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2013] [Revised: 02/12/2014] [Accepted: 02/26/2014] [Indexed: 11/29/2022]
Abstract
Ebolaviruses are a diverse group of RNA viruses comprising five different species, four of which cause fatal hemorrhagic fever in humans. Because of their high infectivity and lethality, ebolaviruses are considered major biothreat agents. Although detection assays exist, no forensic assays are currently available. Here, we report the development of forensic assays that differentiate ebolaviruses. We performed phylogenetic analyses and identified canonical SNPs for all species, major clades and isolates. TaqMan-MGB allelic discrimination assays based on these SNPs were designed, screened against synthetic RNA templates, and validated against ebolavirus genomic RNAs. A total of 45 assays were validated to provide 100% coverage of the species and variants with additional resolution at the isolate level. These assays enabled accurate forensic analysis on 4 "unknown" ebolaviruses. Unknowns were correctly classified to species and variant. A goal of providing resolution below the isolate level was not successful. These high-resolution forensic assays allow rapid and accurate genotyping of ebolaviruses for forensic investigations.
Affiliation(s)
- Jian Song
- Bioenergy and Biome Sciences (B-11), Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, 87545
2
Bruno WJ, Ullah G, Mak DOD, Pearson JE. Automated maximum likelihood separation of signal from baseline in noisy quantal data. Biophys J 2014; 105:68-79. [PMID: 23823225 DOI: 10.1016/j.bpj.2013.02.060] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 08/31/2012] [Revised: 01/03/2013] [Accepted: 02/25/2013] [Indexed: 10/26/2022] Open
Abstract
Data recordings often include high-frequency noise and baseline fluctuations that are not generated by the system under investigation, which need to be removed before analyzing the signal for the system's behavior. In the absence of an automated method, experimentalists fall back on manual procedures for removing these fluctuations, which can be laborious and prone to subjective bias. We introduce a maximum likelihood formalism for separating signal from a drifting baseline plus noise, when the signal takes on integer multiples of some value, as in ion channel patch-clamp current traces. Parameters such as the quantal step size (e.g., current passing through a single channel), noise amplitude, and baseline drift rate can all be optimized automatically using the expectation-maximization algorithm, taking the number of open channels (or molecules in the on-state) at each time point as a hidden variable. Our goal here is to reconstruct the signal, not model the (possibly highly complex) underlying system dynamics. Thus, our likelihood function is independent of those dynamics. This may be thought of as restricting to the simplest possible hidden Markov model for the underlying channel current, in which successive measurements of the state of the channel(s) are independent. The resulting method is comparable to an experienced human in terms of results, but much faster. FORTRAN 90, C, R, and JAVA codes that implement the algorithm are available for download from our website.
Affiliation(s)
- William J Bruno
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, New Mexico, USA
3
Berendzen J, Bruno WJ, Cohn JD, Hengartner NW, Kuske CR, McMahon BH, Wolinsky MA, Xie G. Rapid phylogenetic and functional classification of short genomic fragments with signature peptides. BMC Res Notes 2012; 5:460. [PMID: 22925230 PMCID: PMC3772700 DOI: 10.1186/1756-0500-5-460] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 06/21/2012] [Accepted: 08/08/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers.
RESULTS At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database.
CONCLUSIONS Classification by exact matching against a precomputed list of signature peptides provides comparable results to existing techniques for reads longer than about 300 bp and does not degrade severely with shorter reads. Orders of magnitude faster than existing methods, the approach is suitable now for inclusion in analysis pipelines and appears to be extensible in several different directions.
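The exact-match idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny reference tree, the three signature peptides, and the names `PARENT`, `SIGNATURES`, and `classify` are all hypothetical, and the input is assumed to be an already translated peptide string. The rule used for conflicting hits (fall back to the deepest common ancestor) is one plausible reading of "most specific node consistent with the observed signatures".

```python
from typing import Dict, List, Optional

K = 10  # peptide word length named in the abstract

# Hypothetical reference tree stored as child -> parent (root has no parent).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Proteobacteria": "root",
    "Firmicutes": "root",
    "Gammaproteobacteria": "Proteobacteria",
}

# Hypothetical signature table: peptide 10-mer -> tree node it marks.
SIGNATURES: Dict[str, str] = {
    "MKTAYIAKQR": "Proteobacteria",
    "QISFVKSHFS": "Gammaproteobacteria",
    "GGSLVWEDNA": "Firmicutes",
}


def path_to_root(node: str) -> List[str]:
    """Return the list of nodes from `node` up to and including the root."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def classify(peptide: str) -> Optional[str]:
    """Classify a peptide by exact 10-mer lookups against SIGNATURES.

    Returns None when no signature is found. If all hits lie on one
    lineage, returns the deepest hit; otherwise returns the deepest
    ancestor shared by every hit (an assumed conflict rule).
    """
    hits = set()
    for i in range(len(peptide) - K + 1):
        node = SIGNATURES.get(peptide[i:i + K])  # exact match only
        if node is not None:
            hits.add(node)
    if not hits:
        return None
    deepest = max(hits, key=lambda n: len(path_to_root(n)))
    if hits <= set(path_to_root(deepest)):
        return deepest
    common = set(path_to_root(deepest))
    for node in hits:
        common &= set(path_to_root(node))
    return max(common, key=lambda n: len(path_to_root(n)))
```

Because each lookup is a single hash-table probe per window, the cost grows linearly with read length, which is consistent with the speed advantage over alignment-based search that the abstract reports.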
Affiliation(s)
- Joel Berendzen
- Physics Division, MS D454, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- William J Bruno
- Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Judith D Cohn
- Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Nicolas W Hengartner
- Computer, Computational, and Statistical Sciences Division, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Cheryl R Kuske
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Benjamin H McMahon
- Theoretical Division, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Murray A Wolinsky
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Gary Xie
- Bioscience Division, MS M888, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
4
Bruno WJ. Reported Cellphone Effects on Brain Energetically Consistent with Electrostriction. Biophys J 2012. [DOI: 10.1016/j.bpj.2011.11.3246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
5
Dean D, Bruno WJ, Wan R, Gomes JP, Devignot S, Mehari T, de Vries HJC, Morré SA, Myers G, Read TD, Spratt BG. Predicting phenotype and emerging strains among Chlamydia trachomatis infections. Emerg Infect Dis 2010; 15:1385-94. [PMID: 19788805 PMCID: PMC2819883 DOI: 10.3201/eid1509.090272] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Indexed: 11/19/2022] Open
Abstract
Single nucleotide polymorphisms can be used for epidemiologic and evolutionary studies worldwide. Chlamydia trachomatis is a global cause of blinding trachoma and sexually transmitted infections (STIs). We used comparative genomics of the family Chlamydiaceae to select conserved housekeeping genes for C. trachomatis multilocus sequencing, characterizing 19 reference and 68 clinical isolates from 6 continental/subcontinental regions. There were 44 sequence types (ST). Identical STs for STI isolates were recovered from different regions, whereas STs for trachoma isolates were restricted by continent. Twenty-nine of 52 alleles had nonuniform distributions of frequencies across regions (p<0.001). Phylogenetic analysis showed 3 disease clusters: invasive lymphogranuloma venereum strains, globally prevalent noninvasive STI strains (ompA genotypes D/Da, E, and F), and nonprevalent STI strains with a trachoma subcluster. Recombinant strains were observed among STI clusters. Single nucleotide polymorphisms (SNPs) were predictive of disease specificity. Multilocus and SNP typing can now be used to detect diverse and emerging C. trachomatis strains for epidemiologic and evolutionary studies of trachoma and STI populations worldwide.
Affiliation(s)
- Deborah Dean
- Children's Global Health Initiative, Children's Hospital Oakland Research Institute, Oakland, California 94609, USA.
6
Berry IM, Athreya G, Kothari M, Daniels M, Bruno WJ, Korber B, Kuiken C, Ribeiro RM, Leitner T. The evolutionary rate dynamically tracks changes in HIV-1 epidemics: application of a simple method for optimizing the evolutionary rate in phylogenetic trees with longitudinal data. Epidemics 2009; 1:230-9. [PMID: 21352769 PMCID: PMC3053002 DOI: 10.1016/j.epidem.2009.10.003] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 07/10/2009] [Revised: 10/06/2009] [Accepted: 10/30/2009] [Indexed: 12/24/2022] Open
Abstract
Large-sequence datasets provide an opportunity to investigate the dynamics of pathogen epidemics. Thus, a fast method to estimate the evolutionary rate from large and numerous phylogenetic trees becomes necessary. Based on minimizing tip height variances, we optimize the root in a given phylogenetic tree to estimate the most homogenous evolutionary rate between samples from at least two different time points. Simulations showed that the method had no bias in the estimation of evolutionary rates and that it was robust to tree rooting and topological errors. We show that the evolutionary rates of HIV-1 subtype B and C epidemics have changed over time, with the rate of evolution inversely correlated to the rate of virus spread. For subtype B, the evolutionary rate slowed down and tracked the start of the HAART era in 1996. Subtype C in Ethiopia showed an increase in the evolutionary rate when the prevalence increase markedly slowed down in 1995. Thus, we show that the evolutionary rate of HIV-1 on the population level dynamically tracks epidemic events.
Affiliation(s)
- Irina Maljkovic Berry
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- Center for Nonlinear Studies (CNLS), Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- Department of Virology, Swedish Institute for Infectious Disease Control, SE-171 82 Solna, & Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, SE-171 77 Stockholm, Sweden
- Gayathri Athreya
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- Moulik Kothari
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- Marcus Daniels
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- William J. Bruno
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- Bette Korber
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- Carla Kuiken
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- Ruy M. Ribeiro
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
- Thomas Leitner
- Theoretical Biology & Biophysics, MS K710, Los Alamos National Laboratory, Los Alamos, NM 87545, U.S.A
7
Berry IM, Athreya G, Kothari M, Daniels M, Bruno WJ, Korber B, Kuiken C, Ribeiro RM, Leitner T. WITHDRAWN: The evolutionary rate dynamically tracks changes in HIV-1 epidemics: Application of a simple method for optimizing the evolutionary rate in phylogenetic trees with longitudinal data. Epidemics 2009. [DOI: 10.1016/j.epidem.2009.11.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
8
Hraber P, Kuiken C, Waugh M, Geer S, Bruno WJ, Leitner T. Classification of hepatitis C virus and human immunodeficiency virus-1 sequences with the branching index. J Gen Virol 2008; 89:2098-2107. [PMID: 18753218 DOI: 10.1099/vir.0.83657-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 02/05/2023] Open
Abstract
Classification of viral sequences should be fast, objective, accurate and reproducible. Most methods that classify sequences use either pair-wise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the BI for subtype classification in hepatitis C virus (HCV) and human immunodeficiency virus-1 (HIV-1). Pairs of BI values with known positive- and negative-test results were computed from 10,000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signals that grouped reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1% agreement with reference subtypes, with equal false-positive and false-negative rates. For HIV-1, a threshold of 0.66 yields 93.5% agreement. Higher thresholds can be used where lower false-positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not represent any known subtype uniquely. Web-based services for viral subtype classification with the BI are available online.
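The decision rule described in this abstract can be sketched as a simple threshold test. The sketch assumes the branching index has already been computed elsewhere; the abstract does not give its formula, so none is reproduced here. The function name `classify_by_bi` and its parameters are hypothetical, and only the two threshold values (0.71 for HCV, 0.66 for HIV-1) come from the abstract.

```python
from typing import Optional

# Thresholds reported in the abstract; everything else here is illustrative.
BI_THRESHOLDS = {"HCV": 0.71, "HIV-1": 0.66}


def classify_by_bi(virus: str, bi: float, clade: str,
                   threshold: Optional[float] = None) -> Optional[str]:
    """Return `clade` if the branching index exceeds the cutoff, else None.

    None stands for "unclassifiable": the null hypothesis of no support
    for the subtype relation is not rejected. A caller may pass a higher
    `threshold` when a lower false-positive rate is required.
    """
    cutoff = BI_THRESHOLDS[virus] if threshold is None else threshold
    return clade if bi > cutoff else None
```

Returning `None` rather than forcing the nearest subtype reflects the abstract's point that a classifier should be able to report a query as unclassifiable.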
Affiliation(s)
- Peter Hraber
- Theoretical Biology & Biophysics, T-10 MS K710, LANL, Los Alamos, NM 87545, USA
- Carla Kuiken
- Theoretical Biology & Biophysics, T-10 MS K710, LANL, Los Alamos, NM 87545, USA
- Mark Waugh
- Theoretical Biology & Biophysics, T-10 MS K710, LANL, Los Alamos, NM 87545, USA
- Shaun Geer
- Theoretical Biology & Biophysics, T-10 MS K710, LANL, Los Alamos, NM 87545, USA
- William J Bruno
- Theoretical Biology & Biophysics, T-10 MS K710, LANL, Los Alamos, NM 87545, USA
- Thomas Leitner
- Theoretical Biology & Biophysics, T-10 MS K710, LANL, Los Alamos, NM 87545, USA
9
Abstract
Aggregated Markov processes related by similarity transformation are equivalent in that they cannot be distinguished by steady-state experiments. We derive an explicit formula for the set of all detailed-balance preserving similarity transformations between such continuous time Markov chains with N states. The matrices that define the allowed similarity transformations are found to be a simple non-linear function applied to almost any element of the special orthogonal group in N dimensions. Since a model is identifiable only if there are no similarity transformations to an equivalent model, we expect this result to prove useful in the theory of identification of aggregated Markov chains, an enterprise of growing importance as more and more single molecules yield to observation.
Affiliation(s)
- William J Bruno
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, New Mexico 87544, USA
10
Hraber PT, Fischer W, Bruno WJ, Leitner T, Kuiken C. Comparative analysis of hepatitis C virus phylogenies from coding and non-coding regions: the 5' untranslated region (UTR) fails to classify subtypes. Virol J 2006; 3:103. [PMID: 17169155 PMCID: PMC1764733 DOI: 10.1186/1743-422x-3-103] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 11/06/2006] [Accepted: 12/14/2006] [Indexed: 01/06/2023] Open
Abstract
Background The duration of treatment for HCV infection is partly indicated by the genotype of the virus. For studies of disease transmission, vaccine design, and surveillance for novel variants, subtype-level classification is also needed. This study used the Shimodaira-Hasegawa test and related statistical techniques to compare phylogenetic trees obtained from coding and non-coding regions of a whole-genome alignment for the reliability of subtyping in different regions.
Results Different regions of the HCV genome yield inconsistent phylogenies, which can lead to erroneous conclusions about classification of a given infection. In particular, the highly conserved 5' untranslated region (UTR) yields phylogenetic trees with topologies that differ from the HCV polyprotein and complete genome phylogenies. Phylogenetic trees from the NS5B gene reliably cluster related subtypes, and yield topologies consistent with those of the whole genome and polyprotein.
Conclusion These results extend those from previous studies and indicate that, unlike the NS5B gene, the 5' UTR contains insufficient variation to resolve HCV classifications to the level of viral subtype, and fails to distinguish genotypes reliably. Use of the 5' UTR for clinical tests to characterize HCV infection should be replaced by a subtype-informative test.
Affiliation(s)
- Peter T Hraber
- Theoretical Biology and Biophysics, T-10 MS K710, Los Alamos National Laboratory, Los Alamos NM 87545 USA
- William Fischer
- Theoretical Biology and Biophysics, T-10 MS K710, Los Alamos National Laboratory, Los Alamos NM 87545 USA
- William J Bruno
- Theoretical Biology and Biophysics, T-10 MS K710, Los Alamos National Laboratory, Los Alamos NM 87545 USA
- Thomas Leitner
- Theoretical Biology and Biophysics, T-10 MS K710, Los Alamos National Laboratory, Los Alamos NM 87545 USA
- Carla Kuiken
- Theoretical Biology and Biophysics, T-10 MS K710, Los Alamos National Laboratory, Los Alamos NM 87545 USA
11
Gomes JP, Bruno WJ, Nunes A, Santos N, Florindo C, Borrego MJ, Dean D. Evolution of Chlamydia trachomatis diversity occurs by widespread interstrain recombination involving hotspots. Genome Res 2006; 17:50-60. [PMID: 17090662 PMCID: PMC1716266 DOI: 10.1101/gr.5674706] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.8] [Indexed: 11/25/2022]
Abstract
Chlamydia trachomatis is an obligate intracellular bacterium of major public health significance, infecting over one-tenth of the world's population and causing blindness and infertility in millions. Mounting evidence supports recombination as a key source of genetic diversity among free-living bacteria. Previous research shows that intracellular bacteria such as Chlamydiaceae may also undergo recombination but whether this plays a significant evolutionary role has not been determined. Here, we examine multiple loci dispersed throughout the chromosome to determine the extent and significance of recombination among 19 laboratory reference strains and 10 present-day ocular and urogenital clinical isolates using phylogenetic reconstructions, compatibility matrices, and statistically based recombination programs. Recombination is widespread; all clinical isolates are recombinant at multiple loci with no two belonging to the same clonal lineage. Several reference strains show nonconcordant phylogenies across loci; one strain is unambiguously identified as recombinantly derived from other reference strain lineages. Frequent recombination contrasts with a low level of point substitution; novel substitutions relative to reference strains occur less than one per kilobase. Hotspots for recombination are identified downstream from ompA, which encodes the major outer membrane protein. This widespread recombination, unexpected for an intracellular bacterium, explains why strain-typing using one or two genes, such as ompA, does not correlate with clinical phenotypes. Our results do not point to specific events that are responsible for different pathogenicities but, instead, suggest a new approach to dissect the genetic basis for clinical strain pathology with implications for evolution, host cell adaptation, and emergence of new chlamydial diseases.
Affiliation(s)
- João P. Gomes
- Center for Immunobiology and Vaccine Development, Children’s Hospital Oakland Research Institute, Oakland California 94609, USA
- Centro de Bacteriologia, Instituto Nacional de Saúde, Lisboa 1649-016, Portugal
- William J. Bruno
- T-10 Theoretical Biology and Biophysics, MS-K710 Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Alexandra Nunes
- Centro de Bacteriologia, Instituto Nacional de Saúde, Lisboa 1649-016, Portugal
- Nicole Santos
- Department of Medicine and Biomedical Sciences, University of California at San Francisco School of Medicine, San Francisco, California 94143, USA
- Carlos Florindo
- Centro de Bacteriologia, Instituto Nacional de Saúde, Lisboa 1649-016, Portugal
- Maria J. Borrego
- Centro de Bacteriologia, Instituto Nacional de Saúde, Lisboa 1649-016, Portugal
- Deborah Dean
- Center for Immunobiology and Vaccine Development, Children’s Hospital Oakland Research Institute, Oakland California 94609, USA
- Department of Medicine and Biomedical Sciences, University of California at San Francisco School of Medicine, San Francisco, California 94143, USA
- Corresponding author. Fax: (510) 450-7910
12
Abstract
Reassortment among the RNA segments of Influenza A virus caused the two most recent human influenza pandemics; recently, reassortment has generated viral genotypes associated with outbreaks of avian H5N1 influenza in Asia and Europe. A statistical analysis has been developed for the systematic identification and characterization of reassortant viruses. The analysis was applied to the genes of the replication complex of 152 avian influenza A viruses isolated between 1966 and 2004 from predominantly terrestrial and domestic aquatic avian species. The results indicated that reassortment among these genes was pervasive throughout this period and throughout both the Eurasian and North American lineages of the virus. Evidence is presented that the circulating genotypes of the replication complex are being replaced continually by novel genotypes created by reassortment. No constraints for coordinated reassortment among genes of the replication complex were evident; rather, reassortment almost always proceeded one segment at a time. A maximum-likelihood estimate of the rate of reassortment was derived. For significantly diverged Asian avian influenza A viruses from the period 1991-2004, it was estimated that the median duration between creation of a new genotype and its next segment reassortment was 3 years. Reassortments that introduced previously unobserved influenza genetic material were detected. These findings point to substantial potential for rapid generation of novel avian influenza A viruses, emphasizing the importance of intensive surveillance of these host species in preparation for a possible pandemic.
Affiliation(s)
- Catherine A Macken
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, T-10 MS-K710, Los Alamos, NM 87545, USA
- Richard J Webby
- Department of Infectious Diseases, St Jude Children's Research Hospital, 332 N. Lauderdale, Memphis, TN 38105-2794, USA
- William J Bruno
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, T-10 MS-K710, Los Alamos, NM 87545, USA
13
Affiliation(s)
- Jin Yang
- Theoretical Biology and Biophysics Group, Theoretical Division, Los Alamos National Laboratory, New Mexico, USA
14
Gomes JP, Nunes A, Bruno WJ, Borrego MJ, Florindo C, Dean D. Polymorphisms in the nine polymorphic membrane proteins of Chlamydia trachomatis across all serovars: evidence for serovar Da recombination and correlation with tissue tropism. J Bacteriol 2006; 188:275-86. [PMID: 16352844 PMCID: PMC1317584 DOI: 10.1128/jb.188.1.275-286.2006] [Citation(s) in RCA: 115] [Impact Index Per Article: 6.4] [Indexed: 11/20/2022] Open
Abstract
Chlamydia trachomatis is an intracellular bacterium responsible for ocular, respiratory, and sexually transmitted diseases. The genome contains a nine-member polymorphic membrane protein (Pmp) family unique to members of the order Chlamydiales. Genomic and molecular analyses were performed for the entire pmp gene family for the 18 reference serological variants (serovars) and genovariant Ja to identify specific gene and protein regions that differentiate chlamydial disease groups. The mean genetic distance among all serovars varied from 0.1% for pmpA to 7.0% for pmpF. Lymphogranuloma venereum (LGV) serovars were the most closely related for the pmp genes and were also the most divergent, compared to ocular and non-LGV urogenital disease groups. Phylogenetic reconstructions showed that for six of nine pmp genes (not pmpA, pmpD, or pmpE), the serovars clustered based on tissue tropism. The most globally successful serovars, E and F, clustered distantly from the urogenital group for five pmp genes. These pmp genes may confer a biologic advantage that may facilitate infection and transmission for E and F. Surprisingly, serovar Da clustered with the ocular group from pmpE to pmpI, which are located together in the chromosome, providing statistically significant evidence for intergenomic recombination and acquisition of a genetic composition that could hypothetically expand the host cell range of serovar Da. We also identified distinct domains for pmpE, pmpF, and pmpH where substitutions were concentrated and associated with a specific disease group. Thus, our data suggest a possible structural or functional role that may vary among pmp genes in promoting antigenic polymorphisms and/or diverse adhesions-receptors that may be involved in immune evasion and differential tissue tropism.
Affiliation(s)
- João P Gomes
- Children's Hospital Oakland Research Institute, 5700 Martin Luther King Jr. Way, Oakland, CA 94609, USA
15
Bruno WJ, Yang J, Pearson JE. Using independent open-to-closed transitions to simplify aggregated Markov models of ion channel gating kinetics. Proc Natl Acad Sci U S A 2005; 102:6326-31. [PMID: 15843461 PMCID: PMC1088360 DOI: 10.1073/pnas.0409110102] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Received: 12/08/2004] [Indexed: 12/31/2022] Open
Abstract
Deducing plausible reaction schemes from single-channel current traces is time-consuming and difficult. The goal is to find the simplest scheme that fits the data, but there are many ways to connect even a small number of states (>2 million schemes with four open and four closed states). Many schemes make identical predictions. An exhaustive search over model space does not address the many equivalent schemes that will result. We have found a canonical form that can express all reaction schemes for binary channels. This form has the minimal number of rate constants for any rank (number of independent open-closed transitions), unlike other canonical forms such as the well established "uncoupled" scheme. Because all of the interconductance transitions in the new form are independent, we refer to it as the manifest interconductance rank (MIR) form. In the case of four open and four closed states, there are four MIR form schemes, corresponding to ranks 1-4. For many models proposed in the literature for specific ion channels, the equivalent MIR form has dramatically fewer links than the uncoupled form. By using the MIR form we prove that all rank 1 topologies with a given number of open and closed states make identical predictions in steady state, thus narrowing the search space for simple models. Moreover, we prove that fitting to canonical form preserves detailed balance. We also propose an efficient hierarchical algorithm for searching for the simplest possible model consistent with a given data set.
Affiliation(s)
- William J Bruno
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
16
Dutilh BE, Huynen MA, Bruno WJ, Snel B. The consistent phylogenetic signal in genome trees revealed by reducing the impact of noise. J Mol Evol 2004; 58:527-39. [PMID: 15170256 DOI: 10.1007/s00239-003-2575-6] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.0] [Received: 06/27/2003] [Accepted: 11/12/2003] [Indexed: 11/25/2022]
Abstract
Phylogenetic trees based on gene repertoires are remarkably similar to the current consensus of life history. Yet it has been argued that shared gene content is unreliable for phylogenetic reconstruction because of convergence in gene content due to horizontal gene transfer and parallel gene loss. Here we test this argument, by filtering out as noise those orthologous groups that have an inconsistent phylogenetic distribution, using two independent methods. The resulting phylogenies do indeed contain small but significant improvements. More importantly, we find that the majority of orthologous groups contain some phylogenetic signal and that the resulting phylogeny is the only detectable signal present in the gene distribution across genomes. Horizontal gene transfer or parallel gene loss does not cause systematic biases in the gene content tree.
Affiliation(s)
- Bas E Dutilh
- Center for Molecular and Biomolecular Informatics/Nijmegen Center for Molecular Life Sciences, University of Nijmegen, Nijmegen, The Netherlands.
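A gene-content tree of the kind described in the abstract above starts from pairwise distances between gene repertoires. The following is a minimal sketch of one such distance, assuming each genome is represented as a set of orthologous-group identifiers and that shared content is normalized by the smaller repertoire. The function name, the normalization, and the example identifiers are illustrative assumptions, not details taken from the abstract.

```python
def gene_content_distance(genome_a, genome_b):
    """Distance between two genomes based on shared orthologous groups.

    Each genome is an iterable of orthologous-group identifiers.
    The distance is 1 minus the number of shared groups divided by the
    size of the smaller repertoire (an assumed normalization).
    """
    a, b = set(genome_a), set(genome_b)
    if not a or not b:
        raise ValueError("both genomes must contain at least one group")
    shared = len(a & b)
    return 1.0 - shared / min(len(a), len(b))


# Hypothetical repertoires: two of the three groups in the smaller
# genome are also present in the larger one.
small = {"og1", "og2", "og3"}
large = {"og2", "og3", "og4", "og5"}
distance = gene_content_distance(small, large)
```

A matrix of such distances could then be passed to any distance-based tree-building method. Filtering out orthologous groups with an inconsistent phylogenetic distribution, as the abstract describes, would amount to removing identifiers from every set before computing the distances.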
|
17
|
Martin J, Han C, Gordon LA, Terry A, Prabhakar S, She X, Xie G, Hellsten U, Chan YM, Altherr M, Couronne O, Aerts A, Bajorek E, Black S, Blumer H, Branscomb E, Brown NC, Bruno WJ, Buckingham JM, Callen DF, Campbell CS, Campbell ML, Campbell EW, Caoile C, Challacombe JF, Chasteen LA, Chertkov O, Chi HC, Christensen M, Clark LM, Cohn JD, Denys M, Detter JC, Dickson M, Dimitrijevic-Bussod M, Escobar J, Fawcett JJ, Flowers D, Fotopulos D, Glavina T, Gomez M, Gonzales E, Goodstein D, Goodwin LA, Grady DL, Grigoriev I, Groza M, Hammon N, Hawkins T, Haydu L, Hildebrand CE, Huang W, Israni S, Jett J, Jewett PB, Kadner K, Kimball H, Kobayashi A, Krawczyk MC, Leyba T, Longmire JL, Lopez F, Lou Y, Lowry S, Ludeman T, Manohar CF, Mark GA, McMurray KL, Meincke LJ, Morgan J, Moyzis RK, Mundt MO, Munk AC, Nandkeshwar RD, Pitluck S, Pollard M, Predki P, Parson-Quintana B, Ramirez L, Rash S, Retterer J, Ricke DO, Robinson DL, Rodriguez A, Salamov A, Saunders EH, Scott D, Shough T, Stallings RL, Stalvey M, Sutherland RD, Tapia R, Tesmer JG, Thayer N, Thompson LS, Tice H, Torney DC, Tran-Gyamfi M, Tsai M, Ulanovsky LE, Ustaszewska A, Vo N, White PS, Williams AL, Wills PL, Wu JR, Wu K, Yang J, Dejong P, Bruce D, Doggett NA, Deaven L, Schmutz J, Grimwood J, Richardson P, Rokhsar DS, Eichler EE, Gilna P, Lucas SM, Myers RM, Rubin EM, Pennacchio LA. The sequence and analysis of duplication-rich human chromosome 16. Nature 2004; 432:988-94. [PMID: 15616553 DOI: 10.1038/nature03187] [Citation(s) in RCA: 114] [Impact Index Per Article: 5.7] [Received: 09/09/2004] [Accepted: 11/15/2004] [Indexed: 01/30/2023]
Abstract
Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,670 aligned transcripts, 19 transfer RNA genes, 341 pseudogenes and three RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. Whereas the segmental duplications of chromosome 16 are enriched in the relatively gene-poor pericentromere of the p arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.
Affiliation(s)
- Joel Martin
- DOE Joint Genome Institute, 2800 Mitchell Avenue, Walnut Creek, California 94598, USA
|
18
|
Gomes JP, Bruno WJ, Borrego MJ, Dean D. Recombination in the genome of Chlamydia trachomatis involving the polymorphic membrane protein C gene relative to ompA and evidence for horizontal gene transfer. J Bacteriol 2004; 186:4295-306. [PMID: 15205432 PMCID: PMC421610 DOI: 10.1128/jb.186.13.4295-4306.2004] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.0] [Indexed: 01/24/2023] Open
Abstract
Genome sequencing of Chlamydia trachomatis serovar D has identified polymorphic membrane proteins (Pmp) that are a newly recognized protein family unique to the Chlamydiaceae family. Cumulative data suggest that these diverse proteins are expressed on the cell surface and might be immunologically important. We performed phylogenetic analyses and statistical modeling with 18 reference serovars and 1 genovariant of C. trachomatis to examine the evolutionary characteristics and comparative genetics of PmpC and pmpC, the gene that encodes this protein. We also examined 12 recently isolated ocular and urogenital clinical samples, since reference serovars are laboratory adapted and may not represent strains that are presently responsible for human disease. Phylogenetic reconstructions revealed a clear distinction for disease groups, corresponding to levels of tissue specificity and virulence of the organism. Further, the most prevalent serovars, E, F, and Da, formed a distinct clade. According to the results of comparative genetic analyses, these three genital serovars contained two putative insertion sequence (IS)-like elements with 10- and 15-bp direct repeats, respectively, while all other genital serovars contained one IS-like element. Ocular trachoma serovars also contained both insertions. Previously, no IS-like elements have been identified for Chlamydiaceae. Surprisingly, 7 (58%) of 12 clinical isolates revealed pmpC sequences that were identical to the sequences of other serovars, providing clear evidence for a high rate of whole-gene recombination. Recombination and the differential presence of IS-like elements among distinct disease and prevalence groups may contribute to genome plasticity, which may lead to adaptive changes in tissue tropism and pathogenesis over the course of the organism's evolution.
Affiliation(s)
- João P Gomes
- Department of Bacteriology, National Institute of Health, Lisbon, Portugal
|
19
|
Abstract
MOTIVATION: We review proposed syntheses of probabilistic sequence alignment, profiling and phylogeny. We develop a multiple alignment algorithm for Bayesian inference in the links model proposed by Thorne et al. (1991, J. Mol. Evol., 33, 114-124). The algorithm, described in detail in Section 3, samples from and/or maximizes the posterior distribution over multiple alignments for any number of DNA or protein sequences, conditioned on a phylogenetic tree. The individual sampling and maximization steps of the algorithm require no more computational resources than pairwise alignment. METHODS: We present a software implementation (Handel) of our algorithm and report test results on (i) simulated data sets and (ii) the structurally informed protein alignments of BAliBASE (Thompson et al., 1999, Nucleic Acids Res., 27, 2682-2690). RESULTS: We find that the mean sum-of-pairs score (a measure of residue-pair correspondence) for the BAliBASE alignments is only 13% lower for Handel than for CLUSTALW (Thompson et al., 1994, Nucleic Acids Res., 22, 4673-4680), despite the relative simplicity of the links model (CLUSTALW uses affine gap scores and increased penalties for indels in hydrophobic regions). With reference to these benchmarks, we discuss potential improvements to the links model and implications for Bayesian multiple alignment and phylogenetic profiling. AVAILABILITY: The source code to Handel is freely distributed on the Internet at http://www.biowiki.org/Handel under the terms of the GNU Public License (GPL, 2000, http://www.fsf.org./copyleft/gpl.html).
Affiliation(s)
- I Holmes
- Group T10, Los Alamos National Laboratory, NM 87545, USA.
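The sum-of-pairs score mentioned in the abstract above compares a test alignment with a reference alignment. The sketch below uses one common definition: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names, the gap character, and the toy sequences are assumptions made for illustration; the exact scoring used in the benchmark may differ.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings. A residue is
    identified by (row index, index among that row's non-gap residues),
    so the same residue is recognized across different alignments.
    """
    counters = [0] * len(alignment)  # residues consumed so far per row
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for row, seq in enumerate(alignment):
            if seq[col] != gap:
                present.append((row, counters[row]))
                counters[row] += 1
        # Rows are visited in order, so each pair has a canonical form.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Toy example: the test alignment misplaces one residue of the first row.
reference = ["AC-GT", "ACAGT"]
test = ["ACG-T", "ACAGT"]
score = sum_of_pairs_score(test, reference)  # 3 of 4 reference pairs kept
```

Averaging this score over a set of benchmark alignments gives a mean sum-of-pairs score of the kind the abstract reports.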
|
20
|
Abstract
Rates of molecular evolution vary over time and, hence, among lineages. In contrast, widely used methods for estimating divergence times from molecular sequence data assume constancy of rates. Therefore, methods for estimation of divergence times that incorporate rate variation are attractive. Improvements on a previously proposed Bayesian technique for divergence time estimation are described. New parameterization more effectively captures the phylogenetic structure of rate evolution on a tree. Fossil information and other evidence can now be included in Bayesian analyses in the form of constraints on divergence times. Simulation results demonstrate that the accuracy of divergence time estimation is substantially enhanced when constraints are included.
Affiliation(s)
- H Kishino
- Laboratory of Biometrics, Graduate School of Agriculture and Life Sciences, University of Tokyo, Tokyo, Japan.
|
21
|
Holmes I, Bruno WJ. Finding regulatory elements using joint likelihoods for sequence and expression profile data. Proc Int Conf Intell Syst Mol Biol 2001; 8:202-10. [PMID: 10977081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
A recent, popular method of finding promoter sequences is to look for conserved motifs upstream of genes clustered on the basis of expression data. This method presupposes that the clustering is correct. Theoretically, one should be better able to find promoter sequences and create more relevant gene clusters by taking a unified approach to these two problems. We present a likelihood function for a "sequence-expression" model giving a joint likelihood for a promoter sequence and its corresponding expression levels. An algorithm to estimate sequence-expression model parameters using Gibbs sampling and Expectation/Maximization is described. A program, called kimono, that implements this algorithm has been developed: the source code is freely available on the Internet.
Affiliation(s)
- I Holmes
- Theoretical Biology & Biophysics, Los Alamos National Laboratory, NM 87545, USA.
|
22
|
Abstract
Assessment of the evolutionary process is crucial for understanding the effect of protein structure and function on sequence evolution and for many other analyses in molecular evolution. Here, we used simulations to study how taxon sampling affects accuracy of parameter estimation and topological inference in the absence of branch length asymmetry. With maximum-likelihood analysis, we find that adding taxa dramatically improves both support for the evolutionary model and accurate assessment of its parameters when compared with increasing the sequence length. Using a method we call "doppelgänger trees," we distinguish the contributions of two sources of improved topological inference: greater knowledge about internal nodes and greater knowledge of site-specific rate parameters. Surprisingly, highly significant support for the correct general model does not lead directly to improved topological inference. Instead, substantial improvement occurs only with accurate assessment of the evolutionary process at individual sites. Although these results are based on a simplified model of the evolutionary process, they indicate that in general, assuming processes are not independent and identically distributed among sites, more extensive sampling of taxonomic biodiversity will greatly improve analytical results in many current sequence data sets with moderate sequence lengths.
Affiliation(s)
- D D Pollock
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.
|
23
|
Abstract
We introduce a distance-based phylogeny reconstruction method called "weighted neighbor joining," or "Weighbor" for short. As in neighbor joining, two taxa are joined in each iteration; however, the Weighbor criterion for choosing a pair of taxa to join takes into account that errors in distance estimates are exponentially larger for longer distances. The criterion embodies a likelihood function on the distances, which are modeled as correlated Gaussian random variables with different means and variances, computed under a probabilistic model for sequence evolution. The Weighbor criterion consists of two terms, an additivity term and a positivity term, that quantify the implications of joining the pair. The first term evaluates deviations from additivity of the implied external branches, while the second term evaluates confidence that the implied internal branch has a positive branch length. Compared with maximum-likelihood phylogeny reconstruction, Weighbor is much faster, while building trees that are qualitatively and quantitatively similar. Weighbor appears to be relatively immune to the "long branches attract" and "long branch distracts" drawbacks observed with neighbor joining, BIONJ, and parsimony.
Affiliation(s)
- W J Bruno
- Los Alamos National Laboratory, New Mexico 87545, USA.
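For context, the pair-selection step of plain neighbor joining, which the abstract above says Weighbor replaces with a likelihood-based criterion, can be sketched as follows. This is the standard neighbor-joining Q criterion, not the Weighbor criterion: the additivity and positivity terms are described only qualitatively in the abstract and are not reproduced here. The function name and the example distance matrix are illustrative.

```python
def nj_pick_pair(dist):
    """Pick the pair of taxa that plain neighbor joining would join next.

    `dist` is a symmetric matrix (list of lists) with a zero diagonal.
    The pair minimizing Q(i, j) = (n - 2) * d(i, j) - r_i - r_j is
    returned, where r_i is the sum of distances from taxon i to all taxa.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:  # ties keep the first pair encountered
                best_pair, best_q = (i, j), q
    return best_pair, best_q


# Illustrative five-taxon distance matrix.
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pick_pair(D)  # joins taxa 0 and 1, with Q = -50
```

Note that this criterion treats every distance as equally reliable. The abstract's point is that errors grow with distance, which is what the Weighbor criterion is designed to account for.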
|
24
|
|
25
|
|
26
|
Koshi JM, Bruno WJ. Major structural determinants of transmembrane proteins identified by principal component analysis. Proteins 1999; 34:333-40. [PMID: 10024020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
We identify amino acid characteristics important in determining the secondary structures of transmembrane proteins, and compare them with characteristics important for cytoplasmic proteins. Using information derived from multiple sequence alignments, we perform a principal component analysis (PCA) to identify the directions in the 20-dimensional amino acid frequency space that comprise the most variance within each protein secondary structure. These vectors represent the important position-specific properties of the amino acids for coils, turns, beta sheets, and alpha helices. As expected, the most important axis for most of the datasets was hydrophobicity. Additional axes, distinct from hydrophobicity, are surprising, especially in the case of transmembrane alpha helices, where the effects of aromaticity and beta-branching are the next two most significant characteristics. The axis representing beta-branching also has equal importance in cytoplasmic and transmembrane helices, a finding that contrasts with some experimental results in membrane-like environments. In a further analysis, we examine trends for some of the PCA axes over averaged transmembrane alpha helices, and find interesting results for aromaticity.
Affiliation(s)
- J M Koshi
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, New Mexico 87545, USA.
|
27
|
Abstract
Estimation of evolutionary distances from coding sequences must take into account protein-level selection to avoid relative underestimation of longer evolutionary distances. Current modeling of selection via site-to-site rate heterogeneity generally neglects another aspect of selection, namely position-specific amino acid frequencies. These frequencies determine the maximum dissimilarity expected for highly diverged but functionally and structurally conserved sequences, and hence are crucial for estimating long distances. We introduce a codon-level model of coding sequence evolution in which position-specific amino acid frequencies are free parameters. In our implementation, these are estimated from an alignment using methods described previously. We use simulations to demonstrate the importance and feasibility of modeling such behavior; our model produces linear distance estimates over a wide range of distances, while several alternative models underestimate long distances relative to short distances. Site-to-site differences in rates, as well as synonymous/nonsynonymous and first/second/third-codon-position differences, arise as a natural consequence of the site-to-site differences in amino acid frequencies.
Affiliation(s)
- A L Halpern
- Los Alamos National Laboratory, New Mexico, USA.
|
28
|
Abstract
We present a method for estimating the most general reversible substitution matrix corresponding to a given collection of pairwise aligned DNA sequences. This matrix can then be used to calculate evolutionary distances between pairs of sequences in the collection. If only two sequences are considered, our method is equivalent to that of Lanave et al. (1984). The main novelty of our approach is in combining data from different sequence pairs. We describe a weighting method for pairs of taxa related by a known tree that results in uniform weights for all branches. Our method for estimating the rate matrix results in fast execution times, even on large data sets, and does not require knowledge of the phylogenetic relationships among sequences. In a test case on a primate pseudogene, the matrix we arrived at resembles one obtained using maximum likelihood, and the resulting distance measure is shown to have better linearity than is obtained in a less general model.
Affiliation(s)
- L Arvestad
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, NM 87545, USA
|
29
|
Abstract
A computational method is presented for characterizing residue usage, i.e., site-specific residue frequencies, in aligned protein sequences. The method obtains frequency estimates that maximize the likelihood of the sequences in a simple model for sequence evolution, given a tree or a set of candidate trees computed by other methods. These maximum-likelihood frequencies constitute a profile of the sequences, and thus the method offers a rigorous alternative to sequence weighting for constructing such a profile. The ability of this method to discard misleading phylogenetic effects allows the biochemical propensities of different positions in a sequence to be more clearly observed and interpreted.
Affiliation(s)
- W J Bruno
- Los Alamos National Laboratory, New Mexico 87545, USA.
|
30
|
Xie G, Lobb R, Bruno WJ, Torney DC, Gatewood JM. Single-base sequencing and similarity comparisons. Genomics 1995; 30:445-9. [PMID: 8825629 DOI: 10.1006/geno.1995.1263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract
A "single-base sequence" is a DNA sequence in which the identities and locations of bases of only one type have been determined. We present experimental procedures for single-base sequencing and describe the effective use of existing software (FASTA) in similarity comparisons of single-base sequences. We determined the theoretical and experimental minimum sequence lengths required for identification of a sequence within a large dataset and optimized the FASTA parameters for use in single-base similarity comparisons. Single-base sequences have been used to identify cDNAs occurring in a database. Single-base sequencing could be used to reduce the redundancy of "shot-gun sequencing."
Affiliation(s)
- G Xie
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, New Mexico 87545, USA
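The notion of a single-base sequence defined in the abstract above can be illustrated by reducing a fully known sequence to the positions of one base type. This is a minimal sketch; the function name and the use of "N" as a placeholder for undetermined positions are assumptions, since the abstract does not say how the sequences were encoded for FASTA.

```python
def single_base(seq, base="A", mask="N"):
    """Reduce a DNA sequence to a single-base sequence.

    Positions holding `base` are kept. Every other position is replaced
    by `mask`, reflecting that its identity has not been determined.
    """
    base = base.upper()
    return "".join(base if ch == base else mask for ch in seq.upper())


# Only the adenines of the full sequence remain identified.
example = single_base("GATTACA", "A")  # "NANNANA"
```

Because only one base type is informative, a longer stretch of sequence is needed to identify a match uniquely, which is why the abstract reports minimum sequence lengths for identification.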
|
31
|
Abstract
We describe efficient methods for screening clone libraries, based on pooling schemes that we call "random k-sets designs." In these designs, the pools in which any clone occurs are equally likely to be any possible selection of k from the v pools. The values of k and v can be chosen to optimize desirable properties. Random k-sets designs have substantial advantages over alternative pooling schemes: they are efficient, flexible, and easy to specify, require fewer pools, and have error-correcting and error-detecting capabilities. In addition, screening can often be achieved in only one pass, thus facilitating automation. For design comparison, we assume a binomial distribution for the number of "positive" clones, with parameters n, the number of clones, and c, the coverage. We propose the expected number of resolved positive clones--clones that are definitely positive based upon the pool assays--as a criterion for the efficiency of a pooling design. We determine the value of k that is optimal, with respect to this criterion, as a function of v, n, and c. We also describe superior k-sets designs called k-sets packing designs. As an illustration, we discuss a robotically implemented design for a 2.5-fold-coverage, human chromosome 16 YAC library of n = 1298 clones. We also estimate the probability that each clone is positive, given the pool-assay data and a model for experimental errors.
Affiliation(s)
- W J Bruno
- Center for Human Genome Studies, Life Sciences Division, Los Alamos National Laboratory, New Mexico 87545, USA
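The random k-sets design described in the abstract above, and its notion of a resolved positive clone, can be sketched as follows. The sketch assumes error-free pool assays, which is a simplification: the abstract also treats experimental errors, and those are not modeled here. All function names and the small fixed design in the example are illustrative.

```python
import random


def random_k_sets_design(n_clones, v, k, seed=None):
    """Assign each clone to k pools chosen uniformly at random from v pools."""
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(v), k)) for _ in range(n_clones)]


def pool_results(design, positives):
    """Pools that test positive for the given positive clones (no errors)."""
    hit = set()
    for clone in positives:
        hit |= design[clone]
    return hit


def resolved_positives(design, positive_pools):
    """Return (candidate clones, clones that are definitely positive).

    A candidate is a clone all of whose pools are positive. A candidate
    is resolved if one of its pools contains no other candidate, because
    that pool's positive result can then only be explained by this clone.
    """
    candidates = {c for c, pools in enumerate(design) if pools <= positive_pools}
    resolved = set()
    for c in candidates:
        for p in design[c]:
            if all(p not in design[o] for o in candidates if o != c):
                resolved.add(c)
                break
    return candidates, resolved


# Small fixed design with v = 4 pools and k = 2; clones 0 and 2 are positive.
fixed = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3}), frozenset({0, 2})]
cands, res = resolved_positives(fixed, pool_results(fixed, {0, 2}))
# All four clones remain candidates, but only clone 2 is resolved.
```

In this toy case clone 0 is truly positive yet unresolved, because each of its pools is shared with another candidate. The expected number of resolved positives, averaged over random designs and random positive sets, is the efficiency criterion the abstract proposes for choosing k.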
|
32
|
Abstract
A general N+Q component reaction-diffusion system is analyzed with regard to pattern forming instabilities (Turing bifurcations). The system consists of N mobile species and Q immobile species. The Q immobile species form in response to reactions between the N mobile species and an immobile substrate and allow the Turing instability to occur. These results are valid both for bifurcations from a spatially uniform state and for systems with an externally imposed gradient as in the experimental systems in which Turing patterns have been observed. It is shown that the critical wave number and the location of the instability in parameter space are independent of the substrate concentration. It is also found that the system necessarily undergoes a Hopf bifurcation as the total substrate concentration is decreased. Further, in the case that all the mobile species diffuse at identical rates we show that if the full system is at a point of Turing bifurcation then the N component mobile subsystem is at transition from an unstable focus to an unstable node, and the critical wave number is simply related to the degenerate positive eigenvalue of the mobile subsystem. A sequence of bifurcations that occur in the eigenspectra as the total substrate concentration is decreased to zero is also discussed.
Affiliation(s)
- John E. Pearson
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545
|
33
|
Abstract
We present a theory of enzymatic hydrogen transfer in which hydrogen tunneling is mediated by thermal fluctuations of the enzyme's active site. These fluctuations greatly increase the tunneling rate by shortening the distance the hydrogen must tunnel. The average tunneling distance is shown to decrease when heavier isotopes are substituted for the hydrogen or when the temperature is increased, leading to kinetic isotope effects (KIEs)--defined as the factor by which the reaction slows down when isotopically substituted substrates are used--that need be no larger than KIEs for nontunneling mechanisms. Within this theory we derive a simple KIE expression for vibrationally enhanced ground state tunneling that is able to fit the data for the bovine serum amine oxidase (BSAO) system, correctly predicting the large temperature dependence of the KIEs. Because the KIEs in this theory can resemble those for nontunneling dynamics, distinguishing the two possibilities requires careful measurements over a range of temperatures, as has been done for BSAO.
Affiliation(s)
- W J Bruno
- Department of Physics, University of California, Berkeley 94720
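The definition of the kinetic isotope effect given in the abstract above can be written compactly. The symbols $k_{\mathrm{light}}$ and $k_{\mathrm{heavy}}$, denoting the reaction rates with the lighter and the heavier isotope, are notation introduced here for illustration and are not taken from the paper.

```latex
\mathrm{KIE} = \frac{k_{\mathrm{light}}}{k_{\mathrm{heavy}}}
```

A value greater than 1 means that the reaction is slower with the heavier isotope.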
|
34
|
Abstract
The classic experiment of deVault and Chance touched off a long series of theoretical and experimental studies of the interplay between quantum and classical dynamics in photosynthetic electron transfer. More recently these issues have also been addressed in experiments on ligand binding reactions in heme proteins and through the study of kinetic isotope effects in enzymatic proton transfer. Theoretical effort has focused on a class of relatively simple models which display a surprisingly rich spectrum of dynamical behavior. Much less attention has been paid to a very important issue: Why are we allowed to use such simple models to describe such obviously complex molecules? Here we provide some tentative answers to this question, contrasting the cases of electron and proton transfer. We suggest that ideas based on simple models can inspire novel strategies for 'realistic' simulations, and that we can begin to think about the general problems of enzymatic catalysis in terms of dynamical pictures that previously have been applied only to the simpler case of electron transfer.
Affiliation(s)
- W Bialek
- Department of Physics, University of California at Berkeley, 94720, Berkeley, California, USA
|