26
|
Takala SL, Coulibaly D, Thera MA, Batchelor AH, Cummings MP, Escalante AA, Ouattara A, Traoré K, Niangaly A, Djimdé AA, Doumbo OK, Plowe CV. Extreme polymorphism in a vaccine antigen and risk of clinical malaria: implications for vaccine development. Sci Transl Med 2010; 1:2ra5. [PMID: 20165550 DOI: 10.1126/scitranslmed.3000257] [Citation(s) in RCA: 138] [Impact Index Per Article: 9.9] [Indexed: 11/02/2022]
Abstract
Vaccines directed against the blood stages of Plasmodium falciparum malaria are intended to prevent the parasite from invading and replicating within host cells. No blood-stage malaria vaccine has shown clinical efficacy in humans. Most malaria vaccine antigens are parasite surface proteins that have evolved extensive genetic diversity, and this diversity could allow malaria parasites to escape vaccine-induced immunity. We examined the extent and within-host dynamics of genetic diversity in the blood-stage malaria vaccine antigen apical membrane antigen-1 in a longitudinal study in Mali. Two hundred and fourteen unique apical membrane antigen-1 haplotypes were identified among 506 human infections, and amino acid changes near a putative invasion machinery binding site were strongly associated with the development of clinical symptoms, suggesting that these residues may be important to consider in designing polyvalent apical membrane antigen-1 vaccines and in assessing vaccine efficacy in field trials. This extreme diversity may pose a serious obstacle to an effective polyvalent recombinant subunit apical membrane antigen-1 vaccine.
|
27
|
Regier JC, Zwick A, Cummings MP, Kawahara AY, Cho S, Weller S, Roe A, Baixeras J, Brown JW, Parr C, Davis DR, Epstein M, Hallwachs W, Hausmann A, Janzen DH, Kitching IJ, Solis MA, Yen SH, Bazinet AL, Mitter C. Toward reconstructing the evolution of advanced moths and butterflies (Lepidoptera: Ditrysia): an initial molecular study. BMC Evol Biol 2009; 9:280. [PMID: 19954545 PMCID: PMC2796670 DOI: 10.1186/1471-2148-9-280] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Received: 01/28/2009] [Accepted: 12/02/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND In the mega-diverse insect order Lepidoptera (butterflies and moths; 165,000 described species), deeper relationships are little understood within the clade Ditrysia, to which 98% of the species belong. To begin addressing this problem, we tested the ability of five protein-coding nuclear genes (6.7 kb total), and character subsets therein, to resolve relationships among 123 species representing 27 (of 33) superfamilies and 55 (of 100) families of Ditrysia under maximum likelihood analysis. RESULTS Our trees show broad concordance with previous morphological hypotheses of ditrysian phylogeny, although most relationships among superfamilies are weakly supported. There are also notable surprises, such as a consistently closer relationship of Pyraloidea than of butterflies to most Macrolepidoptera. Monophyly is significantly rejected by one or more character sets for the putative clades Macrolepidoptera as currently defined (P < 0.05) and Macrolepidoptera excluding Noctuoidea and Bombycoidea sensu lato (P ≤ 0.005), and nearly so for the superfamily Drepanoidea as currently defined (P < 0.08). Superfamilies are typically recovered or nearly so, but usually without strong support. Relationships within superfamilies and families, however, are often robustly resolved. We provide some of the first strong molecular evidence on deeper splits within Pyraloidea, Tortricoidea, Geometroidea, Noctuoidea and others. Separate analyses of mostly synonymous versus non-synonymous character sets revealed notable differences (though not strong conflict), including a marked influence of compositional heterogeneity on apparent signal in the third codon position (nt3). As available model partitioning methods cannot correct for this variation, we assessed overall phylogeny resolution through separate examination of trees from each character set.
Exploration of "tree space" with GARLI, using grid computing, showed that hundreds of searches are typically needed to find the best-feasible phylogeny estimate for these data. CONCLUSION Our results (a) corroborate the broad outlines of the current working phylogenetic hypothesis for Ditrysia, (b) demonstrate that some prominent features of that hypothesis, including the position of the butterflies, need revision, and (c) resolve the majority of family and subfamily relationships within superfamilies as thus far sampled. Much further gene and taxon sampling will be needed, however, to strongly resolve individual deeper nodes.
|
28
|
Regier JC, Shultz JW, Ganley ARD, Hussey A, Shi D, Ball B, Zwick A, Stajich JE, Cummings MP, Martin JW, Cunningham CW. Resolving arthropod phylogeny: exploring phylogenetic signal within 41 kb of protein-coding nuclear gene sequence. Syst Biol 2008; 57:920-38. [PMID: 19085333 DOI: 10.1080/10635150802570791] [Citation(s) in RCA: 148] [Impact Index Per Article: 9.9] [Indexed: 10/21/2022]
Abstract
This study attempts to resolve relationships among and within the four basal arthropod lineages (Pancrustacea, Myriapoda, Euchelicerata, Pycnogonida) and to assess the widespread expectation that remaining phylogenetic problems will yield to increasing amounts of sequence data. Sixty-eight regions of 62 protein-coding nuclear genes (approximately 41 kilobases (kb)/taxon) were sequenced for 12 taxonomically diverse arthropod taxa and a tardigrade outgroup. Parsimony, likelihood, and Bayesian analyses of total nucleotide data generally strongly supported the monophyly of each of the basal lineages represented by more than one species. Other relationships within the Arthropoda were also supported, with support levels depending on method of analysis and inclusion/exclusion of synonymous changes. Removing third codon positions, where the assumption of base compositional homogeneity was rejected, altered the results. Removing the final class of synonymous mutations--first codon positions encoding leucine and arginine, which were also compositionally heterogeneous--yielded a data set that was consistent with a hypothesis of base compositional homogeneity. Furthermore, under such a data-exclusion regime, all 68 gene regions individually were consistent with base compositional homogeneity. Restricting likelihood analyses to nonsynonymous change recovered trees with strong support for the basal lineages but not for other groups that were variably supported with more inclusive data sets. In a further effort to increase phylogenetic signal, three types of data exploration were undertaken. (1) Individual genes were ranked by their average rate of nonsynonymous change, and three rate categories were assigned--fast, intermediate, and slow. Then, bootstrap analysis of each gene was performed separately to see which taxonomic groups received strong support. 
Five taxonomic groups were strongly supported independently by two or more genes, and these genes mostly belonged to the slow or intermediate categories, whereas groups supported only by a single gene region tended to be from genes of the fast category, arguing that fast genes provide a less consistent signal. (2) A sensitivity analysis was performed in which increasing numbers of genes were excluded, beginning with the fastest. The number of strongly supported nodes increased up to a point and then decreased slightly. Recovery of Hexapoda required removal of fast genes. Support for Mandibulata (Pancrustacea + Myriapoda) also increased, at times to "strong" levels, with removal of the fastest genes. (3) Concordance selection was evaluated by clustering genes according to their ability to recover Pancrustacea, Euchelicerata, or Myriapoda and analyzing the three clusters separately. All clusters of genes recovered the three concordance clades but were at times inconsistent in the relationships recovered among and within these clades, a result that indicates that the a priori concordance criteria may bias phylogenetic signal in unexpected ways. In a further attempt to increase support of taxonomic relationships, sequence data from 49 additional taxa for three slow genes (i.e., EF-1 alpha, EF-2, and Pol II) were combined with the various 13-taxon data sets. The 62-taxon analyses supported the results of the 13-taxon analyses and provided increased support for additional pancrustacean clades found in an earlier analysis including only EF-1 alpha, EF-2, and Pol II.
|
29
|
Abstract
We introduce a statistic, the genealogical sorting index (gsi), for quantifying the degree of exclusive ancestry of labeled groups on a rooted genealogy and demonstrate its application. The statistic is simple, intuitive, and easily calculated. It has a normalized range to facilitate comparisons among different groups, trees, or studies and it provides information on individual groups rather than a composite measure for all groups. It naturally handles polytomies and accommodates measures of uncertainty in phylogenetic relationships. We use coalescent simulations to explore the behavior of the gsi across a range of divergence times, with the mean value increasing to 1, the maximum value when exclusivity within a group reached monophyly. Simulations also demonstrate that the power to reject the null hypothesis of mixed genealogical ancestry increased markedly as sample size increased, and that the gsi provides a statistically more powerful measure of divergence than FST. Applications to data from published studies demonstrated that the gsi provides a useful way to detect significant exclusivity even when groups are not monophyletic. Although we describe this statistic in the context of divergence, it is more broadly applicable to quantify and assess the significance of clustering of observations in labeled groups on any tree.
|
30
|
Parr CS, Cummings MP. Data sharing in ecology and evolution. Trends Ecol Evol 2005; 20:362-3. [PMID: 16701396 DOI: 10.1016/j.tree.2005.04.023] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Received: 02/25/2005] [Revised: 04/14/2005] [Accepted: 04/22/2005] [Indexed: 10/25/2022]
|
31
|
Grand J, Cummings MP, Rebelo TG, Ricketts TH, Neel MC. Biased data reduce efficiency and effectiveness of conservation reserve networks. Ecol Lett 2007; 10:364-74. [PMID: 17498135 DOI: 10.1111/j.1461-0248.2007.01025.x] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Indexed: 11/30/2022]
Abstract
Complementarity-based reserve selection algorithms efficiently prioritize sites for biodiversity conservation, but they are data-intensive and most regions lack accurate distribution maps for the majority of species. We explored implications of basing conservation planning decisions on incomplete and biased data using occurrence records of the plant family Proteaceae in South Africa. Treating this high-quality database as 'complete', we introduced three realistic sampling biases characteristic of biodiversity databases: a detectability sampling bias and two forms of roads sampling bias. We then compared reserve networks constructed using complete, biased, and randomly sampled data. All forms of biased sampling performed worse than both the complete data set and equal-effort random sampling. Biased sampling failed to detect a median of 1-5% of species, and resulted in reserve networks that were 9-17% larger than those designed with complete data. Spatial congruence and the correlation of irreplaceability scores between reserve networks selected with biased and complete data were low. Thus, reserve networks based on biased data require more area to protect fewer species and identify different locations than those selected with randomly sampled or complete data.
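Illustrative sketch (not taken from the paper): the complementarity principle named in this abstract can be shown with a simple greedy heuristic that repeatedly adds the site contributing the most species not yet represented. The function name `greedy_reserve`, the site labels, and the species sets are hypothetical, and the study itself may well have used a different selection algorithm.

```python
def greedy_reserve(sites):
    """Pick sites until every species occurring in `sites` is represented.

    sites: dict mapping a site label to the set of species recorded there.
    Returns the chosen site labels in the order they were selected.
    """
    remaining = set().union(*sites.values())  # species still unrepresented
    chosen = []
    while remaining:
        # Complementarity: score each site by the NEW species it would add.
        # Iterating over sorted labels makes tie-breaking deterministic.
        best = max(sorted(sites), key=lambda s: len(sites[s] & remaining))
        chosen.append(best)
        remaining -= sites[best]
    return chosen


# Hypothetical occurrence records, for illustration only.
example_sites = {
    "A": {"sp1", "sp2", "sp3"},
    "B": {"sp3", "sp4"},
    "C": {"sp4"},
    "D": {"sp5"},
}
network = greedy_reserve(example_sites)
```

Dropping records from `example_sites` before calling the function is one way to mimic incomplete or biased sampling and to compare the resulting network with the one built from the full data.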
|
32
|
Cummings MP, Meyer A. Magic bullets and golden rules: data sampling in molecular phylogenetics. Zoology 2005; 108:329-36. [PMID: 16351981 DOI: 10.1016/j.zool.2005.09.006] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 07/13/2005] [Revised: 09/22/2005] [Accepted: 09/23/2005] [Indexed: 11/23/2022]
Abstract
Data collection for molecular phylogenetic studies is based on samples of both genes and taxa. In an ideal world, with no limitations to resources, as many genes could be sampled as deemed necessary to address phylogenetic problems. Given limited resources in the real world, inadequate (in terms of choice of genes or number of genes) sequences or restricted taxon sampling can adversely affect the reliability or information gained in phylogenetics. Recent empirical and simulation-based studies of data sampling in molecular phylogenetics have reached differing conclusions on how to deal with these problems. Some advocated sampling more genes, others more taxa. There is certainly no 'magic bullet' that will fit all phylogenetic problems, and no specific 'golden rules' have been deduced, other than that single genes may not always contain sufficient phylogenetic information. However, several general conclusions and suggestions can be made. One suggestion is that the determination of a multiple, but moderate number (e.g., 6-10) of gene sequences might take precedence over sequencing a larger set of genes and thereby permit the sampling of more taxa for a phylogenetic study.
|
33
|
Cummings MP, Segal MR. Few amino acid positions in rpoB are associated with most of the rifampin resistance in Mycobacterium tuberculosis. BMC Bioinformatics 2004; 5:137. [PMID: 15453919 PMCID: PMC524371 DOI: 10.1186/1471-2105-5-137] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 04/15/2004] [Accepted: 09/28/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mutations in rpoB, the gene encoding the beta subunit of DNA-dependent RNA polymerase, are associated with rifampin resistance in Mycobacterium tuberculosis. Several studies have been conducted where minimum inhibitory concentration (MIC, which is defined as the minimum concentration of the antibiotic in a given culture medium below which bacterial growth is not inhibited) of rifampin has been measured and partial DNA sequences have been determined for rpoB in different isolates of M. tuberculosis. However, no model has been constructed to predict rifampin resistance based on sequence information alone. Such a model might provide the basis for quantifying rifampin resistance status based exclusively on DNA sequence data and thus eliminate the requirements for time-consuming culturing and antibiotic testing of clinical isolates. RESULTS Sequence data for amino acid positions 511-533 of rpoB and associated MIC of rifampin for different isolates of M. tuberculosis were taken from studies examining rifampin resistance in clinical samples from New York City and throughout Japan. We used tree-based statistical methods and random forests to generate models of the relationships between rpoB amino acid sequence and rifampin resistance. The proportion of variance explained by a relatively simple tree-based cross-validated regression model involving two amino acid positions (526 and 531) is 0.679. The first partition in the data, based on position 531, results in groups that differ one hundredfold in mean MIC (1.596 micrograms/ml and 159.676 micrograms/ml). The subsequent partition based on position 526, the most variable in this region, results in a > 354-fold difference in MIC. When considered as a classification problem (susceptible or resistant), a cross-validated tree-based model correctly classified most (0.884) of the observations and was very similar to the regression model.
Random forest analysis of the MIC data as a continuous variable, a regression problem, produced a model that explained 0.861 of the variance. The random forest analysis of the MIC data as discrete classes produced a model that correctly classified 0.942 of the observations with sensitivity of 0.958 and specificity of 0.885. CONCLUSIONS Highly accurate regression and classification models of rifampin resistance can be made based on this short sequence region. Models may be better with improved (and consistent) measurements of MIC and more sequence data.
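Illustrative sketch (not from the paper): the performance figures quoted in this abstract are proportions of the kind computed below from confusion-matrix counts. The function name `classification_summary` and the counts are hypothetical, and treating "resistant" as the positive class is an assumption. The last line only checks the "one hundredfold" statement against the two group means given in the abstract.

```python
def classification_summary(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    Assumption: 'positive' means resistant, 'negative' means susceptible.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,      # all correct calls / all isolates
        "sensitivity": tp / (tp + fn),      # resistant isolates called resistant
        "specificity": tn / (tn + fp),      # susceptible isolates called susceptible
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
summary = classification_summary(tp=90, fn=10, tn=40, fp=10)

# Ratio of the two mean MIC values (micrograms/ml) quoted in the abstract.
fold = 159.676 / 1.596
```

With the quoted means, `fold` comes out very close to 100, which matches the abstract's description of the first partition.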
|
34
|
Cummings MP, Myers DS. Simple statistical models predict C-to-U edited sites in plant mitochondrial RNA. BMC Bioinformatics 2004; 5:132. [PMID: 15373947 PMCID: PMC521485 DOI: 10.1186/1471-2105-5-132] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Received: 04/07/2004] [Accepted: 09/16/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND RNA editing is the process whereby an RNA sequence is modified from the sequence of the corresponding DNA template. In the mitochondria of land plants, some cytidines are converted to uridines before translation. Despite substantial study, the molecular biological mechanism by which C-to-U RNA editing proceeds remains relatively obscure, although several experimental studies have implicated a role for cis-recognition. A highly non-random distribution of nucleotides is observed in the immediate vicinity of edited sites (within 20 nucleotides 5' and 3'), but no precise consensus motif has been identified. RESULTS Data for analysis were derived from the complete mitochondrial genomes of Arabidopsis thaliana, Brassica napus, and Oryza sativa; additionally, a combined data set of observations across all three genomes was generated. We selected datasets based on the 20 nucleotides 5' and the 20 nucleotides 3' of edited sites and an equivalently sized and appropriately constructed null-set of non-edited sites. We used tree-based statistical methods and random forests to generate models of C-to-U RNA editing based on the nucleotides surrounding the edited/non-edited sites and on the estimated folding energies of those regions. Tree-based statistical methods based on primary sequence data surrounding edited/non-edited sites and estimates of free energy of folding yield models with optimistic re-substitution-based estimates of ~0.71 accuracy, ~0.64 sensitivity, and ~0.88 specificity. Random forest analysis yielded better models and more exact performance estimates with ~0.74 accuracy, ~0.72 sensitivity, and ~0.81 specificity for the combined observations. CONCLUSIONS Simple models do moderately well in predicting which cytidines will be edited to uridines, and provide the first quantitative predictive models for RNA edited sites in plant mitochondria.
Our analysis shows that the identity of the nucleotide -1 to the edited C and the estimated free energy of folding for a 41 nt region surrounding the edited C are the most important variables that distinguish most edited from non-edited sites. However, the results suggest that primary sequence data and simple free energy of folding calculations alone are insufficient to make highly accurate predictions.
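Illustrative sketch (not from the paper): the 41 nt region described in this abstract is the candidate C plus 20 nucleotides on each side. The helper below, with the hypothetical name `windows_around_c`, shows one way such windows could be extracted from a sequence. How the authors actually built their edited and non-edited data sets is not specified here, so this is only an assumption about the windowing step.

```python
def windows_around_c(seq, flank=20):
    """Return (position, window) pairs for each C with `flank` bases on both sides.

    Positions are 0-based. Cs too close to either end of the sequence are
    skipped because a full-length window cannot be taken around them.
    """
    seq = seq.upper()
    windows = []
    for i, base in enumerate(seq):
        if base == "C" and i >= flank and i + flank < len(seq):
            windows.append((i, seq[i - flank:i + flank + 1]))
    return windows


# Toy sequence: 20 A's, one C, 20 G's (hypothetical, for illustration only).
toy = "A" * 20 + "C" + "G" * 20
hits = windows_around_c(toy)
position, window = hits[0]
minus_one = window[20 - 1]  # nucleotide at position -1 relative to the C
```

Each window could then be split into per-position categorical predictors, for example the nucleotide at -1, for a tree-based model.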
|
35
|
Cummings MP. A book like its cover. Heredity (Edinb) 2004. [DOI: 10.1038/sj.hdy.6800475] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
36
|
Neel MC, Cummings MP. Section-level relationships of North American Agalinis (Orobanchaceae) based on DNA sequence analysis of three chloroplast gene regions. BMC Evol Biol 2004; 4:15. [PMID: 15186507 PMCID: PMC446187 DOI: 10.1186/1471-2148-4-15] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 03/06/2004] [Accepted: 06/08/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND The North American Agalinis are representatives of a taxonomically difficult group that has been subject to extensive taxonomic revision from species level through higher sub-generic designations (e.g., subsections and sections). Previous presentations of relationships have been ambiguous and have not conformed to modern phylogenetic standards (e.g., were not presented as phylogenetic trees). Agalinis contains a large number of putatively rare taxa that have some degree of taxonomic uncertainty. We used DNA sequence data from three chloroplast genes to examine phylogenetic relationships among sections within the genus Agalinis Raf. (=Gerardia), and between Agalinis and closely related genera within Orobanchaceae. RESULTS Maximum likelihood analysis of sequence data from rbcL, ndhF, and matK gene regions (total aligned length 7323 bp) yielded a phylogenetic tree with high bootstrap values for most branches. Likelihood ratio tests showed that all but a few branch lengths were significantly greater than zero, and an additional likelihood ratio test rejected the molecular clock hypothesis. Comparisons of substitution rates between gene regions based on linear models of pairwise distance estimates between taxa show both ndhF and matK evolve more rapidly than rbcL, although there is substantial rate heterogeneity within gene regions due in part to rate differences among codon positions. CONCLUSIONS Phylogenetic analysis supports the monophyly of Agalinis, including species formerly in Tomanthera, and this group is sister to a group formed by the genera Aureolaria, Brachystigma, Dasistoma, and Seymeria. Many of the previously described sections within Agalinis are polyphyletic, although many of the subsections appear to form natural groups. The analysis reveals a single evolutionary event leading to a reduction in chromosome number from n = 14 to n = 13 based on the sister group relationship of section Erectae and section Purpureae subsection Pedunculares.
Our results establish the evolutionary distinctiveness of A. tenella from the more widespread and common A. obtusifolia. However, further data are required to clearly resolve the relationship between A. acuta and A. tenella.
|
37
|
Mark Welch DB, Cummings MP, Hillis DM, Meselson M. Divergent gene copies in the asexual class Bdelloidea (Rotifera) separated before the bdelloid radiation or within bdelloid families. Proc Natl Acad Sci U S A 2004; 101:1622-5. [PMID: 14747660 PMCID: PMC341794 DOI: 10.1073/pnas.2136686100] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022]
Abstract
Rotifers of the asexual class Bdelloidea are unusual in possessing two or more divergent copies of every gene that has been examined. Phylogenetic analysis of the heat-shock gene hsp82 and the TATA-box-binding protein gene tbp in multiple bdelloid species suggested that for each gene, each copy belonged to one of two lineages that began to diverge before the bdelloid radiation. Such gene trees are consistent with the two lineages having descended from former alleles that began to diverge after meiotic segregation ceased or from subgenomes of an alloploid ancestor of the bdelloids. However, the original analyses of bdelloid gene-copy divergence used only a single outgroup species and were based on parsimony and neighbor joining. We have now used maximum likelihood and Bayesian inference methods and, for hsp82, multiple outgroups in an attempt to produce more robust gene trees. Here we report that the available data do not unambiguously discriminate between gene trees that root the origin of hsp82 and tbp copy divergence before the bdelloid radiation and those which indicate that the gene copies began to diverge within bdelloid families. The remarkable presence of multiple diverged gene copies in individual genomes is nevertheless consistent with the loss of sex in an ancient ancestor of bdelloids.
|
38
|
Cummings MP, Handley SA, Myers DS, Reed DL, Rokas A, Winka K. Comparing bootstrap and posterior probability values in the four-taxon case. Syst Biol 2003; 52:477-87. [PMID: 12857639 DOI: 10.1080/10635150390218213] [Citation(s) in RCA: 230] [Impact Index Per Article: 11.0] [Indexed: 10/26/2022]
Abstract
Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the four-taxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the two-branch corner, and (3) deep in the two-branch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations; in particular, there are more posterior probability values in the range 0.85-1 than expected by theory. Therefore, our results corroborate the findings of others that posterior probability values are excessively high. Our results also suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probability values.
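Illustrative sketch (not from the paper): the nonparametric bootstrap mentioned in this abstract resamples alignment columns with replacement to build pseudo-replicate data sets, each of which is then analysed in the same way as the original. The function name `bootstrap_alignment` and the toy alignment are hypothetical, and the tree-inference step (maximum likelihood in the study) is not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping taxon name to an aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that runs are reproducible.
    Columns are sampled with replacement, and the same sampled columns are
    used for every taxon so that site patterns stay intact.
    """
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


# Toy four-taxon alignment (hypothetical data, for illustration only).
toy_alignment = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTACGTAC",
    "taxon3": "ACGAACGTTC",
    "taxon4": "ACGAACTTTC",
}
replicate = bootstrap_alignment(toy_alignment, random.Random(0))
```

Bootstrap support for a grouping is then the fraction of replicates whose inferred tree contains that grouping.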
|
39
|
Cummings MP, Nugent JM, Olmstead RG, Palmer JD. Phylogenetic analysis reveals five independent transfers of the chloroplast gene rbcL to the mitochondrial genome in angiosperms. Curr Genet 2003; 43:131-8. [PMID: 12695853 DOI: 10.1007/s00294-003-0378-3] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.6] [Received: 11/05/2002] [Revised: 01/13/2003] [Accepted: 01/16/2003] [Indexed: 11/29/2022]
Abstract
We used the chloroplast gene rbcL as a model to study the frequency and relative timing of transfer of chloroplast sequences to the mitochondrial genome. Southern blot survey of 20 mitochondrial DNAs confirmed three previously reported groups of plants containing rbcL in their mitochondrion, while PCR studies identified a new mitochondrial rbcL. Published and newly determined mitochondrial and chloroplast rbcL sequences were used to reconstruct rbcL phylogeny. The results imply five or six separate interorganellar transfers of rbcL among the angiosperms examined, and hundreds of successful transfers across all flowering plants. By taxonomic criteria, the crucifer transfer is the most ancient, two separate transfers within the grass family are of intermediate ancestry, and the morning-glory transfer is most recent. All five mitochondrial copies of rbcL examined exhibit insertion and/or deletion events that disrupt the reading frame (three are grossly truncated); and all are elevated in the proportion of nonsynonymous substitutions, providing clear evidence that these sequences are pseudogenes.
|
40
|
|
41
|
García-Varela M, Cummings MP, Pérez-Ponce de León G, Gardner SL, Laclette JP. Phylogenetic analysis based on 18S ribosomal RNA gene sequences supports the existence of class Polyacanthocephala (Acanthocephala). Mol Phylogenet Evol 2002; 23:288-92. [PMID: 12069558 DOI: 10.1016/s1055-7903(02)00020-9] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.4] [Indexed: 10/27/2022]
Abstract
Members of phylum Acanthocephala are parasites of vertebrates and arthropods and are distributed worldwide. The phylum has traditionally been divided into three classes, Archiacanthocephala, Palaeacanthocephala, and Eoacanthocephala; a fourth class, Polyacanthocephala, has been recently proposed. However, erection of this new class, based on morphological characters, has been controversial. We sequenced the near complete 18S rRNA gene of Polyacanthorhynchus caballeroi (Polyacanthocephala) and Rhadinorhynchus sp. (Palaeacanthocephala); these sequences were aligned with another 21 sequences of acanthocephalans representing the three widely recognized classes of the phylum and with 16 sequences from outgroup taxa. Phylogenetic relationships inferred by maximum-likelihood and maximum-parsimony analyses showed Archiacanthocephala as the most basal group within the phylum, whereas classes Polyacanthocephala + Eoacanthocephala formed a monophyletic clade, with Palaeacanthocephala as its sister group. These results are consistent with the view of Polyacanthocephala representing an independent class within Acanthocephala.
|
42
|
Segal MR, Cummings MP, Hubbard AE. Relating amino acid sequence to phenotype: analysis of peptide-binding data. Biometrics 2001; 57:632-42. [PMID: 11414594 DOI: 10.1111/j.0006-341x.2001.00632.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Indexed: 11/26/2022]
Abstract
We illustrate data analytic concerns that arise in the context of relating genotype, as represented by amino acid sequence, to phenotypes (outcomes). The present application examines whether peptides that bind to a particular major histocompatibility complex (MHC) class I molecule have characteristic amino acid sequences. However, the concerns identified and addressed are considerably more general. It is recognized that simple rules for predicting binding based solely on preferences for specific amino acids in certain (anchor) positions of the peptide's amino acid sequence are generally inadequate and that binding is potentially influenced by all sequence positions as well as between-position interactions. The desire to elucidate these more complex prediction rules has spawned various modeling attempts, the shortcomings of which provide motivation for the methods adopted here. Because of (i) this need to model between-position interactions, (ii) amino acids constituting a highly (20) multilevel unordered categorical covariate, and (iii) there frequently being numerous such covariates (i.e., positions) comprising the sequence, standard regression/classification techniques are problematic due to the proliferation of indicator variables required for encoding the sequence position covariates and attendant interactions. These difficulties have led to analyses based on (continuous) properties (e.g., molecular weights) of the amino acids. However, there is potential information loss in such an approach if the properties used are incomplete and/or do not capture the mechanism underlying association with the phenotype. Here we demonstrate that handling unordered categorical covariates with numerous levels and accompanying interactions can be done effectively using classification trees and recently devised bump-hunting methods. 
We further tackle the question of whether observed associations are attributable to amino acid properties as well as addressing the assessment and implications of between-position covariation.
|
43
|
Pollock DD, Eisen JA, Doggett NA, Cummings MP. A case for evolutionary genomics and the comprehensive examination of sequence biodiversity. Mol Biol Evol 2000; 17:1776-88. [PMID: 11110893 DOI: 10.1093/oxfordjournals.molbev.a026278] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
Abstract
Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combination of DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence. 
In order to demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes.
|
44
|
García-Varela M, Pérez-Ponce de León G, de la Torre P, Cummings MP, Sarma SS, Laclette JP. Phylogenetic relationships of Acanthocephala based on analysis of 18S ribosomal RNA gene sequences. J Mol Evol 2000; 50:532-40. [PMID: 10835483 DOI: 10.1007/s002390010056] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.7] [Indexed: 11/28/2022]
Abstract
Acanthocephala (thorny-headed worms) is a phylum of endoparasites of vertebrates and arthropods, included among the most phylogenetically basal triploblastic pseudocoelomates. The phylum is divided into three classes: Archiacanthocephala, Palaeacanthocephala, and Eoacanthocephala. These classes are distinguished by morphological characters such as location of lacunar canals, persistence of ligament sacs in females, number and type of cement glands in males, number and size of proboscis hooks, host taxonomy, and ecology. To understand better the phylogenetic relationships within Acanthocephala, and between Acanthocephala and Rotifera, we sequenced the nearly complete 18S rRNA genes of nine species from the three classes of Acanthocephala and four species of Rotifera from the classes Bdelloidea and Monogononta. Phylogenetic relationships were inferred by maximum-likelihood analyses of these new sequences and others previously determined. The analyses showed that Archiacanthocephala is the sister group to a clade including Eoacanthocephala and Palaeacanthocephala. Archiacanthocephala exhibited a slower rate of evolution at the nucleotide level, as evidenced by shorter branch lengths for the group. We found statistically significant support for the monophyly of Rotifera, represented in our analysis by species from the clade Eurotatoria, which includes the classes Bdelloidea and Monogononta. Eurotatoria also appears as the sister group to Acanthocephala.
|
45
|
Cummings MP, Otto SP, Wakeley J. Genes and other samples of DNA sequence data for phylogenetic inference. Biol Bull 1999; 196:345-350. [PMID: 10447352 DOI: 10.2307/1542967] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 05/23/2023]
|
46
|
Campos A, Cummings MP, Reyes JL, Laclette JP. Phylogenetic relationships of platyhelminthes based on 18S ribosomal gene sequences. Mol Phylogenet Evol 1998; 10:1-10. [PMID: 9751913 DOI: 10.1006/mpev.1997.0483] [Citation(s) in RCA: 48] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
Nucleotide sequences of 18S ribosomal RNA from 71 species of Platyhelminthes, the flatworms, were analyzed using maximum likelihood, and the resulting phylogenetic trees were compared with previous phylogenetic hypotheses. Analyses including 15 outgroup species belonging to eight other phyla show that Platyhelminthes are monophyletic with the exception of a sequence putatively from Acoela sp.; Lecithoepitheliata, Polycladida, Tricladida, Trematoda (Aspidobothrii + Digenea), Monogenea, and Cestoda (Gyrocotylidea + Amphilinidea + Eucestoda) are monophyletic groups. Catenulids form the sister group to the rest of platyhelminths, whereas a complex clade formed by Acoela, Tricladida, "Dalyellioida", and perhaps "Typhloplanoida" is sister to Neodermata. "Typhloplanoida" does not appear to be monophyletic; Fecampiida does not appear to belong within "Dalyellioida," nor Kalyptorhynchia within "Typhloplanoida." Trematoda is the sister group to the rest of Neodermata, and Monogenea is sister group to Cestoda. Within Trematoda, Aspidobothrii is the sister group of Digenea and Heronimidae is the most basal family in Digenea. Our trees support the hypothesis that parasitism evolved at least twice in Platyhelminthes, once in the ancestor to Neodermata and again in the ancestor of Fecampiida, independently to the ancestor of putatively parasitic "Dalyellioida."
|
47
|
Cummings MP, Clegg MT. Nucleotide sequence diversity at the alcohol dehydrogenase 1 locus in wild barley (Hordeum vulgare ssp. spontaneum): an evaluation of the background selection hypothesis. Proc Natl Acad Sci U S A 1998; 95:5637-42. [PMID: 9576936 PMCID: PMC20431 DOI: 10.1073/pnas.95.10.5637] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.2] [Indexed: 02/07/2023] Open
Abstract
The background selection hypothesis predicts a reduction in nucleotide site diversity and an excess of rare variants, owing to linkage associations with deleterious alleles. This effect is expected to be amplified in species that are predominantly self-fertilizing. To examine the predictions of the background selection hypothesis in self-fertilizing species, we sequenced 1,362 bp of adh1, a gene for alcohol dehydrogenase (Adh; alcohol:NAD+ oxidoreductase, EC 1.1.1.1), in a sample of 45 accessions of wild barley, Hordeum vulgare ssp. spontaneum, drawn from throughout the species range. The region sequenced included 786 bp of exon sequence (part of exon 4, all of exons 5-9, and part of exon 10) and 576 bp of intron sequence (all of introns 4-9). There were 19 sites polymorphic for nucleotide substitutions, 8 in introns, and 11 in exons. Of the 11 nucleotide substitutions in codons, 4 were synonymous and 7 were nonsynonymous, occurring uniquely in the sample. There was no evidence of recombination in the region studied, and the estimated effective population size (Ne) based on synonymous sites was approximately 1.8-4.2 × 10^5. Several tests reveal that the pattern of nonsynonymous substitutions departs significantly from neutral expectations. However, the data do not appear to be consistent with recovery from a population bottleneck, recent population expansion, selective sweep, or strong positive selection. Though several features of the data are consistent with background selection, the distributions of polymorphic synonymous and intron sites are not perturbed toward a significant excess of rare alleles as would be predicted by background selection.
|
48
|
King LM, Cummings MP. Satellite DNA repeat sequence variation is low in three species of burying beetles in the genus Nicrophorus (Coleoptera: Silphidae). Mol Biol Evol 1997; 14:1088-95. [PMID: 9364766 DOI: 10.1093/oxfordjournals.molbev.a025718] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023] Open
Abstract
Three satellite DNA families were identified in three species of burying beetles, Nicrophorus orbicollis, N. marginatus, and N. americanus. Southern hybridization and nucleotide sequence analysis of individual randomly cloned repeats shows that these satellite DNA families are highly abundant in the genome, are composed of unique repeats, and are species-specific. The repeats do not have identifiable core elements or substructures that are similar in all three families, and most interspecific sequence similarity is confined to homopolymeric runs of A and T. Satellite DNA from N. marginatus and N. americanus show single-base-pair indels among repeats, but single-nucleotide substitutions characterize most of the repeat variability. Although the repeat units are of similar lengths (342, 350, and 354 bp) and A + T composition (65%, 71%, and 71%, respectively), the average nucleotide divergence among sequenced repeats is very low (0.18%, 1.22%, and 0.71%, respectively). Transition/transversion ratios from the consensus sequence are 0.20, 0.69, and 0.70, respectively.
|
49
|
Abstract
We analyze the evolutionary dynamics of three of the best-studied plant nuclear multigene families. The data analyzed derive from the genes that encode the small subunit of ribulose-1,5-bisphosphate carboxylase (rbcS), the gene family that encodes the enzyme chalcone synthase (Chs), and the gene family that encodes alcohol dehydrogenases (Adh). In addition, we consider the limited evolutionary data available on plant transposable elements. New Chs and rbcS genes appear to be recruited at about 10 times the rate estimated for Adh genes, and this is correlated with a much smaller average gene family size for Adh genes. In addition, duplication and divergence in function appears to be relatively common for Chs genes in flowering plant evolution. Analyses of synonymous nucleotide substitution rates for Adh genes in monocots reject a linear relationship with clock time. Replacement substitution rates vary with time in a complex fashion, which suggests that adaptive evolution has played an important role in driving divergence following gene duplication events. Molecular population genetic studies of Adh and Chs genes reveal high levels of molecular diversity within species. These studies also reveal that inter- and intralocus recombination are important forces in the generation of allelic novelties. Moreover, illegitimate recombination events appear to be an important factor in transposable element loss in plants. When we consider the recruitment and loss of new gene copies, the generation of allelic diversity within plant species, and ectopic exchange among transposable elements, we conclude that recombination is a pervasive force at all levels of plant evolution.
|
50
|
Mutasim DF, Cummings MP. Linear IgA disease with clinical and immunopathological features of epidermolysis bullosa acquisita. Pediatr Dermatol 1997; 14:303-6. [PMID: 9263315 DOI: 10.1111/j.1525-1470.1997.tb00964.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 02/05/2023]
Abstract
A 10-year-old boy had a 3-month history of urticarial plaques and vesicles. Histologic and immunofluorescence testing confirmed the diagnosis of linear IgA disease. Immunoelectron microscopy revealed IgA deposits in the sublamina densa area similar to those seen in epidermolysis bullosa acquisita. Milia developed after resolution of the lesions, similar to lesions of epidermolysis bullosa acquisita.
|