1
Systems genetics uncover new loci containing functional gene candidates in Mycobacterium tuberculosis-infected Diversity Outbred mice. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.21.572738. [PMID: 38187647 PMCID: PMC10769337 DOI: 10.1101/2023.12.21.572738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Mycobacterium tuberculosis, the bacillus that causes tuberculosis (TB), infects 2 billion people across the globe, and results in 8-9 million new TB cases and 1-1.5 million deaths each year. Most patients have no known genetic basis that predisposes them to disease. We investigated the complex genetic basis of pulmonary TB by modelling human genetic diversity with the Diversity Outbred mouse population. When infected with M. tuberculosis, one-third of the mice develop early-onset, rapidly progressive, necrotizing granulomas and succumb within 60 days. The remaining mice develop non-necrotizing granulomas and survive longer than 60 days. Genetic mapping using clinical indicators of disease, granuloma histopathological features, and immune response traits identified five new loci on mouse chromosomes 1, 2, 4, 16 and three previously identified loci on chromosomes 3 and 17. Quantitative trait loci (QTLs) on chromosomes 1, 16, and 17 were associated with multiple correlated traits and had similar patterns of allele effects, suggesting these QTLs contain important genetic regulators of responses to M. tuberculosis. To narrow the list of candidate genes in QTLs, we used a machine learning strategy that integrated gene expression signatures from lungs of M. tuberculosis-infected Diversity Outbred mice with gene interaction networks, generating functional scores. The scores were then used to rank candidates for each mapped trait in each locus, resulting in 11 candidates: Ncf2, Fam20b, S100a8, S100a9, Itgb5, Fstl1, Zbtb20, Ddr1, Ier3, Vegfa, and Zfp318. Importantly, all 11 candidates have roles in infection, inflammation, cell migration, extracellular matrix remodeling, or intracellular signaling. Further, all candidates contain single nucleotide polymorphisms (SNPs), and some but not all SNPs were predicted to have deleterious consequences on protein functions.
Multiple methods were used for validation including (i) a statistical method that showed Diversity Outbred mice carrying PWH/PhJ alleles on chromosome 17 QTL have shorter survival; (ii) quantification of S100A8 protein levels, confirming predicted allele effects; and (iii) infection of C57BL/6 mice deficient for the S100a8 gene. Overall, this work demonstrates that systems genetics using Diversity Outbred mice can identify new (and known) QTLs and new functionally relevant gene candidates that may be major regulators of granuloma necrosis and acute inflammation in pulmonary TB.
2
High-precision genetic mapping of behavioral traits in the diversity outbred mouse population. GENES BRAIN AND BEHAVIOR 2013; 12:424-37. [PMID: 23433259 PMCID: PMC3709837 DOI: 10.1111/gbb.12029] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 04/27/2012] [Revised: 01/14/2013] [Accepted: 02/17/2013] [Indexed: 12/11/2022]
Abstract
Historically, our ability to identify genetic variants underlying complex behavioral traits in mice has been limited by low mapping resolution of conventional mouse crosses. The newly developed Diversity Outbred (DO) population promises to deliver improved resolution that will circumvent costly fine-mapping studies. The DO is derived from the same founder strains as the Collaborative Cross (CC), including three wild-derived strains. Thus the DO provides more allelic diversity and greater potential for discovery compared to crosses involving standard mouse strains. We have characterized 283 male and female DO mice using open-field, light–dark box, tail-suspension and visual-cliff avoidance tests to generate 38 behavioral measures. We identified several quantitative trait loci (QTL) for these traits with support intervals ranging from 1 to 3 Mb in size. These intervals contain relatively few genes (ranging from 5 to 96). For a majority of QTL, using the founder allelic effects together with whole genome sequence data, we could further narrow the positional candidates. Several QTL replicate previously published loci. Novel loci were also identified for anxiety- and activity-related traits. Half of the QTLs are associated with wild-derived alleles, confirming the value to behavioral genetics of added genetic diversity in the DO. In the presence of wild alleles we sometimes observe behaviors that are qualitatively different from the expected response. Our results demonstrate that high-precision mapping of behavioral traits can be achieved with moderate numbers of DO animals, representing a significant advance in our ability to leverage the mouse as a tool for behavioral genetics.
3
Discovery of blood transcriptomic markers for depression in animal models and pilot validation in subjects with early-onset major depression. Transl Psychiatry 2012; 2:e101. [PMID: 22832901 PMCID: PMC3337072 DOI: 10.1038/tp.2012.26] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.2] [Indexed: 02/07/2023] Open
Abstract
Early-onset major depressive disorder (MDD) is a serious and prevalent psychiatric illness in adolescents and young adults. Current treatments are not optimally effective. Biological markers of early-onset MDD could increase diagnostic specificity, but no such biomarker exists. Our innovative approach to biomarker discovery for early-onset MDD combined results from genome-wide transcriptomic profiles in the blood of two animal models of depression, representing the genetic and the environmental, stress-related, etiology of MDD. We carried out unbiased analyses of this combined set of 26 candidate blood transcriptomic markers in a sample of 15-19-year-old subjects with MDD (N=14) and subjects with no disorder (ND, N=14). A panel of 11 blood markers differentiated participants with early-onset MDD from the ND group. Additionally, a separate but partially overlapping panel of 18 transcripts distinguished subjects with MDD with or without comorbid anxiety. Four transcripts, discovered from the chronic stress animal model, correlated with maltreatment scores in youths. These pilot data suggest that our approach can lead to clinically valid diagnostic panels of blood transcripts for early-onset MDD, which could reduce diagnostic heterogeneity in this population and has the potential to advance individualized treatment strategies.
4
Gene expression patterns in the hippocampus and amygdala of endogenous depression and chronic stress models. Mol Psychiatry 2012; 17:49-61. [PMID: 21079605 PMCID: PMC3117129 DOI: 10.1038/mp.2010.119] [Citation(s) in RCA: 145] [Impact Index Per Article: 12.1] [Received: 05/06/2010] [Revised: 10/05/2010] [Accepted: 10/11/2010] [Indexed: 12/24/2022]
Abstract
The etiology of depression is still poorly understood, but two major causative hypotheses have been put forth: the monoamine deficiency and the stress hypotheses of depression. We evaluate these hypotheses using animal models of endogenous depression and chronic stress. The endogenously depressed rat and its control strain were developed by bidirectional selective breeding from the Wistar-Kyoto (WKY) rat, an accepted model of major depressive disorder (MDD). The WKY More Immobile (WMI) substrain shows high immobility/despair-like behavior in the forced swim test (FST), while the control substrain, WKY Less Immobile (WLI), shows no depressive behavior in the FST. Chronic stress responses were investigated by using Brown Norway, Fischer 344, Lewis and WKY, genetically and behaviorally distinct strains of rats. Animals were either not stressed (NS) or exposed to chronic restraint stress (CRS). Genome-wide microarray analyses identified differentially expressed genes in hippocampi and amygdalae of the endogenous depression and the chronic stress models. No significant difference was observed in the expression of monoaminergic transmission-related genes in either model. Furthermore, very few genes showed overlapping changes in the WMI vs WLI and CRS vs NS comparisons, strongly suggesting divergence between endogenous depressive behavior- and chronic stress-related molecular mechanisms. Taken together, these results posit that although chronic stress may induce depressive behavior, its molecular underpinnings differ from those of endogenous depression in animals and possibly in humans, suggesting the need for different treatments. The identification of novel endogenous depression-related and chronic stress response genes suggests that unexplored molecular mechanisms could be targeted for the development of novel therapeutic agents.
5
Identification of quantitative trait loci for locomotor activation and anxiety using closely related inbred strains. GENES BRAIN AND BEHAVIOR 2009; 7:761-9. [PMID: 19130624 DOI: 10.1111/j.1601-183x.2008.00415.x] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 12/01/2022]
Abstract
We carried out a quantitative trait loci (QTL) mapping experiment in two phenotypically similar inbred mouse strains, C57BL/6J and C58/J, using the open-field assay, a well-established model of anxiety-related behavior in rodents. This intercross was initially carried out as a control cross for an ethylnitrosourea mutagenesis mapping study. Surprisingly, although open-field behavior is similar in the two strains, we identified significant QTL in their F2 progeny. Marker regression identified a locus on Chr 8 having associations with multiple open-field measures and a significant interaction between loci on Chr 13 and 17. Together, the Chr 8 locus and the interaction effect form the core set of QTL controlling these behaviors, with additional loci on Chr 1 and 6 present in a subset of the behaviors.
6
Activation of peroxisome proliferator-activated receptor gamma (PPARgamma) by rosiglitazone suppresses components of the insulin-like growth factor regulatory system in vitro and in vivo. Endocrinology 2007; 148:903-11. [PMID: 17122083 PMCID: PMC1851001 DOI: 10.1210/en.2006-1121] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.7] [Indexed: 12/11/2022]
Abstract
Rosiglitazone (Rosi) belongs to the class of thiazolidinediones (TZDs) that are ligands for peroxisome proliferator-activated receptor gamma (PPARgamma). Stimulation of PPARgamma suppresses bone formation and enhances marrow adipogenesis. We hypothesized that activation of PPARgamma down-regulates components of the IGF regulatory system, leading to impaired osteoblast function. A marrow stromal cell line (UAMS-33) transfected with empty vector (U-33/c) or with PPARgamma2 (U-33/gamma2) was treated with Rosi (1 μM) and analyzed by microarray. Rosi reduced IGF-I, IGF-II, IGFBP-4, and the type I and II IGF receptor (IGF1R and IGF2R) expression at 72 h in U-33/gamma2 compared with U-33/c cells (P < 0.01); these findings were confirmed by RT-PCR. Rosi reduced secreted IGF-I from U-33/gamma2 cells by 75% (P < 0.05). Primary marrow stromal cells (MSCs) extracted from adult (8 months) and old (24 months) C57BL/6J (B6) mice were treated with Rosi (1 μM) for 48 h. IGF-I, IGFBP-4, and IGF1R transcripts were reduced in Rosi-treated MSCs compared with vehicle (P < 0.01) and secreted IGF-I was also suppressed (P < 0.05). B6 mice treated with Rosi (20 mg/kg·d) for a short duration (4 d) or long term (7 wk) had reduced serum IGF-I; this was accompanied by markedly suppressed IGF-I transcripts in the liver and peripheral fat of treated animals. To determine whether Rosi affected circulating IGF-I in humans, we measured serum IGF-I, IGFBP-2, and IGFBP-3 at four time points in 50 postmenopausal women randomized to either Rosi (8 mg/d) or placebo. Rosi-treated subjects had significantly lower IGF-I at 8 wk than baseline (-25%, P < 0.05), and at 16 wk their levels were reduced 14% vs. placebo (P = 0.15). We conclude that Rosi suppresses IGF-I expression in bone and liver; these changes could affect skeletal acquisition through endocrine and paracrine pathways.
7
8
A major quantitative trait locus on chromosome 3 controls colitis severity in IL-10-deficient mice. Proc Natl Acad Sci U S A 2001; 98:13820-5. [PMID: 11707574 PMCID: PMC61125 DOI: 10.1073/pnas.241258698] [Citation(s) in RCA: 90] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022] Open
Abstract
Colitic lesions are much more severe in C3H/HeJBir (C3H) than C57BL/6J (B6) mice after 10 backcrosses of a disrupted interleukin-10 (Il10) gene. This study identified cytokine deficiency-induced colitis susceptibility (Cdcs) modifiers by using quantitative trait locus (QTL) analysis. A segregating F(2) population (n = 408) of IL-10-deficient mice was genotyped and necropsied at 6 weeks of age. A major C3H-derived colitogenic QTL (Cdcs1) on chromosome (Chr.) 3 contributed to lesions in both cecum [logarithm of odds ratio (LOD) = 14.6] and colon (LOD = 26.5) as well as colitis-related phenotypes such as spleen/body weight ratio, mesenteric lymph node/body weight ratio, and secretory IgA levels. Evidence for other C3H QTL on Chr. 1 (Cdcs2) and Chr. 2 (Cdcs3) was obtained. Cdcs1 interacted epistatically or contributed additively with loci on other chromosomes. The resistant B6 background also contributed colitogenic QTL: Cdcs4 (Chr. 8), Cdcs5 (Chr. 17, MHC), and Cdcs6 (Chr. 18). Epistatic interactions between B6 QTL on Chr. 8 and 18 contributing to cecum hyperplasia were particularly striking. In conclusion, a colitogenic susceptibility QTL on Chr. 3 has been shown to exacerbate colitis in combination with modifiers contributed from both parental genomes. The complex nature of interactions among loci in this mouse model system, coupled with separate deleterious contributions from both parental strains, illustrates why detection of human inflammatory bowel disease linkages has proven to be so difficult. A human ortholog of the Chr. 3 QTL, if one exists, would map to Chr. 4q or 1p.
9
Abstract
We describe a general statistical framework for the genetic analysis of quantitative trait data in inbred line crosses. Our main result is based on the observation that, by conditioning on the unobserved QTL genotypes, the problem can be split into two statistically independent and manageable parts. The first part involves only the relationship between the QTL and the phenotype. The second part involves only the location of the QTL in the genome. We developed a simple Monte Carlo algorithm to implement Bayesian QTL analysis. This algorithm simulates multiple versions of complete genotype information on a genomewide grid of locations using information in the marker genotype data. Weights are assigned to the simulated genotypes to capture information in the phenotype data. The weighted complete genotypes are used to approximate quantities needed for statistical inference of QTL locations and effect sizes. One advantage of this approach is that only the weights are recomputed as the analyst considers different candidate models. This device allows the analyst to focus on modeling and model comparisons. The proposed framework can accommodate multiple interacting QTL, nonnormal and multivariate phenotypes, covariates, missing genotype data, and genotyping errors in any type of inbred line cross. A software tool implementing this procedure is available. We demonstrate our approach to QTL analysis using data from a mouse backcross population that is segregating multiple interacting QTL associated with salt-induced hypertension.
10
Abstract
Gene expression microarrays are an innovative technology with enormous promise to help geneticists explore and understand the genome. Although the potential of this technology has been clearly demonstrated, many important and interesting statistical questions persist. We relate certain features of microarrays to other kinds of experimental data and argue that classical statistical techniques are appropriate and useful. We advocate greater attention to experimental design issues and a more prominent role for the ideas of statistical inference in microarray studies.
11
Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. Proc Natl Acad Sci U S A 2001; 98:8961-5. [PMID: 11470909 PMCID: PMC55356 DOI: 10.1073/pnas.161273698] [Citation(s) in RCA: 220] [Impact Index Per Article: 9.6] [Received: 01/15/2001] [Accepted: 05/30/2001] [Indexed: 11/18/2022] Open
Abstract
We introduce a general technique for making statistical inference from clustering tools applied to gene expression microarray data. The approach utilizes an analysis of variance model to achieve normalization and estimate differential expression of genes across multiple conditions. Statistical inference is based on the application of a randomization technique, bootstrapping. Bootstrapping has previously been used to obtain confidence intervals for estimates of differential expression for individual genes. Here we apply bootstrapping to assess the stability of results from a cluster analysis. We illustrate the technique with a publicly available data set and draw conclusions about the reliability of clustering results in light of variation in the data. The bootstrapping procedure relies on experimental replication. We discuss the implications of replication and good design in microarray experiments.
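The residual-bootstrap idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the analysis of variance model is replaced here by simple per-gene, per-condition means, and the clustering step is replaced by assignment to the nearest of a few fixed template profiles. The function names, the `templates` argument, and the toy data are all hypothetical.

```python
import random


def nearest_template(profile, templates):
    """Index of the template closest to `profile` (squared Euclidean distance)."""
    def dist(t):
        return sum((p - q) ** 2 for p, q in zip(profile, t))
    return min(range(len(templates)), key=lambda k: dist(templates[k]))


def bootstrap_cluster_stability(data, templates, n_boot=200, seed=0):
    """data[gene][condition] is a list of replicate measurements.

    Returns, for each gene, the fraction of bootstrap data sets in which the
    gene lands in the same cluster (template) as in the original data.
    """
    rng = random.Random(seed)
    # Fitted values (assumed stand-in for the ANOVA fit): replicate means.
    fitted = [[sum(reps) / len(reps) for reps in gene] for gene in data]
    # Pool all residuals so they can be resampled with replacement.
    residuals = [y - fitted[g][c]
                 for g, gene in enumerate(data)
                 for c, reps in enumerate(gene)
                 for y in reps]
    original = [nearest_template(f, templates) for f in fitted]
    agree = [0] * len(data)
    for _ in range(n_boot):
        for g, gene in enumerate(data):
            # Rebuild each replicate as fitted value + resampled residual,
            # then average the replicates again.
            boot_profile = [
                sum(fitted[g][c] + rng.choice(residuals) for _rep in reps) / len(reps)
                for c, reps in enumerate(gene)
            ]
            if nearest_template(boot_profile, templates) == original[g]:
                agree[g] += 1
    return [a / n_boot for a in agree]


# Toy example: three conditions, two replicates each (hypothetical numbers).
templates = [[0, 0, 0], [0, 1, 2], [2, 1, 0]]   # flat, rising, falling
data = [
    [[0.0, 0.1], [1.0, 0.9], [2.0, 2.1]],       # clearly rising gene
    [[0.5, -0.5], [0.6, -0.4], [0.4, -0.6]],    # noisy, roughly flat gene
]
print(bootstrap_cluster_stability(data, templates))
```

A gene whose assignment survives nearly every bootstrap replicate is one whose cluster membership is well supported by the data. As the abstract notes, the procedure depends on replication, because without replicates there are no residuals to resample.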
12
Abstract
Spotted cDNA microarrays are emerging as a powerful and cost-effective tool for large-scale analysis of gene expression. Microarrays can be used to measure the relative quantities of specific mRNAs in two or more tissue samples for thousands of genes simultaneously. While the power of this technology has been recognized, many open questions remain about appropriate analysis of microarray data. One question is how to make valid estimates of the relative expression for genes that are not biased by ancillary sources of variation. Recognizing that there is inherent "noise" in microarray data, how does one estimate the error variation associated with an estimated change in expression, i.e., how does one construct the error bars? We demonstrate that ANOVA methods can be used to normalize microarray data and provide estimates of changes in gene expression that are corrected for potential confounding effects. This approach establishes a framework for the general analysis and interpretation of microarray data.
13
Quantitative trait loci for femoral and lumbar vertebral bone mineral density in C57BL/6J and C3H/HeJ inbred strains of mice. J Bone Miner Res 2001; 16:1195-206. [PMID: 11450694 DOI: 10.1359/jbmr.2001.16.7.1195] [Citation(s) in RCA: 206] [Impact Index Per Article: 9.0] [Indexed: 11/18/2022]
Abstract
Significant differences in vertebral (9%) and femoral (50%) adult bone mineral density (BMD) between the C57BL/6J (B6) and C3H/HeJ (C3H) inbred strains of mice have been subjected to genetic analyses for quantitative trait loci (QTL). Nine hundred eighty-six B6C3F2 females were analyzed to gain insight into the number of genes that regulate peak BMD and their locations. Femurs and lumbar vertebrae were isolated from 4-month-old B6C3F2 females at skeletal maturity and then BMD was determined by peripheral quantitative computed tomography (pQCT). Estimates of BMD heritability were 83% for femurs and 72% for vertebrae. Genomic DNA from F2 progeny was screened for 107 polymerase chain reaction (PCR)-based markers discriminating B6 and C3H alleles on all 19 autosomes. The regression analyses of markers on BMD revealed ten chromosomes (1, 2, 4, 6, 11, 12, 13, 14, 16, and 18) carrying QTLs for femurs and seven chromosomes (1, 4, 7, 9, 11, 14, and 18) carrying QTLs for vertebrae, each with log10 of the odds ratio (LOD) scores of 2.8 or better. The QTLs on chromosomes (Chrs) 2, 6, 12, 13, and 16 were unique to femurs, whereas the QTLs on Chrs 7 and 9 were unique to vertebrae. When the two bone sites had a QTL on the same chromosome, the same marker had the highest, although different, LOD score. A pairwise comparison by analysis of variance (ANOVA) did not reveal significant gene x gene interactions between QTLs for either bone site. BMD variance accounted for by individual QTLs ranged from 1% to 10%. Collectively, the BMD QTLs for femurs accounted for 35.1% and for vertebrae accounted for 23.7% of the F2 population variances in these bones. When mice were homozygous c3/c3 in the QTL region, 8 of the 10 QTLs increased, while the remaining two QTLs on Chrs 6 and 12 decreased, femoral BMD. Similarly, when mice were homozygous c3/c3 in the QTL region for the vertebrae, five of the seven QTLs increased, while two QTLs on Chrs 7 and 9 decreased, BMD.
These findings show the genetic complexity of BMD with multiple genes participating in its regulation. Although 5 of the 12 QTLs are considered to be skeleton-wide loci and commonly affect both femurs and vertebrae, each of the bone sites also exhibited unique QTLs. Thus, the BMD phenotype can be partitioned into its genetic components and the effects of these loci on normal bone biology can be determined. Importantly, the BMD QTLs that we have identified are in regions of the mouse genome that have known human homology, and the QTLs will become useful experimental tools for mechanistic and therapeutic analyses of bone regulatory genes.
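For readers unfamiliar with how a LOD score arises from regressing a trait on a marker, the following minimal sketch may help. It assumes a normal model with one mean per genotype class and uses the standard relation LOD = (n/2) log10(RSS_null / RSS_marker); the function name and the toy genotype and BMD values are hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def lod_score(genotypes, phenotypes):
    """Single-marker LOD score from a one-mean-per-genotype regression.

    Compares a null model (one overall mean) with a marker model (one mean
    per genotype class), assuming normally distributed residuals.
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)

    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    rss_marker = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        rss_marker += sum((y - mean) ** 2 for y in ys)

    if rss_null == 0:
        return 0.0          # no phenotypic variation to explain
    if rss_marker == 0:
        return math.inf     # marker explains the trait perfectly
    return (n / 2) * math.log10(rss_null / rss_marker)


# Hypothetical F2 data: genotype at one marker and a BMD-like trait value.
genotypes = ["BB", "BB", "BC", "BC", "CC", "CC"]
bmd = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(round(lod_score(genotypes, bmd), 2))
```

In a genome scan this calculation is repeated at every marker, and markers whose LOD exceeds a chosen threshold (2.8 in the abstract above) are reported as QTLs.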
14
Abstract
The TallyHo (TH) mouse strain is a newly established model for non-insulin-dependent diabetes mellitus (NIDDM). TH mice show obesity, hyperinsulinemia, hyperlipidemia, and male-limited hyperglycemia. A genetic dissection of the diabetes syndrome has been carried out using male backcross 1 progeny obtained from crosses between (C57BL/6J x TH)F1 and TH mice or (CAST/Ei x TH)F1 and TH mice. A genome-wide scan reveals three quantitative trait loci (QTLs), Tanidd1-3 (TH-associated NIDDM) linked to hyperglycemia. The major QTL (common in both crosses), Tanidd1, maps to chromosome (Chr) 19. Additionally, gene-gene interactions contributing to hyperglycemia have been observed between Tanidd1 and a locus on Chr 18 as well as between Tanidd2 and a locus on Chr 16. The overt hyperglycemia in TH mice is, therefore, likely due to a mutation in a major diabetes susceptibility locus on Chr 19, which interacts with additional genes to lead to an observable phenotype.
15
Abstract
We examine experimental design issues arising with gene expression microarray technology. Microarray experiments have multiple sources of variation, and experimental plans should ensure that effects of interest are not confounded with ancillary effects. A commonly used design is shown to violate this principle and to be generally inefficient. We explore the connection between microarray designs and classical block design and use a family of ANOVA models as a guide to choosing a design. We combine principles of good design and A-optimality to give a general set of recommendations for design with microarrays. These recommendations are illustrated in detail for one kind of experimental objective, where we also give the results of a computer search for good designs.
16
Genome-wide epistatic interaction analysis reveals complex genetic determinants of circadian behavior in mice. Genome Res 2001; 11:959-80. [PMID: 11381025 DOI: 10.1101/gr.171601] [Citation(s) in RCA: 196] [Impact Index Per Article: 8.5] [Indexed: 11/24/2022]
Abstract
Genetic heterogeneity underlies many phenotypic variations observed in circadian rhythmicity. Continuous distributions in measures of circadian behavior observed among multiple inbred strains of mice suggest that the inherent contributions to variability are polygenic in nature. To identify genetic loci that underlie this complex behavior, we have carried out a genome-wide complex trait analysis in 196 (C57BL/6J X BALB/cJ)F(2) hybrid mice. We have characterized variation in this panel of F(2) mice among five circadian phenotypes: free-running circadian period, phase angle of entrainment, amplitude of the circadian rhythm, circadian activity level, and dissociation of rhythmicity. Our genetic analyses of these phenotypes have led to the identification of 14 loci having significant effects on this behavior, including significant main effect loci that contribute to three of these phenotypic measures: period, phase, and amplitude. We describe an additional locus detection method, genome-wide genetic interaction analysis, developed to identify locus pairs that may interact epistatically to significantly affect phenotype. Using this analysis, we identified two additional pairs of loci that have significant effects on dissociation and activity level; we also detected interaction effects in loci contributing to differences of period, phase, and amplitude. Although single gene mutations can affect circadian rhythms, the analysis of interstrain variants demonstrates that significant genetic complexity underlies this behavior. Importantly, most of the loci that we have detected by these methods map to locations that differ from the nine known clock genes, indicating the presence of additional clock-relevant genes in the mammalian circadian system. 
These data demonstrate the analytical value of both genome-wide complex trait and epistatic interaction analyses in further understanding complex phenotypes, and point to promising approaches for genetic analysis of such phenotypes in other mammals, including humans.
17
Abstract
Our purpose in this investigation was to determine if we could reduce cage changing frequency without adversely affecting the health of mice. We housed mice at three different cage changing frequencies: 7, 14, and 21 days, each at three different cage ventilation rates: 30, 60 and 100 air changes per hour (ACH), for a total of nine experimental conditions. For each condition, we evaluated the health of 12 breeding pairs and 12 breeding trios of C57BL/6J mice for 7 months. Health was assessed by breeding performance, weanling weight and growth, plasma corticosterone levels, immune function, and histological examination of selected organs. Over a period of 4 months, we monitored the cage microenvironment for ammonia and carbon dioxide concentrations, relative humidity, and temperature one day prior to changing the cage. The relative humidity, carbon dioxide concentrations, and temperature of the cages at all conditions were within acceptable levels. Ammonia concentrations remained below 25 ppm (parts per million) in most cages, but, even at higher concentrations, did not adversely affect the health of mice. Frequency of cage changing had only one significant effect; pup mortality with pair matings was greater at the cage changing frequency of 7 days compared with 14 or 21 days. In addition, pup mortality with pair matings was higher at 30 ACH compared with other ventilation rates. In conclusion, under the conditions of this study, cage changes once every 14 days and ventilation rates of 60 ACH provide optimum conditions for animal health and practical husbandry.
18
Abstract
To investigate the genetic control of salt-induced hypertension, we performed a quantitative trait locus analysis on male mice from a reciprocal backcross between the salt-sensitive C57BL/6J and the normotensive A/J inbred mouse strains after they were provided with water containing 1% salt for 2 weeks. Genome-wide scans performed on these mice and analyzed with a combination of conventional marker-based regressions and a novel simultaneous search for pairs revealed six significant quantitative trait loci associated with salt-induced blood pressure, two of which were interacting loci. These six loci, named Bpq1-6 for blood pressure quantitative trait loci, mapped to D1Mit334, D1Mit14, D4Mit164, D5Mit31, D6Mit15, and D15Mit13. Furthermore, five of these six loci were concordant with hypertension loci in rats, and four were concordant with hypertension loci in humans, suggesting that quantitative trait loci mapping in model organisms can be used to guide the search for human blood pressure genes.
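The abstract does not describe the details of its simultaneous search for pairs, so the following is only a generic sketch of a two-locus scan under stated assumptions: every pair of markers is fitted with a full model (one mean per joint genotype) and compared with a single overall mean through a LOD score. The marker names, genotype codes, and blood pressure values are hypothetical.

```python
import math
from itertools import combinations


def rss_by_group(keys, phenotypes):
    """Residual sum of squares after fitting one mean per distinct key."""
    groups = {}
    for k, y in zip(keys, phenotypes):
        groups.setdefault(k, []).append(y)
    return sum(
        sum((y - sum(ys) / len(ys)) ** 2 for y in ys)
        for ys in groups.values()
    )


def pair_scan(markers, phenotypes):
    """Scan all marker pairs; return (LOD, marker_a, marker_b), best first.

    `markers` maps a marker name to a list of genotype codes, one per animal.
    Assumes every pair leaves some residual variation (RSS > 0).
    """
    n = len(phenotypes)
    rss_null = rss_by_group([0] * n, phenotypes)   # single overall mean
    results = []
    for a, b in combinations(sorted(markers), 2):
        joint = list(zip(markers[a], markers[b]))  # joint two-locus genotype
        rss_full = rss_by_group(joint, phenotypes)
        lod = (n / 2) * math.log10(rss_null / rss_full)
        results.append((lod, a, b))
    return sorted(results, reverse=True)


# Hypothetical backcross: the trait is elevated only when M1 and M2 are both "H".
markers = {
    "M1": ["A", "A", "A", "A", "H", "H", "H", "H"],
    "M2": ["A", "A", "H", "H", "A", "A", "H", "H"],
    "M3": ["A", "H", "A", "H", "A", "H", "A", "H"],
}
blood_pressure = [100, 101, 100, 101, 100, 101, 120, 121]
for lod, a, b in pair_scan(markers, blood_pressure):
    print(f"{a} x {b}: LOD = {lod:.2f}")
```

In this toy data set neither M1 nor M2 fully explains the trait alone, but the pair does, which is the kind of signal a pairwise search can detect and a one-marker-at-a-time scan can understate. Separating a true interaction from two additive effects would additionally require comparing the full model with an additive two-locus model, which this sketch omits.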
19
Abstract
The rate of evolutionary change associated with a character determines its utility for the reconstruction of phylogenetic history. For a given age of lineage splits, we examine the information content of a character to assess the magnitude and range of an optimal rate of substitution. On the one hand an optimal transition rate must provide sufficiently many character changes to distinguish subclades, whereas on the other hand changes must be sufficiently rare that reversals on a single branch (and hence homoplasy) are uncommon. In this study, we evolve binary characters over three tree topologies with fixed branch lengths, while varying transition rate as a parameter. We use the character state distribution obtained to measure the "information content" of a character given a transition rate. This is done with respect to several criteria: the probability of obtaining the correct tree using parsimony, the probability of inferring the correct ancestral state, and Shannon-Weaver and Fisher information measures on the configuration of probability distributions. All of the information measures suggest the intuitive result of the existence of optimal rates for phylogeny reconstruction. This nonzero optimum is less pronounced if one conditions on there having been a change, in which case the parsimony-based result that minimum change is the most informative tends to hold.
20
Abstract
Serum insulin-like growth factor-1 (IGF-1) and femoral bone mineral density (BMD) differ between two inbred strains of mice, C3H/HeJ (C3H) and C57BL/6J (B6), by approximately 30% and 50%, respectively. Similarly, skeletal IGF-1 content, bone formation, mineral apposition, and marrow stromal cell numbers are higher in C3H than in B6 mice. Because IGF-1 and several bone parameters cosegregate, we hypothesize that the serum IGF-1 phenotype has a strong heritable component and that genetic determinants for serum IGF-1 are involved in the regulation of bone mass. We intercrossed (B6 x C3H)F1 hybrids and analyzed 682 F2 female offspring at 4 months of age for serum IGF-1 by radioimmunoassay and femoral BMD by peripheral quantitative computerized tomography (pQCT). Genomic DNA was assayed by polymerase chain reaction (PCR) to determine alleles for 114 Mit markers inherited in F2 mice at average distances of 14 centimorgans (cM) along each chromosome (Chr). Serum IGF-1 levels in the F2 progeny were relatively normal in distribution, but showed a greater range than either progenitor, indicating that serum IGF-1 level is a polygenic trait with an estimated heritability of 52%. Serum IGF-1 correlated with femoral length (r = 0.266, p < 0.0001) and femoral BMD (r = 0.267, p < 0.0001). Whole genome scans for main effects associated with serum IGF-1 levels revealed three significant QTLs (in order of significance) on mouse Chrs 6, 15, and 10. The QTL on Chr 6 showed a significant reduction in IGF-1 associated with increasing C3H allele number, whereas the Chr 15 and Chr 10 loci showed additive effects with increasing C3H allele number. A genome-wide search for interacting marker pairs identified a significant interaction between the Chr 6 QTL and a locus on Chr 11. 
This interactive effect suggested that when the Chr 11 locus was homozygous for C3H, there was no effect of the Chr 6 locus on serum IGF-1; however, the combination of C3H alleles on Chr 6 with B6 alleles on Chr 11 was associated with reduced serum IGF-1 concentrations. To test this in vivo, we examined congenic mice carrying the Chr 6 QTL region from C3H on a B6 background (B6.C3H-6). Both serum IGF-1 and femoral BMD were significantly lower in female congenic than progenitor B6 mice. In summary, we identified three major QTLs on mouse Chrs 6, 10, and 15, and noted a major locus-locus interaction between Chrs 6 and 11. We named these QTLs IGF-1 serum levels (Igf1sl1 to Igf1sl4). Functional isolation of the Igf1sl1 QTL on Chr 6 for IGF-1 in B6.C3H-6 congenic mice demonstrated effects on both the IGF-1 and BMD phenotypes. The genetic determinants of these Igf1sl QTLs will provide much insight into the regulation of IGF-1 and the subsequent acquisition of peak bone mass.
|
21
|
Abstract
Genetic analyses for loci regulating bone mineral density have been conducted in a cohort of F(2) mice derived from intercross matings of (C57BL/6J x CAST/EiJ)F(1) parents. Femurs were isolated from 714 4-month-old females when peak adult bone density had been achieved. Bone mineral density (BMD) data were obtained by peripheral quantitative computed tomography (pQCT), and genotype data were obtained by polymerase chain reaction (PCR) assays for polymorphic markers carried in genomic DNA of each mouse. Genome-wide scans for co-segregation of genetic marker data with high or low BMD revealed loci on eight different chromosomes, four of which (Chrs 1, 5, 13, and 15) achieved conservative statistical criteria for suggestive, significant, or highly significant linkage with BMD. These four quantitative trait loci (QTLs) were confirmed by a linear regression model developed to describe the main effects; none of the loci exhibited significant interaction effects by ANOVA. The four QTLs have been named Bmd1 (Chr 1), Bmd2 (Chr 5), Bmd3 (Chr 13), and Bmd4 (Chr 15). Additive effects were observed for Bmd1, recessive effects for Bmd3, and dominant effects for Bmd2 and Bmd4. The current large size of the QTL regions (6 to 31 cM) renders premature any discussion of candidate genes at this time. Fine mapping of these QTLs is in progress to refine their genetic positions and to evaluate human homologies.
|
22
|
Abstract
Hidden Markov models (HMMs) are a class of stochastic models that have proven to be powerful tools for the analysis of molecular sequence data. A hidden Markov model can be viewed as a black box that generates sequences of observations. The unobservable internal state of the box is stochastic and is determined by a finite state Markov chain. The observable output is stochastic with distribution determined by the state of the hidden Markov chain. We present a Bayesian solution to the problem of restoring the sequence of states visited by the hidden Markov chain from a given sequence of observed outputs. Our approach is based on a Monte Carlo Markov chain algorithm that allows us to draw samples from the full posterior distribution of the hidden Markov chain paths. The problem of estimating the probability of individual paths and the associated Monte Carlo error of these estimates is addressed. The method is illustrated by considering a problem of DNA sequence multiple alignment. The special structure for the hidden Markov model used in the sequence alignment problem is considered in detail. In conclusion, we discuss certain interesting aspects of biological sequence alignments that become accessible through the Bayesian approach to HMM restoration.
|
23
|
Abstract
Many plant species of agricultural importance are polyploid, having more than two copies of each chromosome per cell. In this paper, we describe statistical methods for genetic map construction in autopolyploid species with particular reference to the use of molecular markers. The first step is to determine the dosage of each DNA fragment (electrophoretic band) from its segregation ratio. Fragments present in a single dose can be used to construct framework maps for individual chromosomes. Fragments present in multiple doses can often be used to link the single chromosome maps into homologous groups and provide additional ordering information. Marker phenotype probabilities were calculated for pairs of markers arranged in different configurations among the homologous chromosomes. These probabilities were used to compute a maximum likelihood estimator of the recombination fraction between pairs of markers. A likelihood ratio test for linkage of multidose markers was derived. The information provided by each configuration and power and sample size considerations are also discussed. A set of 294 RFLP markers scored on 90 plants of the species Saccharum spontaneum L. was used to illustrate the construction of an autopolyploid map. Previous studies conducted on the same data revealed that this species of sugar cane is an autooctaploid with 64 chromosomes arranged into eight homologous groups. The methodology described permitted consolidation of 54 linkage groups into ten homologous groups.
|
24
|
Abstract
The genetic basis for differential sensitivity of inbred mice to inflammatory bowel disease induced by dextran sulfate sodium (DSS) is unknown. Susceptible C3H/HeJ mice were outcrossed to partially resistant C57BL/6J mice. F2 and N2 progeny were phenotyped by evaluating histopathologic lesions in large intestine detected 16 days after a 5-day period of feeding 3.5% DSS. Screening for DSS colitis (Dssc) loci revealed quantitative trait loci (QTL) on Chr 5 (Dssc1) and Chr 2 (Dssc2). These loci contributed additively, explaining 17.5% of the variation in total colonic lesions. Additional QTL on Chr 18 and 1 that collectively explained 11% of the variation in total colon lesions were indicated. In the cecum, only a putative QTL on Chr 11 was associated with pathology (lesion severity). Reduced DSS susceptibility was observed in congenic stocks in which the highly susceptible NOD/Lt strain carried putative resistance alleles from either B6 on Chr 2 or from the highly resistant NON/Lt strain on Chr 9. We conclude that multiple genes control susceptibility to DSS colitis in mice. Possible Dssc candidate genes are discussed in terms of current knowledge of inflammatory bowel disease susceptibility loci in humans.
|
25
|
Multigenic and imprinting control of ovarian granulosa cell tumorigenesis in mice. Cancer Res 1998; 58:3694-9. [PMID: 9721880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
Abstract
Spontaneous juvenile ovarian granulosa cell (GC) tumors that occur in young girls are similar to GC carcinomas that develop in SWR-derived inbred mice. We analyzed female offspring from a series of matings among SWR and SJL inbred mice for chromosomal loci underlying tumor susceptibility. Intercross F2 female mice were produced by reciprocal matings of (SWR x SJL)F1 and (SJL x SWR)F1 parents. Tumorigenesis in these F2 mice as well as in SWXJ recombinant inbred and congenic strains of mice derived from SWR and SJL showed significant (P < 0.001) association with Gct1, a dominant susceptibility locus on chromosome (CHR) 4 and with Gct2 on CHR 12. Suggestive (P < 0.01) association was found with Gct3 on CHR 15. A fourth susceptibility locus, Gct4 on CHR X, was demonstrated with a strong parent-of-origin effect associated with the paternal genotype. Imprinting and complex interactions among these four loci combine to establish the probability for GC tumorigenesis in this mouse model.
|
26
|
Identification of quantitative trait loci associated with acylsugar accumulation using intraspecific populations of the wild tomato, Lycopersicon pennellii. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 1998; 96:458-67. [PMID: 24710885 DOI: 10.1007/s001220050762] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Indexed: 05/20/2023]
Abstract
Lycopersicon pennellii LA716, a wild relative of tomato, is resistant to a number of insect pests due to the accumulation of acylsugars exuded from type IV trichomes. These acylsugars are a class of compounds including both acylglucoses and acylsucroses. Intraspecific populations between L. pennellii LA716 and L. pennellii LA1912, the latter an accession that assorts for low-level acylsugar accumulation, were created to study the inheritance of type IV trichome density, acylsugar accumulation levels, percentage of acylsugars that are acylglucoses, and leaf area. The F2 population was subsequently used to determine genomic regions associated with these traits. The relative proportion of acylglucoses and acylsucroses was found to be largely controlled by a single locus near TG549 on chromosome 3. One locus on chromosome 10 showed significant associations with acylsugar levels. In addition, 1 locus on chromosome 4 showed significant associations with leaf area. Ten additional loci showed modest associations with one or more of the traits examined, 5 of which have been previously reported.
|
27
|
Abstract
Trees that describe the ancestry of DNA sequences sampled from a population may differ between loci because of genetic recombination. We seek to understand the relationship between such trees for loci that are linked with non-zero recombination rate. We consider a coalescent process model with recombination, as described by Hudson (1983; 1990). For two loci and a sample size of two sequences, a detailed analysis of this process yields the joint distribution of the two trees (one at each locus). A number of interesting results follow from this analysis, including the distribution of the number of recombination events in the history of the sample. For the general case of m loci and samples of size n, we describe an algorithm for simulating the tree building process. Because analytic results are difficult to obtain in this case, we use simulation to study properties of trees at multiple linked loci such as total tree time and number of recombination events.
|
28
|
Abstract
If loci are randomly distributed on a physical map, the density of markers on a genetic map will be inversely proportional to recombination rate. We have used this idea, first proposed by Mary Lyon, to estimate recombination rates from the Drosophila melanogaster linkage map. These results were compared with results of two other studies that estimated regional recombination rates in D. melanogaster using both physical and genetic maps. The three methods were largely concordant in identifying large-scale genomic patterns of recombination. The marker density method was then applied to the Mus musculus microsatellite linkage map. The distribution of microsatellites provided evidence for heterogeneity in recombination rates. Centromeric regions for several mouse chromosomes had significantly greater numbers of markers than expected, suggesting that recombination rates were lower in these regions. In contrast, most telomeric regions contained significantly fewer markers than expected. This indicates that recombination rates are elevated at the telomeres of many mouse chromosomes and is consistent with a comparison of the genetic and cytogenetic maps in these regions. The density of markers on a genetic map may provide a generally useful way to estimate regional recombination rates in species for which genetic, but not physical, maps are available.
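The marker-density idea in this abstract can be illustrated with a minimal sketch. It assumes equal-width windows along the genetic map and reports the ratio of expected to observed marker counts as a relative recombination rate; the function name, the windowing scheme, and the example positions are hypothetical and are not taken from the paper.

```python
import math

def relative_recombination_rates(marker_positions_cm, map_length_cm, n_windows):
    """Relative recombination rate per window of a genetic map.

    Assumption for this sketch: markers are uniformly distributed on the
    physical map, so a window holding fewer markers than expected covers
    less physical DNA per cM, i.e. it recombines more than average.
    """
    width = map_length_cm / n_windows
    counts = [0] * n_windows
    for pos in marker_positions_cm:
        # Clamp so a marker at the very end of the map falls in the last window.
        idx = min(int(pos / width), n_windows - 1)
        counts[idx] += 1
    # Expected count per window if marker density were even along the genetic map.
    expected = len(marker_positions_cm) / n_windows
    # Ratio > 1: fewer markers than expected (elevated recombination).
    # Ratio < 1: more markers than expected (suppressed recombination).
    return [expected / c if c > 0 else math.inf for c in counts]

# Hypothetical chromosome: markers clustered near one end of a 100 cM map.
rates = relative_recombination_rates([1, 2, 3, 4, 5, 6, 60, 90], 100.0, 2)
```

In this toy input the first window holds six of eight markers and so gets a relative rate below 1, mirroring the marker-dense centromeric regions described above. A real analysis would also need a significance test for the departure from expected counts.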
|
29
|
Biases in amino acid replacement matrices and alignment scores due to rate heterogeneity. J Comput Biol 1996; 3:307-18. [PMID: 8811489 DOI: 10.1089/cmb.1996.3.307] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 02/02/2023]
Abstract
Empirically derived amino acid replacement matrices are widely used in sequence comparison and database searches. We consider an extension of the usual Markov process model of protein evolution that admits site-to-site rate heterogeneity and demonstrate that rate heterogeneity can introduce a bias in estimated replacement probabilities and the corresponding alignment scores derived from these matrices. We suggest an approach to obtain unbiased estimates of replacement probabilities and alignment scores and derive the details for the case where rates are assumed to vary according to a gamma distribution.
|
30
|
Abstract
The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihood of phylogeny is calculated as a sum of terms, each term being the probability of the data given a particular assignment of rates to sites, times the prior probability of that particular combination of rates. The probabilities of different rate combinations are specified by a stationary Markov chain that assigns rate categories to sites. While there will be a very large number of possible ways of assigning rates to sites, a simple recursive algorithm allows the contributions to the likelihood from all possible combinations of rates to be summed, in a time proportional to the number of different rates at a single site. Thus with three rates, the effort involved is no greater than three times that for a single rate. This "Hidden Markov Model" method allows for rates to differ between sites and for correlations between the rates of neighboring sites. By summing over all possibilities it does not require us to know the rates at individual sites. However, it does not allow for correlation of rates at nonadjacent sites, nor does it allow for a continuous distribution of rates over sites. It is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability. An example is given using beta-hemoglobin DNA sequences in eight mammal species; the regions of high and low evolutionary rates are inferred and also the average length of patches of similar rates.
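The recursive summation over rate assignments described in this abstract can be sketched with the standard HMM forward recursion. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-site likelihoods `site_liks[i][k]` (the probability of the data at site i given rate category k) are assumed to have been computed on the phylogeny already, the transition matrix is an arbitrary illustrative one, and all names and numbers are hypothetical.

```python
from itertools import product

def hmm_rate_likelihood(site_liks, stationary, transition):
    """Sum the likelihood over all rate assignments with a forward recursion.

    site_liks[i][k] : likelihood of site i's data given rate category k
    stationary[k]   : stationary probability of rate category k
    transition[j][k]: probability that category j is followed by category k
    """
    n_rates = len(stationary)
    forward = [stationary[k] * site_liks[0][k] for k in range(n_rates)]
    for liks in site_liks[1:]:
        forward = [
            liks[k] * sum(forward[j] * transition[j][k] for j in range(n_rates))
            for k in range(n_rates)
        ]
    return sum(forward)

def brute_force_likelihood(site_liks, stationary, transition):
    """Same quantity by explicit enumeration of every assignment (tiny inputs only)."""
    n_rates = len(stationary)
    total = 0.0
    for path in product(range(n_rates), repeat=len(site_liks)):
        prob = stationary[path[0]] * site_liks[0][path[0]]
        for i in range(1, len(path)):
            prob *= transition[path[i - 1]][path[i]] * site_liks[i][path[i]]
        total += prob
    return total

# Hypothetical example: four sites, two rate categories, correlated neighbors.
site_liks = [[0.2, 0.05], [0.1, 0.3], [0.4, 0.25], [0.05, 0.6]]
stationary = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]
total = hmm_rate_likelihood(site_liks, stationary, transition)
```

The brute-force function is included only to check that the recursion agrees with the explicit sum over all assignments. For long alignments the products underflow, so a practical version would rescale or work with logarithms.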
|
31
|
Abstract
The problem of detecting minor quantitative trait loci (QTL) responsible for genetic variation not explained by major QTL is of importance in the complete dissection of quantitative characters. Two extensions of the permutation-based method for estimating empirical threshold values are presented. These methods, the conditional empirical threshold (CET) and the residual empirical threshold (RET), yield critical values that can be used to construct tests for the presence of minor QTL effects while accounting for effects of known major QTL. The CET provides a completely nonparametric test through conditioning on markers linked to major QTL. It allows for general nonadditive interactions among QTL, but its practical application is restricted to regions of the genome that are unlinked to the major QTL. The RET assumes a structural model for the effect of major QTL, and a threshold is constructed using residuals from this structural model. The search space for minor QTL is unrestricted, and RET-based tests may be more powerful than the CET-based test when the structural model is approximately true.
|
32
|
Abstract
A class of statistical tests based on molecular polymorphism data is studied to determine size and power properties. The class includes Tajima's D statistic as well as the D* and F* tests proposed by Fu and Li. A new method of constructing critical values for these tests is described. Simulations indicate that Tajima's test is generally most powerful against the alternative hypotheses of selective sweep, population bottleneck, and population subdivision, among tests within this class. However, even Tajima's test can detect a selective sweep or bottleneck only if it has occurred within a specific interval of time in the recent past or population subdivision only when it has persisted for a very long time. For greatest power against the particular alternatives studied here, it is better to sequence more alleles than more sites.
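For readers unfamiliar with the statistic at the center of this abstract, the following is a minimal sketch of Tajima's D computed from a set of aligned sequences, using the standard published constants. It does not reproduce the paper's new method for constructing critical values; the function name and toy alignments are hypothetical.

```python
import math
from itertools import combinations

def tajimas_d(sequences):
    """Tajima's D for equal-length aligned sequences (requires n >= 4 and S > 0)."""
    n = len(sequences)
    length = len(sequences[0])
    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in sequences}) > 1)
    if n < 4 or seg_sites == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(1 for x, y in zip(a, b) if x != y) for a, b in combinations(sequences, 2)
    ) / n_pairs
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)

# Toy alignments: an excess of singletons versus intermediate-frequency variants.
d_singletons = tajimas_d(["AAA", "AAT", "ATA", "TAA"])
d_intermediate = tajimas_d(["AA", "AA", "TT", "TT"])
```

An excess of rare variants pulls D below zero, the pattern expected after a selective sweep, while variants at intermediate frequency push it above zero, as under population subdivision.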
|
33
|
Estimation and reliability of molecular sequence alignments. Biometrics 1995; 51:100-13. [PMID: 7766767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Abstract
The problem of estimating the relatedness of a pair of biological sequences is addressed. A stochastic model of sequence evolution is described that allows insertion and deletion as well as replacement of amino acid residues (or substitution of nucleotides) over time. An expectation-maximization (EM) algorithm that obtains maximum likelihood estimates of the model parameters is introduced. The method assumes that the sequences are related by descent from a common ancestor but the alignment (i.e., the precise evolutionary correspondence between residues in each sequence) is unknown. Results from the E-step of the EM algorithm are used to assess the likelihood that any two residues are related by direct descent from a common ancestor.
|
34
|
Abstract
The detection of genes that control quantitative characters is a problem of great interest to the genetic mapping community. Methods for locating these quantitative trait loci (QTL) relative to maps of genetic markers are now widely used. This paper addresses an issue common to all QTL mapping methods, that of determining an appropriate threshold value for declaring significant QTL effects. An empirical method is described, based on the concept of a permutation test, for estimating threshold values that are tailored to the experimental data at hand. The method is demonstrated using two real data sets derived from F(2) and recombinant inbred plant populations. An example using simulated data from a backcross design illustrates the effect of marker density on threshold values.
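The permutation idea described in this abstract can be sketched as follows. This is a minimal illustration, assuming a backcross with genotypes coded 0/1 and a simple difference-in-means statistic at each marker; the paper's own test statistic is not specified here, and every function name and parameter below is hypothetical.

```python
import random
from statistics import mean

def marker_stat(phenotypes, genotypes):
    """Absolute difference in mean phenotype between genotype classes 0 and 1."""
    g0 = [p for p, g in zip(phenotypes, genotypes) if g == 0]
    g1 = [p for p, g in zip(phenotypes, genotypes) if g == 1]
    if not g0 or not g1:
        return 0.0
    return abs(mean(g1) - mean(g0))

def max_stat(phenotypes, marker_matrix):
    """Genome-wide maximum of the per-marker statistic.

    marker_matrix is a list of markers, each a list of per-individual genotypes.
    """
    return max(marker_stat(phenotypes, marker) for marker in marker_matrix)

def permutation_threshold(phenotypes, marker_matrix, n_perm=1000, alpha=0.05, seed=0):
    """Empirical genome-wide threshold from shuffled phenotypes.

    Shuffling phenotypes breaks any trait-marker association while keeping
    the marker data (density, linkage, missingness) exactly as observed.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_max.append(max_stat(shuffled, marker_matrix))
    null_max.sort()
    # Take the (1 - alpha) quantile of the null distribution of the maximum.
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]
```

An observed genome-wide maximum that exceeds the returned value would be declared significant at level `alpha`. Taking the maximum over markers in each permutation is what makes the threshold genome-wide rather than per-marker.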
|
35
|
Abstract
We introduce a general class of models for sequence evolution that includes network phylogenies. Networks, a generalization of strictly tree-like phylogenies, are proposed to model situations where multiple lineages contribute to the observed sequences. An algorithm to compute the probability distribution of binary character-state configurations is presented and statistical inference for this model is developed in a likelihood framework. A stepwise procedure based on likelihood ratios is used to explore the space of models. Starting with a star phylogeny, new splits (nontrivial bipartitions of the sequence set) are successively added to the model until no significant change in the likelihood is observed. A novel feature of our approach is that the new splits are not necessarily constrained to be consistent with a treelike mode of evolution. The fraction of invariable sites is estimated by maximum likelihood simultaneously with other model parameters and is essential to obtain a good fit to the data. The effect of finite sequence length on the inference methods is discussed. Finally, we provide an illustrative example using aligned VP1 genes from the foot and mouth disease viruses (FMDV). The different serotypes of the FMDV exhibit a range of treelike and network evolutionary relationships.
|
36
|
|
37
|
Phylogenetic inference: linear invariants and maximum likelihood. Biometrics 1993; 49:543-55. [PMID: 7690255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
We develop a new statistical method for inferring phylogenies, based on a likelihood ratio test. This method does not require parameter constraints but does require identical evolutionary processes in the sites considered. Another method of phylogenetic inference is the method of linear invariants, described by Cavender (1989, Molecular Biology and Evolution 6, 301-316), based on a notion of Lake (1987, Molecular Biology and Evolution 4, 167-191). We describe a sound mathematical basis for the use of linear invariants. We show that the validity of the method requires parameter constraints, but does not require that the evolutionary processes in differing sites be identical. We show that the method of linear invariants is asymptotically equivalent to a less powerful version of our likelihood ratio test, and is thus essentially a maximum likelihood technique.
|
38
|
Abstract
Genetic linkage maps based on restriction fragment length polymorphisms are useful for many purposes; however, different populations are required to fulfill different objectives. Clones from the linkage map(s) are subsequently probed onto populations developed for special purposes such as gene tagging. Therefore, clones contained on the initial map(s) must be polymorphic on a wide range of genotypes to have maximum utility. The objectives of this research were to (i) calculate polymorphism information content values of 51 low-copy DNA clones and (ii) use the resulting values to choose potential mapping parents. Polymorphism information content was calculated using gene diversity by classifying restriction fragment patterns on a diverse set of 18 wheat genotypes. Combinations of potential parents were then compared by examining both the proportion of polymorphic clones and the likelihood that those mapped clones would give a polymorphism when used on other populations. Genotype pairs were identified that would map more highly informative DNA clones compared with a population derived from the most polymorphic potential parents. The methodologies used to characterize clones and rank potential parents should be applicable to other species and types of markers as well. Key words: restriction fragment length polymorphism, mapping, Triticum aestivum.
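The gene-diversity calculation mentioned in this abstract can be sketched as one minus the sum of squared pattern frequencies. This is a minimal illustration, assuming each genotype has been assigned a single restriction fragment pattern label per clone; the function name and the example labels are hypothetical.

```python
from collections import Counter

def gene_diversity(patterns):
    """Gene diversity, 1 - sum(p_i^2), over restriction fragment pattern classes.

    patterns holds one pattern label per genotype for a single DNA clone.
    """
    n = len(patterns)
    counts = Counter(patterns)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical clone scored on 18 genotypes with two equally common patterns.
pic = gene_diversity(["a"] * 9 + ["b"] * 9)
```

The value equals the probability that two genotypes drawn at random (with replacement) show different patterns, which is why clones with higher values are more likely to be polymorphic in a new cross.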
|
39
|
Abstract
A pooled-sample approach to the construction of high-resolution genetic maps is described. The strategy depends on the existence of an easily selectable target locus and the ability to produce large segregating populations. If these requirements are met, the pooled-sample mapping approach allows tightly linked markers (e.g., restriction fragment length polymorphisms) to be mapped relative to the target with a great economy of effort. The recombination fractions among loci can be estimated by the maximum likelihood method and a simple approximate estimator is derived. The order of loci is deduced using a Bayesian statistical framework to yield posterior probabilities for all possible orderings of a marker set. Optimal pooling strategies and the effects of misclassification of selected individuals are discussed and studied by computer simulation. The feasibility of this method is demonstrated by the high-resolution mapping of a region on chromosome 5 of tomato that contains a gene regulating fruit ripening.
|
40
|
Abstract
In this paper we describe a method for the statistical reconstruction of a large DNA sequence from a set of sequenced fragments. We assume that the fragments have been assembled and address the problem of determining the degree to which the reconstructed sequence is free from errors, i.e., its accuracy. A consensus distribution is derived from the assembled fragment configuration based upon the rates of sequencing errors in the individual fragments. The consensus distribution can be used to find a minimally redundant consensus sequence that meets a prespecified confidence level, either base by base or across any region of the sequence. A likelihood-based procedure for the estimation of the sequencing error rates, which utilizes an iterative EM algorithm, is described. Prior knowledge of the error rates is easily incorporated into the estimation procedure. The methods are applied to a set of assembled sequence fragments from the human G6PD locus. We close the paper with a brief discussion of the relevance and practical implications of this work.
|
41
|
Abstract
The objective of this work is to describe sample-size calculations for the inference of a nonzero central branch length in an unrooted four-species phylogeny. Attention is restricted to independent binary characters, such as might be obtained from an alignment of the purine-pyrimidine sequences of a nucleic acid molecule. A statistical test based on a multinomial model for character-state configurations is described. The importance of including invariable sites in models for sequence change is demonstrated, and their effect on sample size is quantified. The methods are applied to a four-species alignment of small-subunit rRNA sequences derived from two archaebacteria, a eubacterium, and a eukaryote. We conclude that the information in these sequences is not sufficient to resolve the branching order of this tree. Estimates of the number of aligned nucleotide positions required to provide a reasonably powerful test are given.
|
42
|
Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants. Mol Biol Evol 1991; 8:128-43. [PMID: 2002762 DOI: 10.1093/oxfordjournals.molbev.a040633] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Indexed: 12/29/2022]
Abstract
Likelihood methods and methods using invariants are procedures for inferring the evolutionary relationships among species through statistical analysis of nucleic acid sequences. A likelihood-ratio test may be used to determine the feasibility of any tree for which the maximum likelihood can be computed. The method of linear invariants described by Cavender, which includes Lake's method of evolutionary parsimony as a special case, is essentially a form of the likelihood-ratio method. In the case of a small number of species (four or five), these methods may be used to find a confidence set for the correct tree. An exact version of Lake's asymptotic chi-square test has been mentioned by Holmquist et al. Under very general assumptions, a one-sided exact test is appropriate, which greatly increases power.
|
43
|
Abstract
A statistical analysis of physical map data for eight restriction enzymes covering nearly the entire genome of E. coli is presented. The methods of analysis are based on a top-down modeling approach which requires no knowledge of the statistical properties of the base sequence. For most enzymes, the distribution of mapped sites is found to be fairly homogeneous. Some heterogeneity in the distribution of sites is observed for the enzymes PstI and HindIII. In addition, BamHI sites are found to be more evenly dispersed than we would expect for random placement and we speculate on a possible mechanism. A consistent departure from a uniform distribution, observed for each of the eight enzymes, is found to be due to a lack of closely spaced sites. We conclude from our analysis that this departure can be accounted for by deficiencies in the physical map data rather than non-random placement of actual restriction sites. Estimates of the numbers of sites missing from the map are given, based both on the map data itself and on the site frequencies in a sample of sequenced E. coli DNA. We conclude that 5 to 15% of the mapped sites represent multiple sites in the DNA sequence.
|
44
|
Abstract
The composition of naturally occurring DNA sequences is often strikingly heterogeneous. In this paper, the DNA sequence is viewed as a stochastic process with local compositional properties determined by the states of a hidden Markov chain. The model used is a discrete-state, discrete-outcome version of a general model for non-stationary time series proposed by Kitagawa (1987). A smoothing algorithm is described which can be used to reconstruct the hidden process and produce graphic displays of the compositional structure of a sequence. The problem of parameter estimation is approached using likelihood methods and an EM algorithm for approximating the maximum likelihood estimate is derived. The methods are applied to sequences from yeast mitochondrial DNA, human and mouse mitochondrial DNAs, a human X chromosomal fragment and the complete genome of bacteriophage lambda.
|