1
Lucia-Sanz A, Peng S, Leung CY(J), Gupta A, Meyer JR, Weitz JS. Inferring strain-level mutational drivers of phage-bacteria interaction phenotypes. bioRxiv 2024:2024.01.08.574707. [PMID: 38260415 PMCID: PMC10802490 DOI: 10.1101/2024.01.08.574707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
The enormous diversity of bacteriophages and their bacterial hosts presents a significant challenge to predicting which phages infect a focal set of bacteria. Infection is largely determined by complementary, and largely uncharacterized, genetics of adsorption, injection, and cell take-over. Here we present a machine learning (ML) approach to predict phage-bacteria interactions, trained on genome sequences of, and phenotypic interactions amongst, 51 Escherichia coli strains and 45 phage λ strains that coevolved in laboratory conditions for 37 days. Leveraging multiple inference strategies and without a priori knowledge of driver mutations, this framework predicts both who infects whom and the quantitative levels of infections across a suite of 2,295 potential interactions. The most effective ML approach inferred interaction phenotypes from independent contributions from phage and bacteria mutations, predicting phage host range with 86% mean classification accuracy while reducing the relative error in the estimated strength of the infection phenotype by 40%. Further, transparent feature selection in the predictive model revealed 18 of 176 phage λ and 6 of 18 E. coli mutations that have a significant influence on the outcome of phage-bacteria interactions, corroborating sites previously known to affect phage λ infections, as well as identifying mutations in genes of unknown function not previously shown to influence bacterial resistance. While the genetic variation studied was limited to a focal, coevolved phage-bacteria system, the method's success at recapitulating strain-level infection outcomes provides a path forward towards developing strategies for inferring interactions in non-model systems, including those of therapeutic significance.
Affiliation(s)
- Adriana Lucia-Sanz
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Animesh Gupta
- Department of Physics, University of California San Diego, La Jolla, California, USA
- Justin R. Meyer
- Department of Ecology, Behavior and Evolution, University of California San Diego, La Jolla, California, USA
- Joshua S. Weitz
- Department of Biology, University of Maryland, College Park, MD, USA
- Department of Physics, University of Maryland, College Park, MD, USA
- Institut de Biologie, École Normale Supérieure, Paris, France
2
Alves K, Brito LF, Sargolzaei M, Schenkel FS. Genome-wide association studies for epistatic genetic effects on fertility and reproduction traits in Holstein cattle. J Anim Breed Genet 2023; 140:624-637. [PMID: 37350080 DOI: 10.1111/jbg.12813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 05/29/2023] [Accepted: 06/09/2023] [Indexed: 06/24/2023]
Abstract
Non-additive genetic effects are well known to play an important role in the phenotypic expression of complex traits, such as fertility and reproduction. In this study, a genome scan was performed using 41,640 single nucleotide polymorphism (SNP) markers to identify genomic regions associated with epistatic (additive-by-additive) effects in fertility and reproduction traits in Holstein cattle. Nine fertility and reproduction traits were analysed on 5825 and 6090 Holstein heifers and cows with phenotypes and genotypes, respectively. The Marginal Epistasis Test (MAPIT) was used to identify SNPs with significant marginal epistatic effects at a chromosome-wise 5% and 10% false discovery rate (FDR) level. The -log10 (p) values were adjusted by the genomic inflation factor (λ) to correct for the potential bias on the p-values and minimize the possible effects of population stratification. After adjustments, MAPIT enabled the identification of genomic regions with significant marginal epistatic effects for heifers on BTA5 for age at first insemination, BTA3 and BTA24 for non-return rate (NRR); BTA16 and BTA28 for gestation length (GL); BTA1, BTA4 and BTA17 for stillbirth (SB). For the cow traits, MAPIT enabled the identification of regions on BTA11 for GL, BTA11 and BTA16 for SB and BTA19 for calf size (CZ). An additional approach for mapping epistasis in a genome-wide association study was also proposed, in which the genome scan was performed using estimates of epistatic values as the input pseudo-phenotypes, computed using single-trait animal models. Significant SNPs were identified at the chromosome-wise 5% and 10% FDR levels for all traits. For the heifer traits, significant regions were found on BTA7 for AFS; BTA12 for NRR; BTA14 and BTA19 for GL; BTA19 for calving ease (CE); BTA5, BTA24, BTA25 and in the X chromosome for SB; BTA23 and in the X chromosome for CZ and in the X chromosome for the number of services (NS). 
For the cow traits, significant regions were found on BTA29 and in the X chromosome for NRR, BTA11, BTA16 and in the X chromosome for SB, BTA2 for GL, BTA28 for CZ, BTA19 for calving to first insemination, and in the X chromosome for NS and first insemination to conception. The results suggest that the epistatic genetic effects are likely due to many loci with small effects rather than a few loci with large effects and/or that a single SNP marker alone does not capture the epistatic effects well. The genomic architecture of fertility and reproduction traits is complex, and these results should be validated in independent dairy cattle populations and using alternative statistical models.
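The λ adjustment mentioned above can be illustrated with the standard genomic-control recipe: λ is the median observed 1-df χ² statistic divided by its expected median under the null (about 0.4549), and each statistic is divided by λ before the p-value is recomputed. The abstract does not spell out the authors' exact adjustment, so the sketch below is an assumption, and the function names are hypothetical.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom: (Phi^-1(0.75))^2.
CHI2_1DF_MEDIAN = 0.454936423119572


def chi2_1df_sf(x: float) -> float:
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))


def genomic_inflation(chi2_stats: list[float]) -> float:
    """Genomic inflation factor: observed median / expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjusted_neglog10_p(chi2_stats: list[float]) -> tuple[float, list[float]]:
    """Divide each 1-df statistic by lambda; return (lambda, adjusted -log10 p)."""
    lam = genomic_inflation(chi2_stats)
    return lam, [-math.log10(chi2_1df_sf(s / lam)) for s in chi2_stats]


# Hypothetical per-SNP statistics, for illustration only.
lam, adjusted = gc_adjusted_neglog10_p([0.2, 0.45, 0.9, 1.8, 12.0])
print(round(lam, 3), [round(v, 2) for v in adjusted])
```

Because λ > 1 in this toy input, every adjusted -log10(p) is smaller than its unadjusted value. Note that Rao and Province (entry 5 in this list) argue that an interaction-term λ should not be used to adjust test statistics in this way.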
Affiliation(s)
- Kristen Alves
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, Ontario, Canada
- Bayer CropScience Inc., Guelph, Ontario, Canada
- Luiz F Brito
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, Ontario, Canada
- Department of Animal Sciences, Purdue University, West Lafayette, Indiana, USA
- Mehdi Sargolzaei
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, Ontario, Canada
- Flavio S Schenkel
- Centre for Genetic Improvement of Livestock, Department of Animal Biosciences, University of Guelph, Guelph, Ontario, Canada
3
Reinwald M, Silva JT, Mueller NJ, Fortún J, Garzoni C, de Fijter JW, Fernández-Ruiz M, Grossi P, Aguado JM. ESCMID Study Group for Infections in Compromised Hosts (ESGICH) Consensus Document on the safety of targeted and biological therapies: an infectious diseases perspective (Intracellular signaling pathways: tyrosine kinase and mTOR inhibitors). Clin Microbiol Infect 2018; 24 Suppl 2:S53-S70. [PMID: 29454849 DOI: 10.1016/j.cmi.2018.02.009] [Citation(s) in RCA: 110] [Impact Index Per Article: 18.3] [Received: 11/10/2017] [Revised: 02/08/2018] [Accepted: 02/11/2018] [Indexed: 12/11/2022]
Abstract
BACKGROUND The present review is part of the European Society of Clinical Microbiology and Infectious Diseases (ESCMID) Study Group for Infections in Compromised Hosts (ESGICH) Consensus Document on the safety of targeted and biologic therapies. AIMS To review, from an infectious diseases perspective, the safety profile of therapies targeting different intracellular signaling pathways and to suggest preventive recommendations. SOURCES Computer-based Medline searches with MeSH terms pertaining to each agent or therapeutic family. CONTENT Although BCR-ABL tyrosine kinase inhibitors modestly increase the overall risk of infection, dasatinib has been associated with cytomegalovirus and hepatitis B virus reactivation. BRAF/MEK kinase inhibitors do not significantly affect infection susceptibility. The effect of Bruton tyrosine kinase inhibitors (ibrutinib) among patients with B-cell malignancies is difficult to distinguish from that of previous immunosuppression. However, cases of Pneumocystis jirovecii pneumonia (PCP), invasive fungal infection and progressive multifocal leukoencephalopathy have been occasionally reported. Because phosphatidylinositol-3-kinase inhibitors (idelalisib) may predispose to opportunistic infections, anti-Pneumocystis prophylaxis and prevention strategies for cytomegalovirus are recommended. No increased rates of infection have been observed with venetoclax (antiapoptotic protein Bcl-2 inhibitor). Therapy with Janus kinase inhibitors markedly increases the incidence of infection. Pretreatment screening for chronic hepatitis B virus and latent tuberculosis infection must be performed, and anti-Pneumocystis prophylaxis should be considered for patients with additional risk factors. Cancer patients receiving mTOR inhibitors face an increased incidence of overall infection, especially those with additional risk factors (prior therapies or delayed wound healing). 
IMPLICATIONS Specific preventive approaches are warranted in view of the increased risk of infection associated with some of the reviewed agents.
Affiliation(s)
- M Reinwald
- Department of Hematology and Oncology, Klinikum Brandenburg, Medizinische Hochschule Brandenburg Theodor Fontane, Brandenburg an der Havel, Germany.
- J T Silva
- Department of Infectious Diseases, University Hospital of Badajoz, Fundación para la Formación e Investigación de los Profesionales de la Salud (FundeSalud), Badajoz, Spain
- N J Mueller
- Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- J Fortún
- Department of Infectious Diseases, Hospital Universitario 'Ramon y Cajal', Madrid, Spain; Spanish Network for Research in Infectious Diseases (REIPI RD16/0016), Instituto de Salud Carlos III, Madrid, Spain
- C Garzoni
- Department of Internal Medicine, Clinica Luganese, Lugano, Switzerland; Department of Infectious Disease, Clinica Luganese, Lugano, Switzerland
- J W de Fijter
- Department of Medicine, Division of Nephrology, Leiden University Medical Centre, Leiden, The Netherlands
- M Fernández-Ruiz
- Unit of Infectious Diseases, Hospital Universitario '12 de Octubre', Instituto de Investigación Hospital '12 de Octubre' (i+12), School of Medicine, Universidad Complutense, Madrid, Spain; Spanish Network for Research in Infectious Diseases (REIPI RD16/0016), Instituto de Salud Carlos III, Madrid, Spain
- P Grossi
- Department of Infectious and Tropical Diseases, University of Insubria, Ospedale di Circolo-Fondazioni Macchi, Varese, Italy
- J M Aguado
- Unit of Infectious Diseases, Hospital Universitario '12 de Octubre', Instituto de Investigación Hospital '12 de Octubre' (i+12), School of Medicine, Universidad Complutense, Madrid, Spain; Spanish Network for Research in Infectious Diseases (REIPI RD16/0016), Instituto de Salud Carlos III, Madrid, Spain
4
DeLisi LE. A Case for Returning to Multiplex Families for Further Understanding the Heritability of Schizophrenia: A Psychiatrist's Perspective. Mol Neuropsychiatry 2016; 2:15-9. [PMID: 27606317 PMCID: PMC4996023 DOI: 10.1159/000442820] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 08/17/2015] [Accepted: 11/26/2015] [Indexed: 11/19/2022]
Abstract
The genetic mechanism for schizophrenia remains unknown despite decades of research. A tremendous amount of investigator time and effort has gone into ascertainment of clinical samples for genetic studies over the years. Most recently, a large international effort of unprecedented collaboration has occurred to combine data worldwide in pursuit of uncovering the relevant genetic risk factors. However, in the process, the use of multiplex families to understand the genetics has waned, and it has been presumed that large resources of unrelated patients and controls are more efficient than families for finding risk alleles. This commentary is a call to return to the use of this largely abandoned resource for further understanding the underlying biological mechanism of this serious mental illness.
Affiliation(s)
- Lynn E. DeLisi
- VA Boston Healthcare System, Brockton, Mass., and Harvard Medical School, Boston, Mass., USA
5
Rao TJ, Province MA. A Framework for Interpreting Type I Error Rates from a Product-Term Model of Interaction Applied to Quantitative Traits. Genet Epidemiol 2015; 40:144-53. [PMID: 26659945 PMCID: PMC4738444 DOI: 10.1002/gepi.21944] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/27/2015] [Revised: 10/05/2015] [Accepted: 10/26/2015] [Indexed: 11/11/2022]
Abstract
Adequate control of type I error rates will be necessary in the increasing genome-wide search for interactive effects on complex traits. After observing unexpected variability in type I error rates from SNP-by-genome interaction scans, we sought to characterize this variability and test the ability of heteroskedasticity-consistent standard errors to correct it. We performed 81 SNP-by-genome interaction scans using a product-term model on quantitative traits in a sample of 1,053 unrelated European Americans from the NHLBI Family Heart Study, and additional scans on five simulated datasets. We found that the interaction-term genomic inflation factor (lambda) showed inflation and deflation that varied with sample size and allele frequency; that similar lambda variation occurred in the absence of population substructure; and that lambda was strongly related to heteroskedasticity but not to minor non-normality of phenotypes. Heteroskedasticity-consistent standard errors narrowed the range of lambda, with HC3 outperforming HC0, but in individual scans tended to create new P-value outliers related to sparse two-locus genotype classes. We explain the lambda variation as a result of non-independence of test statistics coupled with stochastic biases in test statistics due to a failure of the test to reach asymptotic properties. We propose that one way to interpret lambda is by comparison to an empirical distribution generated from data simulated under the null hypothesis and without population substructure. We further conclude that the interaction-term lambda should not be used to adjust test statistics and that heteroskedasticity-consistent standard errors come with limitations that may outweigh their benefits in this setting.
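To make the product-term model and the two heteroskedasticity-consistent estimators concrete, the NumPy sketch below fits y = b0 + b1·x1 + b2·x2 + b3·x1·x2 by ordinary least squares and reports the classical, HC0 and HC3 standard errors of the interaction coefficient b3. It is a minimal illustration with our own function and variable names, not the authors' code. HC0 weights each squared residual equally, while HC3 inflates it by 1/(1 - h_i)², where h_i is the observation's leverage; this is one way high-leverage individuals in sparse two-locus genotype classes can dominate the HC3 estimate.

```python
import numpy as np


def product_term_fit(y, x1, x2):
    """OLS fit of y = b0 + b1*x1 + b2*x2 + b3*x1*x2.

    Returns (b3, classical SE, HC0 SE, HC3 SE) for the interaction coefficient b3.
    """
    y = np.asarray(y, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Classical covariance assumes a constant residual variance.
    cov_ols = (resid @ resid / (n - k)) * xtx_inv

    # Leverage h_i: diagonal of the hat matrix X (X'X)^-1 X'.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    def sandwich(w):
        # (X'X)^-1 X' diag(w) X (X'X)^-1
        return xtx_inv @ ((X * w[:, None]).T @ X) @ xtx_inv

    cov_hc0 = sandwich(resid**2)
    cov_hc3 = sandwich(resid**2 / (1.0 - h) ** 2)
    return (float(beta[3]),
            float(np.sqrt(cov_ols[3, 3])),
            float(np.sqrt(cov_hc0[3, 3])),
            float(np.sqrt(cov_hc3[3, 3])))


# Hypothetical data: residual variance grows with the focal genotype (heteroskedastic).
rng = np.random.default_rng(0)
x1 = rng.integers(0, 3, size=500).astype(float)  # focal SNP coded 0/1/2
x2 = rng.integers(0, 3, size=500).astype(float)  # SNP it is scanned against
y = 1 + 0.5 * x1 + 0.2 * x2 + 0.3 * x1 * x2 + rng.normal(size=500) * (1 + x1)
print(product_term_fit(y, x1, x2))
```

In practice, statsmodels exposes the same estimators through `OLS(...).fit(cov_type="HC0")` or `cov_type="HC3"`; the hand-written version simply makes the leverage weighting visible.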
Affiliation(s)
- Tara J Rao
- Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
- Michael A Province
- Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, Missouri, United States of America
6
Abstract
Genome-wide association studies (GWASs) have become the focus of the statistical analysis of complex traits in humans, successfully shedding light on several aspects of genetic architecture and biological aetiology. Single-nucleotide polymorphisms (SNPs) are usually modelled as having additive, cumulative and independent effects on the phenotype. Although evidently a useful approach, it is often argued that this is not a realistic biological model and that epistasis (that is, the statistical interaction between SNPs) should be included. The purpose of this Review is to summarize recent directions in methodology for detecting epistasis and to discuss evidence of the role of epistasis in human complex trait variation. We also discuss the relevance of epistasis in the context of GWASs and potential hazards in the interpretation of statistical interaction terms.
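One frequently discussed hazard in interpreting statistical interaction terms is scale dependence: whether an interaction appears can depend on the scale on which the phenotype is analysed. The toy example below, with hypothetical genotype means and names of our own choosing, builds two-locus means from a purely multiplicative model. The 2×2 interaction contrast is non-zero on the raw scale but vanishes after a log transform. It illustrates the general point and is not an example taken from the Review.

```python
import math


def interaction_contrast(m00: float, m01: float, m10: float, m11: float) -> float:
    """2x2 interaction contrast m11 - m10 - m01 + m00 (zero means no interaction)."""
    return m11 - m10 - m01 + m00


# Hypothetical means for non-carriers (0) / carriers (1) at loci A and B, generated by
# a multiplicative model with no joint term: mean = baseline * f_a**a * f_b**b.
baseline, f_a, f_b = 10.0, 1.5, 2.0
means = {(a, b): baseline * f_a**a * f_b**b for a in (0, 1) for b in (0, 1)}

cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
raw = interaction_contrast(*(means[c] for c in cells))
logged = interaction_contrast(*(math.log(means[c]) for c in cells))
print(raw, logged)  # raw is 5.0; logged is zero up to floating-point rounding
```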
7
Kim J, Lee T, Lee HJ, Kim H. Genotype-environment interactions for quantitative traits in Korea Associated Resource (KARE) cohorts. BMC Genet 2014; 15:18. [PMID: 24491211 PMCID: PMC3922112 DOI: 10.1186/1471-2156-15-18] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/30/2013] [Accepted: 01/27/2014] [Indexed: 01/11/2023]
Abstract
Background Due to the lack of statistical power and confounding effects of population structure in human population data, genotype-environment interaction studies have not yielded promising results and have provided only limited knowledge of how genotype and environmental factors interact in their influence on risk. Results We analyzed 49 human quantitative traits in 7,170 unrelated Korean individuals on 326,262 autosomal single nucleotide polymorphisms (SNPs) collected from the KARE (Korean Association Resource) project, and we estimated a statistically significant proportion of variance explained by genotype-area interactions in the supra-iliac skinfold thickness trait (h²GE = 0.269, P = 0.00032), which is related to abdominal obesity. The data suggested that genotypes could have different effects on the phenotype (supra-iliac skinfold thickness) in different environmental settings (rural vs. urban areas). We then defined genotype groups of individuals with similar genetic profiles based on the additive genetic relationships among individuals using SNPs. We observed the norms of reaction, that is, the differential phenotypic response of a genotype to a change in environmental exposure. Interestingly, we also found that gene clusters responsible for cell-cell and cell-extracellular matrix interactions were significantly enriched for genotype-area interaction. Conclusions This significant heritability estimate of genotype-environment interactions will lead to conceptual advances in our understanding of the mechanisms underlying genotype-environment interactions, and could ultimately be applied to personalized preventative treatments based on environmental exposures.
Affiliation(s)
- Hyun-Jeong Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-742, Republic of Korea.
8
Liu Y, Li X, Liu Z, Chen L, Ng MK. Construction and analysis of single nucleotide polymorphism-single nucleotide polymorphism interaction networks. IET Syst Biol 2013; 7:170-81. [PMID: 24067417 PMCID: PMC8687305 DOI: 10.1049/iet-syb.2012.0055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2012] [Revised: 02/17/2013] [Accepted: 03/25/2013] [Indexed: 11/19/2022]
Abstract
The study of gene regulatory networks and protein-protein interaction networks is believed to be fundamental to understanding molecular processes and functions in systems biology. In this study, the authors work at the single nucleotide polymorphism (SNP) level and construct SNP-SNP interaction networks to understand the genetic characteristics and pathogenetic mechanisms of complex diseases. The authors employ existing methods to mine, model and evaluate a SNP sub-network from SNP-SNP interactions, and use two SNP datasets, for Parkinson disease and coronary artery disease, to demonstrate the procedure of constructing and analysing SNP-SNP interaction networks. Experimental results show that constructing and analysing such networks can recover some existing biological results and related disease genes.
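The abstract does not name the "existing methods" used, so the sketch below assumes the simplest possible construction: keep SNP pairs whose interaction p-value falls below a threshold, treat them as edges of an undirected graph, and report connected sub-networks and node degrees. The SNP labels, p-values and threshold are hypothetical placeholders.

```python
from collections import defaultdict


def build_interaction_network(pairs, alpha):
    """Return {snp: set of interacting snps} for pairs with p-value < alpha."""
    adjacency = defaultdict(set)
    for snp_a, snp_b, p in pairs:
        if p < alpha:
            adjacency[snp_a].add(snp_b)
            adjacency[snp_b].add(snp_a)
    return dict(adjacency)


def connected_components(adjacency):
    """Split the network into connected sub-networks with an iterative DFS."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Hypothetical pairwise interaction results: (SNP, SNP, p-value).
pairs = [("snp1", "snp2", 1e-6), ("snp2", "snp3", 1e-5),
         ("snp4", "snp5", 1e-7), ("snp1", "snp5", 0.2)]
network = build_interaction_network(pairs, alpha=1e-4)
degree = {snp: len(nbrs) for snp, nbrs in network.items()}
print(connected_components(network), degree)
```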
Affiliation(s)
- Yang Liu
- Bioinformatics Program, Boston University, 24 Cummington Street, Boston, MA 02215, USA
- Xutao Li
- Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology, People's Republic of China
- Zhiping Liu
- Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Luonan Chen
- Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Michael K. Ng
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
9
Lanfear DE, Sunkara B, Li J, Rastogi S, Gupta RC, Padhukasahasram B, Williams LK, Sabbah HN. Association of genetic variation with gene expression and protein abundance within the natriuretic peptide pathway. J Cardiovasc Transl Res 2013; 6:826-33. [PMID: 23835779 DOI: 10.1007/s12265-013-9491-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/18/2013] [Accepted: 06/17/2013] [Indexed: 01/09/2023]
Abstract
The natriuretic peptide (NP) system is a critical physiologic pathway in heart failure with wide individual variability in functioning. We investigated the genetic component by testing the association of single nucleotide polymorphisms (SNP) with RNA and protein expression. Samples of DNA, RNA, and tissue from human kidney (n = 103) underwent genotyping, RT-PCR, and protein quantitation (in lysates), for four candidate genes [NP receptor 1 (NPR1), NPR2, and NPR3 and membrane metalloendopeptidase]. The association of genetic variation with expression was tested using linear regression for individual SNPs, and a principal components (PC) method for overall gene variation. Eleven SNPs in NPR2 were significantly associated with protein expression (false discovery rate ≤0.05), but not RNA quantity. RNA and protein quantity correlated poorly with each other. The PC analysis showed only NPR2 as significant. Assessment of the clinical impact of NPR2 genetic variation is needed.
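The abstract reports significance at a false discovery rate of at most 0.05 but does not say which FDR procedure was used. Assuming the common Benjamini-Hochberg step-up procedure, the selection rule can be sketched as follows; the function name is ours.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans: True where the hypothesis is rejected at FDR level q.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k (1-based) whose sorted p-value satisfies p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical per-SNP p-values. 0.04 fails its own threshold (2q/3) but is still
# rejected because 0.045 passes at rank 3: the "step-up" part of the procedure.
print(benjamini_hochberg([0.001, 0.04, 0.045]))  # -> [True, True, True]
```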
10
Pinelli M, Scala G, Amato R, Cocozza S, Miele G. Simulating gene-gene and gene-environment interactions in complex diseases: Gene-Environment iNteraction Simulator 2. BMC Bioinformatics 2012; 13:132. [PMID: 22698142 PMCID: PMC3538511 DOI: 10.1186/1471-2105-13-132] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 12/20/2011] [Accepted: 05/10/2012] [Indexed: 01/02/2023]
Abstract
BACKGROUND The analysis of complex diseases is an important problem in human genetics. Because multifactoriality is expected to play a pivotal role, many studies are currently focused on collecting information on the genetic and environmental factors that potentially influence these diseases. However, there is still a lack of efficient and thoroughly tested statistical models that can be used to identify implicated features and their interactions. Simulations using large biologically realistic data sets with known gene-gene and gene-environment interactions that influence the risk of a complex disease are a convenient and useful way to assess the performance of statistical methods. RESULTS The Gene-Environment iNteraction Simulator 2 (GENS2) simulates interactions among two genetic factors and one environmental factor and also allows for epistatic interactions. GENS2 is based on data with realistic patterns of linkage disequilibrium, and imposes no limitations either on the number of individuals to be simulated or on the number of non-predisposing genetic/environmental factors to be considered. The GENS2 tool is able to simulate gene-environment and gene-gene interactions. To make the simulator more intuitive, the input parameters are expressed as standard epidemiological quantities. GENS2 is written in Python and takes advantage of operators and modules provided by the simuPOP simulation environment. It can be used through a graphical or a command-line interface and is freely available from http://sourceforge.net/projects/gensim. The software is released under the GNU General Public License version 3.0. CONCLUSIONS Data produced by GENS2 can be used as a benchmark for evaluating statistical tools designed for the identification of gene-gene and gene-environment interactions.
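GENS2 itself builds on simuPOP and is driven through its own graphical or command-line interface, neither of which is reproduced here. As a rough, hypothetical sketch of the underlying idea only, the function below draws two SNPs under Hardy-Weinberg equilibrium plus a binary exposure, then assigns case status from a logistic model parameterised by epidemiological quantities (minor allele frequencies, exposure prevalence, baseline odds, and odds ratios including gene-gene and gene-environment terms). Unlike GENS2, it ignores linkage disequilibrium, and all parameter names are our own, not GENS2 options.

```python
import math
import random


def simulate_gxg_e(n, maf1, maf2, p_exposed, baseline_odds,
                   or_g1, or_g2, or_e, or_g1g2, or_g1e, seed=0):
    """Return n tuples (g1, g2, e, case).

    g1, g2: minor-allele counts (0/1/2) drawn under Hardy-Weinberg equilibrium.
    e: binary exposure. case = 1 with probability from the logistic model
    log-odds = log(baseline_odds) + g1*log(or_g1) + g2*log(or_g2) + e*log(or_e)
               + g1*g2*log(or_g1g2) + g1*e*log(or_g1e).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        g1 = sum(rng.random() < maf1 for _ in range(2))
        g2 = sum(rng.random() < maf2 for _ in range(2))
        e = int(rng.random() < p_exposed)
        log_odds = (math.log(baseline_odds) + g1 * math.log(or_g1)
                    + g2 * math.log(or_g2) + e * math.log(or_e)
                    + g1 * g2 * math.log(or_g1g2) + g1 * e * math.log(or_g1e))
        risk = 1.0 / (1.0 + math.exp(-log_odds))
        rows.append((g1, g2, e, int(rng.random() < risk)))
    return rows


# Hypothetical settings with a gene-gene odds ratio of 2 per unit of g1*g2.
data = simulate_gxg_e(10_000, maf1=0.3, maf2=0.2, p_exposed=0.4,
                      baseline_odds=0.05, or_g1=1.2, or_g2=1.1, or_e=1.5,
                      or_g1g2=2.0, or_g1e=1.3, seed=1)
print(sum(case for *_, case in data) / len(data))  # simulated case fraction
```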
Affiliation(s)
- Michele Pinelli
- Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università di Napoli “Federico II” - Università di Salerno, Italy
- Dipartimento di Biologia e Patologia Cellulare e Molecolare “L. Califano”, Università di Napoli “Federico II”, Napoli, Italy
- Giovanni Scala
- Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università di Napoli “Federico II” - Università di Salerno, Italy
- Dipartimento di Scienze Fisiche, Università di Napoli “Federico II”, Complesso Universitario di Monte S.Angelo, Napoli, Italy
- Roberto Amato
- Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università di Napoli “Federico II” - Università di Salerno, Italy
- Dipartimento di Scienze Fisiche, Università di Napoli “Federico II”, Complesso Universitario di Monte S.Angelo, Napoli, Italy
- INFN Sezione di Napoli, Napoli, Italy
- Sergio Cocozza
- Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università di Napoli “Federico II” - Università di Salerno, Italy
- Dipartimento di Biologia e Patologia Cellulare e Molecolare “L. Califano”, Università di Napoli “Federico II”, Napoli, Italy
- Gennaro Miele
- Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università di Napoli “Federico II” - Università di Salerno, Italy
- Dipartimento di Scienze Fisiche, Università di Napoli “Federico II”, Complesso Universitario di Monte S.Angelo, Napoli, Italy
- INFN Sezione di Napoli, Napoli, Italy
11
Abstract
Epistatic genetic interactions are key for understanding the genetic contribution to complex traits. Epistasis is always defined with respect to some trait such as growth rate or fitness. Whereas most existing epistasis screens explicitly test for a trait, it is also possible to implicitly test for fitness traits by searching for the over- or under-representation of allele pairs in a given population. Such analysis of imbalanced allele pair frequencies of distant loci has not been exploited yet on a genome-wide scale, mostly due to statistical difficulties such as the multiple testing problem. We propose a new approach called Imbalanced Allele Pair frequencies (ImAP) for inferring epistatic interactions that is exclusively based on DNA sequence information. Our approach is based on genome-wide SNP data sampled from a population with known family structure. We make use of genotype information of parent-child trios and inspect 3×3 contingency tables for detecting pairs of alleles from different genomic positions that are over- or under-represented in the population. We also developed a simulation setup which mimics the pedigree structure by simultaneously assuming independence of the markers. When applied to mouse SNP data, our method detected 168 imbalanced allele pairs, which is substantially more than in simulations assuming no interactions. We could validate a significant number of the interactions with external data, and we found that interacting loci are enriched for genes involved in developmental processes.
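ImAP's own statistic works on parent-child trios and is calibrated against pedigree-mimicking simulations. Setting the family structure aside, the basic object is a 3×3 table of genotype pairs at two distant loci. The sketch below builds that table and applies a plain Pearson test of independence (4 degrees of freedom), returning observed-minus-expected counts so that over- and under-represented genotype combinations can be read off. This is a simplification for illustration, not the ImAP method, and all names are hypothetical.

```python
import math


def genotype_pair_table(genos_a, genos_b):
    """3x3 counts of (locus A genotype, locus B genotype), genotypes coded 0/1/2."""
    table = [[0] * 3 for _ in range(3)]
    for a, b in zip(genos_a, genos_b):
        table[a][b] += 1
    return table


def chi2_4df_sf(x):
    """Upper-tail probability of a chi-square with 4 df: exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def pearson_independence_3x3(table):
    """Return (statistic, p-value, observed-minus-expected table).

    The 4-df reference distribution assumes every row and column total is non-zero.
    Positive residuals mark over-represented pairs, negative ones under-represented.
    """
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(3)) for j in range(3)]
    stat = 0.0
    residual = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            expected = row_tot[i] * col_tot[j] / n
            residual[i][j] = table[i][j] - expected
            if expected > 0:
                stat += residual[i][j] ** 2 / expected
    return stat, chi2_4df_sf(stat), residual


# Hypothetical genotypes for ten individuals (far too few for the asymptotic
# p-value to be trusted; for illustration only).
locus_a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
locus_b = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
stat, p, residual = pearson_independence_3x3(genotype_pair_table(locus_a, locus_b))
print(round(stat, 2), round(p, 4))
```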
Affiliation(s)
- Marit Ackermann
- Cellular Networks and Systems Biology, Biotechnology Center, Technische Universität Dresden, Dresden, Germany
- Andreas Beyer
- Cellular Networks and Systems Biology, Biotechnology Center, Technische Universität Dresden, Dresden, Germany
12
Zhang L, Liu R, Wang Z, Culver DA, Wu R. Modeling haplotype-haplotype interactions in case-control genetic association studies. Front Genet 2012; 3:2. [PMID: 22303409 PMCID: PMC3260479 DOI: 10.3389/fgene.2012.00002] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/29/2011] [Accepted: 01/02/2012] [Indexed: 01/30/2023]
Abstract
Haplotype analysis has been increasingly used to study the genetic basis of human diseases, but models for characterizing genetic interactions between haplotypes from different chromosomal regions have not been well developed in the current literature. In this article, we describe a statistical model for testing haplotype-haplotype interactions for human diseases with a case-control genetic association design. The model is formulated on a contingency table in which cases and controls are typed for the same set of molecular markers. By integrating well-established quantitative genetic principles, the model is equipped with a capacity to characterize physiologically meaningful epistasis arising from interactions between haplotypes from different chromosomal regions. The model allows the partition of epistasis into different components due to additive × additive, additive × dominance, dominance × additive, and dominance × dominance interactions. We derive the EM algorithm to estimate and test the effects of each of these components on differences in the pattern of genetic variation between cases and controls and, therefore, examine their role in the pathogenesis of human diseases. The method was further extended to investigate gene-environment interactions expressed at the haplotype level. The statistical properties of the models were investigated through simulation studies, and their usefulness was validated by analyzing the genetic association of sarcoidosis from a human genetics project.
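Phase ambiguity is what makes EM natural in haplotype models: a double heterozygote at two SNPs could carry either AB/ab or Ab/aB. The sketch below shows EM in its most familiar haplotype role, estimating two-locus haplotype frequencies from unphased genotypes. It is not the authors' full case-control model for haplotype-haplotype epistasis, and the function and variable names are ours.

```python
from itertools import product

# Haplotypes as (allele at locus 1, allele at locus 2); 1 = A or B, 0 = a or b.
HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # AB, Ab, aB, ab


def haplotype_em(genotype_counts, n_iter=500, tol=1e-12):
    """Estimate haplotype frequencies from unphased two-locus genotypes.

    genotype_counts maps (copies of A, copies of B) -> number of individuals.
    """
    freqs = {h: 1.0 / len(HAPLOTYPES) for h in HAPLOTYPES}
    n_haplotypes = 2 * sum(genotype_counts.values())
    for _ in range(n_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        # E-step: split each genotype over the ordered haplotype pairs it could be.
        for (g1, g2), count in genotype_counts.items():
            pairs = [(h1, h2) for h1, h2 in product(HAPLOTYPES, repeat=2)
                     if h1[0] + h2[0] == g1 and h1[1] + h2[1] == g2]
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                expected[h1] += count * w / total
                expected[h2] += count * w / total
        # M-step: new frequencies are expected haplotype counts divided by 2N.
        new = {h: expected[h] / n_haplotypes for h in HAPLOTYPES}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs


# Hypothetical sample: 30 AABB, 40 AaBb (phase unknown) and 30 aabb individuals.
freqs = haplotype_em({(2, 2): 30, (1, 1): 40, (0, 0): 30})
print({h: round(f, 3) for h, f in freqs.items()})
```

Because the phase-known individuals carry only AB and ab, EM attributes almost all double heterozygotes to the AB/ab phase.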
Affiliation(s)
- Li Zhang
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH, USA
13
Liu T, Thalamuthu A, Liu JJ, Chen C, Wang Z, Wu R. Asymptotic distribution for epistatic tests in case-control studies. Genomics 2011; 98:145-51. [PMID: 21620949 DOI: 10.1016/j.ygeno.2011.05.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/03/2010] [Revised: 04/18/2011] [Accepted: 05/10/2011] [Indexed: 12/01/2022]
Abstract
We propose a statistical model for dissecting a multilocus genotypic value into its main (additive and dominant) effects and epistatic effects between different loci in a case-control association study. The model can discern four different kinds of epistasis: additive × additive, additive × dominant, dominant × additive, and dominant × dominant interactions. To test each kind of epistasis, a χ² test statistic was computed for a two-by-two contingency table derived from combined genotypes in both case and control groups. We derived an analytical approach for estimating the asymptotic distribution of the χ² test statistic for epistatic tests under the null hypothesis, with the result being consistent with that from Monte Carlo simulations. The new model was used to analyze a case-control data set for candidate gene studies of stroke, leading to the identification of several significant interactions between causal SNPs for this disease.
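The partition into main and epistatic effects can be made concrete for two biallelic loci. Assuming the textbook single-locus coding in which genotypes AA, Aa and aa have values mu + a, mu + d and mu - a (additive code 1/0/-1, dominance code 0/1/0), the nine two-locus genotypic values determine exactly nine parameters: a reference value, two additive and two dominance effects, and the four epistatic components. The NumPy sketch below solves that 9×9 system; the coding and names are our assumptions, and the authors' exact parameterisation may differ.

```python
import itertools

import numpy as np

# Genotype = number of copies of the focal allele (AA = 2, Aa = 1, aa = 0).
GENOTYPES = [2, 1, 0]
ADDITIVE = {2: 1.0, 1: 0.0, 0: -1.0}
DOMINANCE = {2: 0.0, 1: 1.0, 0: 0.0}
EFFECTS = ["mu", "a1", "d1", "a2", "d2", "a1xa2", "a1xd2", "d1xa2", "d1xd2"]


def design_row(g1, g2):
    """Coefficients of each effect in the value of two-locus genotype (g1, g2)."""
    x1, z1 = ADDITIVE[g1], DOMINANCE[g1]
    x2, z2 = ADDITIVE[g2], DOMINANCE[g2]
    return [1.0, x1, z1, x2, z2, x1 * x2, x1 * z2, z1 * x2, z1 * z2]


def decompose_two_locus(genotypic_values):
    """Map {(g1, g2): genotypic value} for all 9 genotypes to the 9 effects."""
    cells = list(itertools.product(GENOTYPES, GENOTYPES))
    X = np.array([design_row(g1, g2) for g1, g2 in cells])
    y = np.array([genotypic_values[cell] for cell in cells])
    return dict(zip(EFFECTS, np.linalg.solve(X, y)))


# Hypothetical example: a reference value plus a pure additive x additive interaction.
values = {(g1, g2): 10.0 + 0.5 * ADDITIVE[g1] * ADDITIVE[g2]
          for g1 in GENOTYPES for g2 in GENOTYPES}
print({k: round(float(v), 6) for k, v in decompose_two_locus(values).items()})
```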
Affiliation(s)
- Tian Liu
- Center for Computational Biology, Beijing Forestry University, Beijing 100083, China.
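The four epistatic components named in the abstract above can be made concrete with a two-locus genotypic-value model. The sketch below assumes one common parameterization, not necessarily the one the authors used: additive scores -1, 0, 1 for 0, 1, 2 copies of an allele and a dominance indicator equal to 1 for heterozygotes. With nine genotype means and nine parameters the system is exactly identified, so `np.linalg.solve` recovers the main effects and the aa, ad, da, and dd interaction terms. The 3 × 3 table of genotype means and the helper names are hypothetical; the χ² testing step described in the abstract is not reproduced.

```python
import numpy as np

# Hypothetical F2-style coding (an assumption, not taken from the paper):
# additive score x in {-1, 0, 1}; dominance indicator z = 1 only for heterozygotes.
ADD = {0: -1.0, 1: 0.0, 2: 1.0}
DOM = {0: 0.0, 1: 1.0, 2: 0.0}
TERMS = ["mu", "a1", "d1", "a2", "d2", "aa", "ad", "da", "dd"]


def design_row(g1, g2):
    """Regressors for a genotype with g1 and g2 allele copies at loci 1 and 2."""
    x1, z1, x2, z2 = ADD[g1], DOM[g1], ADD[g2], DOM[g2]
    return [1.0, x1, z1, x2, z2, x1 * x2, x1 * z2, z1 * x2, z1 * z2]


def decompose(genotypic_values):
    """Split a 3 x 3 table of two-locus genotype means into main and epistatic effects."""
    G = np.asarray(genotypic_values, dtype=float)
    cells = [(g1, g2) for g1 in range(3) for g2 in range(3)]
    X = np.array([design_row(g1, g2) for g1, g2 in cells])
    y = np.array([G[g1, g2] for g1, g2 in cells])
    coef = np.linalg.solve(X, y)  # 9 equations, 9 unknowns: exactly identified
    return dict(zip(TERMS, coef))


# Usage: build genotype means from known (made-up) effects, then recover them.
true_effects = {"mu": 1.0, "a1": 0.5, "d1": 0.2, "a2": -0.3, "d2": 0.1,
                "aa": 0.4, "ad": -0.2, "da": 0.15, "dd": 0.05}
beta = np.array([true_effects[t] for t in TERMS])
G = np.array([[np.dot(design_row(g1, g2), beta) for g2 in range(3)]
              for g1 in range(3)])
estimates = decompose(G)
```

Other codings, such as orthogonal contrasts, reparameterize the same nine cell means and change the meaning of each individual term, so estimates are only comparable across studies that share a coding.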
14
Wu X, Dong H, Luo L, Zhu Y, Peng G, Reveille JD, Xiong M. A novel statistic for genome-wide interaction analysis. PLoS Genet 2010; 6:e1001131. [PMID: 20885795 DOI: 10.1371/journal.pgen.1001131] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Received: 02/26/2010] [Accepted: 08/20/2010] [Indexed: 12/25/2022]
Abstract
Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic are validated using simulations. Extensive power studies show that the new statistic has much higher power to detect interaction than classical logistic regression. The analysis identified 44 pairs of SNPs with significant evidence of interaction at FDR < 0.001 and 211 pairs at 0.001 < FDR < 0.003, each seen in two independent studies of psoriasis. These included five interacting pairs of SNPs in the genes LST1/NCR3, CXCR5/BCL9L, and GLS2, some of which were located in the target sites of miR-324-3p, miR-433, and miR-382, as well as 15 pairs of interacting SNPs that had nonsynonymous substitutions. Our results demonstrate that genome-wide interaction analysis is a valuable tool for finding the remaining heritability unexplained by current GWAS, and that the new statistic can search for significant interactions between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.
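The abstract above reports SNP pairs at FDR thresholds but does not say here how the false discovery rate was controlled. A common choice is the Benjamini-Hochberg procedure, so the sketch below assumes it: each p-value is scaled by m / rank, and a running minimum from the largest p-value downward keeps the adjusted values monotone. The pair labels and p-values are hypothetical, and the 0.03 cutoff is chosen only to make the toy example interesting (the abstract's tightest threshold is 0.001).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Usage with hypothetical p-values for four SNP pairs.
pair_p = {"pair_1": 0.01, "pair_2": 0.04, "pair_3": 0.03, "pair_4": 0.005}
q = dict(zip(pair_p, benjamini_hochberg(list(pair_p.values()))))
significant = [pair for pair, value in q.items() if value < 0.03]
```

In a genome-wide interaction scan, m is the number of SNP pairs tested, which is why per-pair p-values must be far smaller than the nominal FDR level to survive adjustment.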
15
16
Abstract
Genome-wide association studies of discrete traits generally use simple methods of analysis based on χ² tests for contingency tables or logistic regression, at least for an initial scan of the entire genome. Nevertheless, more power might be obtained by using methods that analyze multiple markers in combination. Methods based on sliding windows, wavelets, Bayesian shrinkage, or penalized likelihood, among others, were explored by participants of Genetic Analysis Workshop 16 Group 1 to combine information across multiple markers within a region, while others used Bayesian variable selection methods for genome-wide multivariate analyses of all markers simultaneously. Imputation can be used to fill in missing markers on individual subjects within a study or in a meta-analysis of studies using different panels. Although multiple imputation should in theory give more robust tests of association, one participant's contribution found little difference between the results of single and multiple imputation. Careful control of population stratification is essential, and two contributions found that previously reported associations with two genes disappeared after more precise control. Other issues considered by this group included subgroup analysis, gene-gene interactions, and the use of biomarkers.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California, Los Angeles, California 90089-9011, USA.
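The abstract above notes that initial genome scans typically rely on χ² tests for contingency tables. A minimal sketch of one such single-marker test, assuming a genotypic (2 degrees of freedom) Pearson test on a 2 × 3 table of cases and controls by genotype, is shown below; the counts are hypothetical and the function name is illustrative. For 2 degrees of freedom the χ² survival function is exactly exp(-x/2), so no statistics library is needed.

```python
import math


def genotype_chi2(case_counts, control_counts):
    """Pearson chi-square for a 2 x 3 table (rows: cases, controls; columns: genotypes 0/1/2)."""
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(3)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and genotype column needs at least one observation")
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(3):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # df = (2 - 1) * (3 - 1) = 2, and a chi-square with 2 df has survival exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


# Usage with hypothetical genotype counts (0, 1, 2 copies of the risk allele).
stat, p_value = genotype_chi2([30, 50, 20], [20, 50, 30])
```

Here the expected count in every cell is 25, 50, or 25, giving a statistic of 4.0 and a p-value of exp(-2). Multimarker approaches such as the sliding windows mentioned above combine information across such single-marker tests, but correlation between nearby markers means their statistics cannot simply be summed as independent χ² variables.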