1
Lazebnik T, Simon-Keren L. Cancer-inspired genomics mapper model for the generation of synthetic DNA sequences with desired genomics signatures. Comput Biol Med 2023; 164:107221. [PMID: 37478715 DOI: 10.1016/j.compbiomed.2023.107221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 06/16/2023] [Accepted: 06/30/2023] [Indexed: 07/23/2023]
Abstract
Genome data are crucial in modern medicine, offering significant potential for diagnosis and treatment. Thanks to technological advancements, many millions of healthy and diseased genomes have already been sequenced; however, obtaining the most suitable data for a specific study, and specifically for validation studies, remains challenging with respect to scale and access. Therefore, in silico genomics sequence generators have been proposed as a possible solution. However, the current generators produce inferior data using mostly shallow (stochastic) connections, detected with limited computational complexity in the training data. This means they do not take into consideration the biological relations and constraints that originally caused the observed connections. To address this issue, we propose the cancer-inspired genomics mapper model (CGMM), which combines genetic algorithm (GA) and deep learning (DL) methods. CGMM mimics processes that generate genetic variations and mutations to transform readily available control genomes into genomes with the desired phenotypes. We demonstrate that CGMM can generate synthetic genomes of selected phenotypes, such as ancestry and cancer, that are indistinguishable from real genomes of such phenotypes, based on unsupervised clustering. Our results show that CGMM outperforms four current state-of-the-art genomics generators on two different tasks, suggesting that CGMM will be suitable for a wide range of purposes in genomic medicine, especially for much-needed validation studies.
Affiliation(s)
- Teddy Lazebnik
  - Department of Cancer Biology, Cancer Institute, University College London, London, UK.
2
Ruiz-Arenas C, Cáceres A, López M, Pelegrí-Sisó D, González J, González JR. Identifying chromosomal subpopulations based on their recombination histories advances the study of the genetic basis of phenotypic traits. Genome Res 2020; 30:1802-1814. [PMID: 33203765 PMCID: PMC7706724 DOI: 10.1101/gr.258301.119] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/16/2019] [Accepted: 10/22/2020] [Indexed: 02/06/2023]
Abstract
Recombination is a main source of genetic variability. However, the potential role of the variation generated by recombination in phenotypic traits, including diseases, remains unexplored because there is currently no method to infer chromosomal subpopulations based on recombination pattern differences. We developed recombClust, a method that uses SNP-phased data to detect differences in historic recombination in a chromosome population. We validated our method by performing simulations and by using real data to accurately predict the alleles of well-known recombination modifiers, including common inversions in Drosophila melanogaster and human, and the chromosomes under selective pressure at the lactase locus in humans. We then applied recombClust to the complex human 1q21.1 region, where nonallelic homologous recombination produces deleterious phenotypes. We discovered and validated the presence of two different recombination histories in these regions that significantly associated with the differential expression of ANKRD35 in whole blood and that were in high linkage with variants previously associated with hypertension. By detecting differences in historic recombination, our method opens a way to assess the influence of recombination variation in phenotypic traits.
Affiliation(s)
- Carlos Ruiz-Arenas
  - Genetics Unit, Universitat Pompeu Fabra, Barcelona 08003, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona 08003, Spain
- Alejandro Cáceres
  - Instituto de Salud Global de Barcelona, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Spain
- Marcos López
  - Genetics Unit, Universitat Pompeu Fabra, Barcelona 08003, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona 08003, Spain
- Dolors Pelegrí-Sisó
  - Instituto de Salud Global de Barcelona, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Spain
- Josefa González
  - Institute of Evolutionary Biology (CSIC-UPF), Barcelona 08003, Spain
- Juan R González
  - Instituto de Salud Global de Barcelona, Barcelona 08003, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona 08003, Spain
3
Ruiz-Arenas C, Cáceres A, López-Sánchez M, Tolosana I, Pérez-Jurado L, González JR. scoreInvHap: Inversion genotyping for genome-wide association studies. PLoS Genet 2019; 15:e1008203. [PMID: 31269027 PMCID: PMC6608898 DOI: 10.1371/journal.pgen.1008203] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/28/2018] [Accepted: 05/17/2019] [Indexed: 02/02/2023]
Abstract
Polymorphic inversions contribute to adaptation and phenotypic variation. However, large multi-centric association studies of inversions remain challenging. We present scoreInvHap, a method to genotype inversions from SNP data for genome-wide association studies (GWASs), overcoming important limitations of current methods and outperforming them in accuracy and applicability. scoreInvHap calls individual inversion genotypes from a similarity score to the SNPs of experimentally validated references. It can be used on different sources of SNP data, including those with low SNP coverage such as exome sequencing, and is easily adaptable to genotype new inversions, either in humans or in other species. We present 20 human inversions that can be reliably and easily genotyped with scoreInvHap to discover their role in complex human traits, and illustrate a first genome-wide association study of experimentally validated human inversions. scoreInvHap is implemented in R and is freely available from Bioconductor.
Chromosomal inversions are structural variants consisting of an orientation change of a chromosome segment. Inversions have been linked to some phenotypic differences between individuals and to genetic divergence. However, their overall contribution to complex diseases is largely underdetermined as there are no high-throughput methods to call inversion genotypes in large cohort studies. Here, we propose a new method, scoreInvHap, to call individual inversion genotypes from their haplotype similarity. We show that scoreInvHap has a high performance when analyzing heterogeneous sources of SNP data. Our current implementation contains 20 human inversions that can be readily genotyped in existing GWAS datasets. We exemplify the utility of scoreInvHap by running the first genome-wide association of experimentally validated inversions and a multi-centric inversion association study. All in all, scoreInvHap can substantially contribute to increasing our knowledge of the role of chromosomal inversions in complex diseases by re-analyzing data from existing genetic association studies.
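The idea of calling an inversion genotype from a similarity score to reference SNP patterns can be illustrated with a minimal Python sketch. This is not the scoreInvHap R package or its actual scoring formula: the `REFERENCE` table, the genotype labels (`NN`, `NI`, `II`), the SNP genotype strings, and the helper names are all hypothetical, and the score used here is simply the mean reference frequency of the genotypes observed in a sample. Missing SNPs are skipped, mimicking low-coverage input.

```python
# Hypothetical reference panel: for each inversion genotype, the frequency of
# each SNP genotype at three SNPs (values invented for illustration only).
REFERENCE = {
    "NN": [{"AA": 0.90, "AG": 0.09, "GG": 0.01},
           {"CC": 0.85, "CT": 0.14, "TT": 0.01},
           {"GG": 0.80, "GT": 0.18, "TT": 0.02}],
    "NI": [{"AA": 0.10, "AG": 0.80, "GG": 0.10},
           {"CC": 0.12, "CT": 0.78, "TT": 0.10},
           {"GG": 0.15, "GT": 0.75, "TT": 0.10}],
    "II": [{"AA": 0.01, "AG": 0.09, "GG": 0.90},
           {"CC": 0.02, "CT": 0.13, "TT": 0.85},
           {"GG": 0.02, "GT": 0.18, "TT": 0.80}],
}


def similarity(sample, reference_snps):
    """Mean reference frequency of the SNP genotypes observed in the sample.

    `sample` is a list of genotype strings, with None for a missing SNP.
    """
    observed = [(i, g) for i, g in enumerate(sample) if g is not None]
    if not observed:
        raise ValueError("sample has no genotyped SNPs")
    total = sum(reference_snps[i].get(g, 0.0) for i, g in observed)
    return total / len(observed)


def call_inversion(sample):
    """Return the best-scoring inversion genotype and all scores."""
    scores = {label: similarity(sample, snps) for label, snps in REFERENCE.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # One fully genotyped sample and one with a missing SNP.
    for sample in (["AA", "CC", "GG"], ["GG", None, "TT"]):
        label, scores = call_inversion(sample)
        print(sample, "->", label, {k: round(v, 3) for k, v in scores.items()})
```

Averaging over the genotyped SNPs only, rather than summing, keeps scores comparable between samples with different numbers of missing SNPs; a real method would also need a quality threshold on the gap between the best and second-best score.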
Affiliation(s)
- Carlos Ruiz-Arenas
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
- Alejandro Cáceres
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
- Marcos López-Sánchez
  - Genetics Unit, Universitat Pompeu Fabra, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Barcelona, Spain
- Ignacio Tolosana
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
- Luis Pérez-Jurado
  - Genetics Unit, Universitat Pompeu Fabra, Barcelona, Spain
  - Hospital del Mar Research Institute (IMIM), Barcelona, Spain
  - SA Clinical Genetics, Women's and Children's Hospital & University of Adelaide, Adelaide, South Australia, Australia
  - South Australian Health and Medical Research Institute, Adelaide, South Australia, Australia
- Juan R. González
  - ISGlobal, Centre for Research in Environmental Epidemiology (CREAL), Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - CIBER Epidemiología y Salud Pública (CIBERESP), Barcelona, Spain
4
Alves JM, Lima AC, Pais IA, Amir N, Celestino R, Piras G, Monne M, Comas D, Heutink P, Chikhi L, Amorim A, Lopes AM. Reassessing the Evolutionary History of the 17q21 Inversion Polymorphism. Genome Biol Evol 2015; 7:3239-48. [PMID: 26560338 PMCID: PMC4700947 DOI: 10.1093/gbe/evv214] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
A polymorphic inversion that lies on chromosome 17q21 comprises two major haplotype families (H1 and H2) that not only differ in orientation but also in copy-number. Although the processes driving the spread of the inversion-associated lineage (H2) in humans remain unclear, a selective advantage has been proposed for one of its subtypes. Here, we genotyped a large panel of individuals from previously overlooked populations using a custom array with a unique panel of H2-specific single nucleotide polymorphisms and found a patchy distribution of H2 haplotypes in Africa, with North Africans displaying a higher frequency of inverted subtypes, when compared with Sub-Saharan groups. Interestingly, North African H2s were found to be closer to "non-African" chromosomes further supporting that these populations may have diverged more recently from groups outside Africa. Our results uncovered higher diversity within the H2 family than previously described, weakening the hypothesis of a strong selective sweep on all inverted chromosomes and suggesting a rather complex evolutionary history at this locus.
Affiliation(s)
- Joao M Alves
  - Doctoral Program in Areas of Basic and Applied Biology (GABBA), University of Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal
  - Instituto de Patologia e Imunologia Molecular da Universidade do Porto-IPATIMUP, Portugal
  - Instituto Gulbenkian de Ciência (IGC), Oeiras, Portugal
  - Present address: Department of Biochemistry, Genetics and Immunology and Institute of Biomedical Research of Vigo (IBIV), University of Vigo, Vigo, Spain
- Ana C Lima
  - Doctoral Program in Areas of Basic and Applied Biology (GABBA), University of Porto, Portugal
  - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal
  - Instituto de Patologia e Imunologia Molecular da Universidade do Porto-IPATIMUP, Portugal
  - Department of Genetics, Washington University School of Medicine, St. Louis
- Isa A Pais
  - Instituto Gulbenkian de Ciência (IGC), Oeiras, Portugal
- Nadir Amir
  - Laboratoire de Biochimie Appliquée, Faculté des Sciences de la Nature et de la Vie, Université Abderrahmane Mira de Bejaia, Algerie
- Ricardo Celestino
  - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal
  - Instituto de Patologia e Imunologia Molecular da Universidade do Porto-IPATIMUP, Portugal
  - School of Allied Health Technologies, Polytechnic of Porto, Porto, Portugal
- Giovanna Piras
  - Department of Hematology, Centro di Diagnostica Biomoleculare et Citogenetica Emato-Oncologica, San Francesco Hospital-ASL, Nuoro, Italy
- Maria Monne
  - Department of Hematology, Centro di Diagnostica Biomoleculare et Citogenetica Emato-Oncologica, San Francesco Hospital-ASL, Nuoro, Italy
- David Comas
  - Departament de Ciències Experimentals i de la Salut, Institut de Biologia Evolutiva (CSIC-UPF), Universitat Pompeu Fabra, Barcelona, Spain
- Peter Heutink
  - German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Lounès Chikhi
  - Instituto Gulbenkian de Ciência (IGC), Oeiras, Portugal
  - CNRS (Centre National de la Recherche Scientifique), Université Paul Sabatier, École Nationale de Formation Agronomique, Unité Mixte de Recherche 5174 EDB (Laboratoire Évolution & Diversité Biologique), Toulouse, France
- António Amorim
  - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal
  - Instituto de Patologia e Imunologia Molecular da Universidade do Porto-IPATIMUP, Portugal
  - Faculdade de Ciências da Universidade do Porto, Portugal
- Alexandra M Lopes
  - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal
  - Instituto de Patologia e Imunologia Molecular da Universidade do Porto-IPATIMUP, Portugal
5
Cáceres A, González JR. Following the footprints of polymorphic inversions on SNP data: from detection to association tests. Nucleic Acids Res 2015; 43:e53. [PMID: 25672393 PMCID: PMC4417146 DOI: 10.1093/nar/gkv073] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 01/16/2015] [Accepted: 01/20/2015] [Indexed: 11/12/2022]
Abstract
Inversion polymorphisms have important phenotypic and evolutionary consequences in humans. Two different methodologies have been used to infer inversions from SNP-dense data, enabling the use of large cohorts for their study. One approach relies on the differences in linkage disequilibrium across breakpoints; the other captures the internal haplotype groups that tag the inversion status of chromosomes. In this article, we assessed the convergence of the two methods in the detection of 20 human inversions that have been reported in the literature. The methods converged in four inversions, including inv-8p23, whose association with low BMI we studied in American children. Using a novel haplotype tagging method with control on inversion ancestry, we computed the frequency of inv-8p23 in two American cohorts and observed inversion haplotype admixture. Accounting for haplotype ancestry, we found that the European inverted allele in children carries a recessive risk of underweight, validated in an independent Spanish cohort (combined: OR = 2.00, P = 0.001). While the footprints of inversions on SNP data are complex, we show that systematic analyses, such as convergence of different methods and controlling for ancestry, can reveal the contribution of inversions to the ancestral composition of populations and to the heritability of human disease.
Affiliation(s)
- Alejandro Cáceres
  - Center for Research in Environmental Epidemiology (CREAL), Doctor Aiguader 88, Barcelona 08003, Spain
  - IMIM (Hospital del Mar Research Institute), Doctor Aiguader 88, Barcelona 08003, Spain
- Juan R González
  - Center for Research in Environmental Epidemiology (CREAL), Doctor Aiguader 88, Barcelona 08003, Spain
  - IMIM (Hospital del Mar Research Institute), Doctor Aiguader 88, Barcelona 08003, Spain
  - Centro de Investigacion Biomedica en Red en Epidemiologia y Salud Publica (CIBERESP), Barcelona 08036, Spain
  - Department of Mathematics, Universitat Autonoma de Barcelona (UAB), Barcelona 08193, Spain
6
Alves JM, Lopes AM, Chikhi L, Amorim A. On the structural plasticity of the human genome: chromosomal inversions revisited. Curr Genomics 2013; 13:623-32. [PMID: 23730202 PMCID: PMC3492802 DOI: 10.2174/138920212803759703] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 02/21/2012] [Revised: 09/23/2012] [Accepted: 09/24/2012] [Indexed: 01/02/2023]
Abstract
With the aid of novel and powerful molecular biology techniques, recent years have witnessed a dramatic increase in the number of studies reporting the involvement of complex structural variants in several genomic disorders. In fact, with the discovery of Copy Number Variants (CNVs) and other forms of unbalanced structural variation, much attention has been directed to the detection and characterization of such rearrangements, as well as the identification of the mechanisms involved in their formation. However, it has long been appreciated that chromosomes can undergo other forms of structural change, namely balanced rearrangements, that do not involve quantitative variation of genetic material. Indeed, a particular subtype of balanced rearrangement, the inversion, was recently found to be far more common than had been predicted from traditional cytogenetics. Chromosomal inversions alter the orientation of a specific genomic sequence and, unless involving breaks in coding or regulatory regions (and, disregarding complex trans effects, in their close vicinity), appear to be phenotypically silent. Such a surprising finding, which is difficult to reconcile with the classical interpretation of inversions as a mechanism causing subfertility (and ultimately reproductive isolation), motivated a new series of theoretical and empirical studies dedicated to understanding their role in human genome evolution and to exploring their possible association with complex genetic disorders. With this review, we attempt to describe the latest methodological improvements to inversion detection at a genome-wide level, while exploring some of the possible implications of inversion rearrangements for the evolution of the human genome.
Affiliation(s)
- Joao M Alves
  - Doctoral Program in Areas of Basic and Applied Biology (GABBA), University of Porto, Portugal
  - IPATIMUP - Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal
  - Instituto Gulbenkian de Ciência (IGC), Oeiras, Portugal
7
A sequential coalescent algorithm for chromosomal inversions. Heredity (Edinb) 2013; 111:200-9. [PMID: 23632894 DOI: 10.1038/hdy.2013.38] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/18/2012] [Revised: 02/04/2013] [Accepted: 03/25/2013] [Indexed: 01/06/2023]
Abstract
Chromosomal inversions are common in natural populations and are believed to be involved in many important evolutionary phenomena, including speciation, the evolution of sex chromosomes and local adaptation. While recent advances in sequencing and genotyping methods are leading to rapidly increasing amounts of genome-wide sequence data that reveal interesting patterns of genetic variation within inverted regions, efficient simulation methods to study these patterns are largely missing. In this work, we extend the sequential Markovian coalescent, an approximation to the coalescent with recombination, to include the effects of polymorphic inversions on patterns of recombination. Results show that our algorithm is fast, memory-efficient and accurate, making it feasible to simulate large inversions in large populations for the first time. The SMC algorithm enables studies of patterns of genetic variation (for example, linkage disequilibria) and tests of hypotheses (using simulation-based approaches) that were previously intractable.
8
The effect of genomic inversions on estimation of population genetic parameters from SNP data. Genetics 2012; 193:243-53. [PMID: 23150602 DOI: 10.1534/genetics.112.145599] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 01/11/2023]
Abstract
In recent years it has emerged that structural variants have a substantial impact on genomic variation. Inversion polymorphisms represent a significant class of structural variant, and despite the challenges in their detection, data on inversions in the human genome are increasing rapidly. Statistical methods for inferring parameters such as the recombination rate and the selection coefficient have generally been developed without accounting for the presence of inversions. Here we exploit new software for simulating inversions in population genetic data, invertFREGENE, to assess the potential impact of inversions on such methods. Using data simulated by invertFREGENE, as well as real data from several sources, we test whether large inversions have a disruptive effect on widely applied population genetics methods for inferring recombination rates, for detecting selection, and for controlling for population structure in genome-wide association studies (GWAS). We find that recombination rates estimated by LDhat are biased downward at inversion loci relative to the true contemporary recombination rates at the loci but that recombination hotspots are not falsely inferred at inversion breakpoints as may have been expected. We find that the integrated haplotype score (iHS) method for detecting selection appears robust to the presence of inversions. Finally, we observe a strong bias in the genome-wide results of principal components analysis (PCA), used to control for population structure in GWAS, in the presence of even a single large inversion, confirming the necessity to thin SNPs by linkage disequilibrium at large physical distances to obtain unbiased results.
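The closing recommendation, thinning SNPs by linkage disequilibrium even at large physical distances before running PCA, can be sketched as a greedy pruning pass. The following is an illustrative Python sketch under stated assumptions, not the procedure used in the study: genotypes are coded 0/1/2, LD is approximated by the squared Pearson correlation of genotype vectors, and the function names and the 0.2 threshold are hypothetical choices. Unlike window-based pruning, each SNP is compared against every SNP already kept, regardless of distance.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coded)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_x * var_y)


def thin_by_ld(snps, threshold=0.2):
    """Greedily keep SNP indices whose r^2 with every kept SNP is below threshold.

    `snps` is a list of genotype vectors, one per SNP, in genomic order.
    No distance window is applied, so long-range LD (as created by a large
    inversion) is also removed.
    """
    kept = []
    for idx, genotypes in enumerate(snps):
        if all(r_squared(genotypes, snps[k]) < threshold for k in kept):
            kept.append(idx)
    return kept


if __name__ == "__main__":
    snps = [
        [0, 1, 2, 0, 1, 2],  # SNP 0
        [0, 1, 2, 0, 1, 2],  # SNP 1: in perfect LD with SNP 0
        [0, 2, 1, 1, 2, 0],  # SNP 2: uncorrelated with SNP 0
    ]
    print(thin_by_ld(snps))
```

The all-pairs comparison is quadratic in the number of SNPs, so for genome-wide data one would restrict it to candidate regions or use an established tool; the sketch is only meant to show what "thinning by LD at large distances" does to the SNP set.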
9
Ma J, Amos CI. Investigation of inversion polymorphisms in the human genome using principal components analysis. PLoS One 2012; 7:e40224. [PMID: 22808122 PMCID: PMC3392271 DOI: 10.1371/journal.pone.0040224] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Received: 12/19/2011] [Accepted: 06/02/2012] [Indexed: 11/18/2022]
Abstract
Despite the significant advances made over the last few years in mapping inversions with the advent of paired-end sequencing approaches, our understanding of the prevalence and spectrum of inversions in the human genome has lagged behind other types of structural variants, mainly due to the lack of a cost-efficient method applicable to large-scale samples. We propose a novel method based on principal components analysis (PCA) to characterize inversion polymorphisms using high-density SNP genotype data. Our method applies to non-recurrent inversions for which recombination between the inverted and non-inverted segments in inversion heterozygotes is suppressed due to the loss of unbalanced gametes. Inside such an inversion region, an effect similar to population substructure is thus created: two distinct “populations” of inversion homozygotes of different orientations and their 1:1 admixture, namely the inversion heterozygotes. This kind of substructure can be readily detected by performing PCA locally in the inversion regions. Using simulations, we demonstrated that the proposed method can be used to detect and genotype inversion polymorphisms using unphased genotype data. We applied our method to the phase III HapMap data and inferred the inversion genotypes of known inversion polymorphisms at 8p23.1 and 17q21.31. These inversion genotypes were validated by comparing with literature results and by checking Mendelian consistency using the family data whenever available. Based on the PCA approach, we also performed a preliminary genome-wide scan for inversions using the HapMap data, which resulted in 2040 candidate inversions, 169 of which overlapped with previously reported inversions. Our method can be readily applied to the abundant SNP data, and is expected to play an important role in developing human genome maps of inversions and exploring associations between inversions and susceptibility to diseases.
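The substructure argument above (two homozygote "populations" and their 1:1 admixture) can be reproduced in a toy simulation. The sketch below is a hedged illustration in plain Python, not the authors' implementation: the allele frequencies (0.1 versus 0.9), sample sizes, labels (`S` for standard, `I` for inverted) and function names are assumptions made for the example, and the first principal component is obtained by simple power iteration rather than a full PCA routine.

```python
import random


def simulate_genotypes(n_per_class=30, n_snps=40, seed=0):
    """Simulate unphased genotypes inside a hypothetical inversion region."""
    rng = random.Random(seed)
    # Assumed allele frequencies: the two orientations differ at every SNP.
    freq = {"S": 0.1, "I": 0.9}

    def haplotype(orientation):
        return [1 if rng.random() < freq[orientation] else 0 for _ in range(n_snps)]

    rows, labels = [], []
    for label, (a, b) in [("S/S", "SS"), ("S/I", "SI"), ("I/I", "II")]:
        for _ in range(n_per_class):
            h1, h2 = haplotype(a), haplotype(b)
            rows.append([x + y for x, y in zip(h1, h2)])  # genotype = allele count
            labels.append(label)
    return rows, labels


def first_pc_scores(rows, n_iter=50):
    """Project individuals on the first principal component (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    v = [1.0] * m
    for _ in range(n_iter):
        scores = [sum(x[j] * v[j] for j in range(m)) for x in centered]  # X v
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(m)]  # X^T X v
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return [sum(x[j] * v[j] for j in range(m)) for x in centered]


def group_means(scores, labels):
    """Mean PC1 score per inversion genotype."""
    out = {}
    for label in set(labels):
        values = [s for s, l in zip(scores, labels) if l == label]
        out[label] = sum(values) / len(values)
    return out


if __name__ == "__main__":
    rows, labels = simulate_genotypes()
    for label, mean in sorted(group_means(first_pc_scores(rows), labels).items()):
        print(label, round(mean, 2))
```

In this toy setting the heterozygotes fall roughly midway between the two homozygote groups on the first component, which is the three-cluster pattern the abstract describes; real data would have weaker frequency differences and would need an explicit clustering step to assign genotypes.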
Affiliation(s)
- Jianzhong Ma
  - Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, Texas, United States of America.
10
Cáceres A, Sindi SS, Raphael BJ, Cáceres M, González JR. Identification of polymorphic inversions from genotypes. BMC Bioinformatics 2012; 13:28. [PMID: 22321652 PMCID: PMC3296650 DOI: 10.1186/1471-2105-13-28] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 09/26/2011] [Accepted: 02/09/2012] [Indexed: 01/19/2023]
Abstract
Background: Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods that measure differences in linkage disequilibrium attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies.
Results: We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as containing a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding window scan to identify regions likely to have an inversion, and accumulation of evidence from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, thus increasing its usability for current genome-wide association studies (GWAS).
Conclusions: We demonstrate the accuracy of our method to detect inversions and classify individuals on principled-simulated genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European Origin (CEU) and Yoruba (YRI) HapMap samples. We find population-based evidence for 9 out of 15 well-established autosomal inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both genotype and haplotype methods as a unified R package inveRsion.
Affiliation(s)
- Alejandro Cáceres
  - Center for Research in Environmental Epidemiology, and Institut Municipal d'Investigació Mèdica, Barcelona 08003, Spain.