1
Mbo Nkoulou LF, Ngalle HB, Cros D, Adje COA, Fassinou NVH, Bell J, Achigan-Dako EG. Perspective for genomic-enabled prediction against black sigatoka disease and drought stress in polyploid species. Frontiers in Plant Science 2022; 13:953133. [PMID: 36388523 PMCID: PMC9650417 DOI: 10.3389/fpls.2022.953133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Accepted: 09/28/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection (GS) is being explored in plant breeding as a promising tool to address biotic and abiotic threats. Polyploid plants such as bananas (Musa spp.) face drought and black sigatoka disease (BSD), both of which restrict production. Conventional plant breeding faces difficulties, particularly high phenotyping costs and long generation intervals. To overcome these difficulties, GS is explored as an alternative with great potential for reducing the cost and time of the selection process. So far, GS has not had the same success in polyploid plants as in diploid plants because of the complexity of polyploid genomes. In this review, we present the main constraints to the application of GS in polyploid plants and the prospects for overcoming them. Particular emphasis is placed on breeding for BSD and drought, two major threats to banana production, with banana used in this review as a model polyploid plant. It emerges that the difficulty of obtaining good-quality markers is the first challenge for GS in polyploid plants, because the main tools used were developed for diploid species. In addition, there is the major challenge of accounting for genetic interactions such as dominance and epistasis effects, as well as genotype-by-environment interaction, which are very common in polyploid plants. To get around these challenges, we present bioinformatics tools as well as artificial intelligence approaches, including machine learning. Furthermore, a scheme for applying GS to banana for BSD and drought is proposed. This review is highly relevant to breeding programs that seek to shorten the selection cycle of polyploids despite the complexity of their genomes.
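Illustrative sketch (not part of the cited review): the abstract describes GS as predicting breeding value from genome-wide markers. A common baseline for this is ridge regression on marker dosages, where each marker is coded as the number of copies of the alternative allele (0 to the ploidy level). The example below assumes a tetraploid, simulated data, purely additive marker effects, and a hand-picked penalty; all function and variable names are hypothetical. It deliberately ignores the dominance, epistasis, and genotype-by-environment effects that the review flags as challenges.

```python
import numpy as np


def ridge_marker_effects(dosages, phenotypes, ridge_penalty=1.0):
    """Estimate additive marker effects by ridge regression on centered dosages."""
    X = np.asarray(dosages, dtype=float)
    y = np.asarray(phenotypes, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Closed-form ridge solution: (Xc'Xc + lambda * I) b = Xc'(y - mean(y))
    lhs = Xc.T @ Xc + ridge_penalty * np.eye(X.shape[1])
    rhs = Xc.T @ (y - y_mean)
    effects = np.linalg.solve(lhs, rhs)
    return effects, x_mean, y_mean


def predict_breeding_values(dosages, effects, x_mean, y_mean):
    """Predict genomic estimated breeding values (GEBVs) for new genotypes."""
    X = np.asarray(dosages, dtype=float)
    return y_mean + (X - x_mean) @ effects


# Simulated tetraploid training population (assumption: dosages 0..4).
rng = np.random.default_rng(0)
ploidy = 4
n_train, n_candidates, n_markers = 60, 10, 200
train_dosages = rng.integers(0, ploidy + 1, size=(n_train, n_markers))
true_effects = rng.normal(0.0, 0.1, size=n_markers)
train_pheno = train_dosages @ true_effects + rng.normal(0.0, 0.5, size=n_train)

effects, x_mean, y_mean = ridge_marker_effects(
    train_dosages, train_pheno, ridge_penalty=10.0
)

# Unphenotyped selection candidates are ranked on predicted value alone.
candidate_dosages = rng.integers(0, ploidy + 1, size=(n_candidates, n_markers))
gebv = predict_breeding_values(candidate_dosages, effects, x_mean, y_mean)
ranking = np.argsort(-gebv)  # candidate indices, best predicted first
```

In practice the penalty would be tuned by cross-validation, and the dosage calls themselves are the hard part in polyploids, as the abstract notes.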
Affiliation(s)
- Luther Fort Mbo Nkoulou
  - Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
  - Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
  - Institute of Agricultural Research for Development, Centre de Recherche Agricole de Mbalmayo (CRAM), Mbalmayo, Cameroon
- Hermine Bille Ngalle
  - Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- David Cros
  - Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, Montpellier, France
  - Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, University of Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Institut Agro, Montpellier, France
- Charlotte O. A. Adje
  - Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Nicodeme V. H. Fassinou
  - Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Joseph Bell
  - Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Enoch G. Achigan-Dako
  - Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
2
Voorrips RE, Tumino G. PolyHaplotyper: haplotyping in polyploids based on bi-allelic marker dosage data. BMC Bioinformatics 2022; 23:442. [PMID: 36274121 PMCID: PMC9590153 DOI: 10.1186/s12859-022-04989-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/12/2021] [Accepted: 10/16/2022] [Indexed: 11/18/2022]
Abstract
Background: For genetic analyses, multi-allelic markers have an advantage over bi-allelic markers like SNPs (single nucleotide polymorphisms) in that they carry more information about the genetic constitution of individuals. This is especially the case in polyploids, where individuals carry more than two alleles at each locus. Haploblocks are multi-allelic markers that can be derived by phasing sets of closely-linked SNP markers. Phased haploblocks, similarly to other multi-allelic markers, will therefore be advantageous in genetic tasks like linkage mapping, QTL mapping and genome-wide association studies.
Results: We present a new method to reconstruct haplotypes from SNP dosages derived from genotyping arrays, which is applicable to polyploids. This method is implemented in the software package PolyHaplotyper. In contrast to existing packages for polyploids it makes use of full-sib families among the samples to guide the haplotyping process. We show that in this situation it is much more accurate than other available software, using experimental hexaploid data and simulated tetraploid data.
Conclusions: Our method and the software package PolyHaplotyper in which it is implemented extend the available tools for haplotyping in polyploids. They perform especially well in situations where one or more full-sib families are present.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-022-04989-0.
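Illustrative sketch (not PolyHaplotyper's algorithm): the abstract's starting point is that SNP dosages alone do not determine which haplotypes a polyploid individual carries. The brute-force enumeration below lists every multiset of haplotypes whose per-SNP allele counts match the observed dosages. The function name and the toy tetraploid example are assumptions made for illustration; real tools avoid exhaustive search and use additional information, such as the full-sib families mentioned above, to choose among the candidates.

```python
from itertools import combinations_with_replacement, product


def consistent_haplotype_sets(snp_dosages, ploidy):
    """List all haplotype multisets that reproduce the observed SNP dosages.

    snp_dosages: one dosage per SNP in the haploblock (count of allele 1).
    ploidy: number of haplotypes carried by the individual.
    """
    target = tuple(snp_dosages)
    n_snps = len(target)
    # Every possible haplotype over the block, as tuples of 0/1 alleles.
    all_haplotypes = list(product((0, 1), repeat=n_snps))
    solutions = []
    for combo in combinations_with_replacement(all_haplotypes, ploidy):
        dosages = tuple(sum(hap[s] for hap in combo) for s in range(n_snps))
        if dosages == target:
            solutions.append(combo)
    return solutions


# Tetraploid individual, two SNPs, dosage 2 at each: several explanations exist.
ambiguous = consistent_haplotype_sets((2, 2), ploidy=4)

# Dosages (4, 0) can only arise from four copies of haplotype (1, 0).
unambiguous = consistent_haplotype_sets((4, 0), ploidy=4)
```

The number of candidate sets grows quickly with block length and ploidy, which is why exhaustive enumeration is only practical for small examples like this one.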
3
Concordance rate between copy number variants detected using either high- or medium-density single nucleotide polymorphism genotype panels and the potential of imputing copy number variants from flanking high density single nucleotide polymorphism haplotypes in cattle. BMC Genomics 2020; 21:205. [PMID: 32131735 PMCID: PMC7057620 DOI: 10.1186/s12864-020-6627-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/05/2019] [Accepted: 02/26/2020] [Indexed: 12/01/2022]
Abstract
Background: The trading of individual animal genotype information often involves only the exchange of the called genotypes and not necessarily the additional information required to effectively call structural variants. The main aim here was to determine if it is possible to impute copy number variants (CNVs) using the flanking single nucleotide polymorphism (SNP) haplotype structure in cattle. While this objective was achieved using high-density genotype panels (i.e., 713,162 SNPs), a secondary objective investigated the concordance of CNVs called with this high-density genotype panel compared to CNVs called from a medium-density panel (i.e., 45,677 SNPs in the present study). This is the first study to compare CNVs called from high-density and medium-density SNP genotypes from the same animals. Both high-density and medium-density genotypes were available for 991 Holstein-Friesian, 1015 Charolais, and 1394 Limousin bulls. The concordance between CNVs called from the medium-density and high-density genotypes was calculated separately for each animal. A subset of CNVs which were called from the high-density genotypes was selected for imputation. Imputation was carried out separately for each breed using a set of high-density SNPs flanking the midpoint of each CNV. A CNV was deemed to be imputed correctly when the called copy number matched the imputed copy number.
Results: For 97.0% of CNVs called from the high-density genotypes, the corresponding genomic position on the medium-density genotype of the animal did not contain a called CNV. The average accuracy of imputation for CNV deletions was 0.281, with a standard deviation of 0.286. The average accuracy of imputation of the CNV normal state, i.e. the absence of a CNV, was 0.982 with a standard deviation of 0.022. Two CNV duplications were imputed in the Charolais, a single CNV duplication in the Limousins, and a single CNV duplication in the Holstein-Friesians; in all cases the CNV duplications were incorrectly imputed.
Conclusion: The vast majority of CNVs called from the high-density genotypes were not detected using the medium-density genotypes. Furthermore, CNVs cannot be accurately predicted from flanking SNP haplotypes, at least based on the imputation algorithms routinely used in cattle, and using the SNPs currently available on the high-density genotype panel.
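Illustrative sketch (not the authors' pipeline): the abstract scores an imputed CNV as correct when the imputed copy number equals the called copy number, and reports accuracy separately for deletions, the normal state, and duplications. A minimal way to compute such per-state accuracy is shown below. The coding of states (1 = deletion, 2 = normal, 3 = duplication), the function name, and the toy data are assumptions; the paper's exact accuracy definition may differ.

```python
from collections import defaultdict


def per_state_accuracy(called, imputed):
    """Fraction of correctly imputed copy numbers, grouped by called state."""
    if len(called) != len(imputed):
        raise ValueError("called and imputed must have the same length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for called_cn, imputed_cn in zip(called, imputed):
        totals[called_cn] += 1
        if called_cn == imputed_cn:
            correct[called_cn] += 1
    return {state: correct[state] / totals[state] for state in totals}


# Toy data (assumed coding: 1 = deletion, 2 = normal, 3 = duplication).
called_copy_number = [2, 2, 2, 2, 1, 1, 3, 2]
imputed_copy_number = [2, 2, 2, 2, 1, 2, 2, 2]
accuracy = per_state_accuracy(called_copy_number, imputed_copy_number)
```

Reporting accuracy per state matters here because the normal state dominates the data: an imputation method that always predicts "normal" would score well overall while missing every CNV.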
4
Dou Y, Gold HD, Luquette LJ, Park PJ. Detecting Somatic Mutations in Normal Cells. Trends Genet 2018; 34:545-557. [PMID: 29731376 PMCID: PMC6029698 DOI: 10.1016/j.tig.2018.04.003] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 02/06/2018] [Revised: 04/03/2018] [Accepted: 04/05/2018] [Indexed: 01/12/2023]
Abstract
Somatic mutations have been studied extensively in the context of cancer. Recent studies have demonstrated that high-throughput sequencing data can be used to detect somatic mutations in non-tumor cells. Analysis of such mutations allows us to better understand the mutational processes in normal cells, explore cell lineages in development, and examine potential associations with age-related disease. We describe here approaches for characterizing somatic mutations in normal and non-tumor disease tissues. We discuss several experimental designs and common pitfalls in somatic mutation detection, as well as more recent developments such as phasing and linked-read technology. With the dramatically increasing numbers of samples undergoing genome sequencing, bioinformatic analysis will enable the characterization of somatic mutations and their impact on non-cancer tissues.
Affiliation(s)
- Yanmei Dou
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Equal contributions
- Heather D Gold
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Bioinformatics and Integrative Genomics PhD Program, Harvard Medical School, Boston, MA, USA; Equal contributions
- Lovelace J Luquette
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Bioinformatics and Integrative Genomics PhD Program, Harvard Medical School, Boston, MA, USA; Equal contributions
- Peter J Park
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
5
Zuccherato LW, Schneider S, Tarazona-Santos E, Hardwick RJ, Berg DE, Bogle H, Gouveia MH, Machado LR, Machado M, Rodrigues-Soares F, Soares-Souza GB, Togni DL, Zamudio R, Gilman RH, Duarte D, Hollox EJ, Rodrigues MR. Population genetics of immune-related multilocus copy number variation in Native Americans. J R Soc Interface 2017; 14:rsif.2017.0057. [PMID: 28356540 DOI: 10.1098/rsif.2017.0057] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/25/2017] [Accepted: 03/02/2017] [Indexed: 12/22/2022]
Abstract
While multiallelic copy number variation (mCNV) loci are a major component of genomic variation, quantifying the individual copy number of a locus and defining genotypes is challenging. Few methods exist to study how mCNV genetic diversity is apportioned within and between populations (i.e. to define the population genetic structure of mCNV). These inferences are critical in populations with a small effective size, such as Amerindians, that may not fit the Hardy-Weinberg model due to inbreeding, assortative mating, population subdivision, natural selection or a combination of these evolutionary factors. We propose a likelihood-based method that simultaneously infers mCNV allele frequencies and the population structure parameter f, which quantifies the departure of homozygosity from the Hardy-Weinberg expectation. This method is implemented in the freely available software CNVice, which also infers individual genotypes using information from both the population and from trios, if available. We studied the population genetics of five immune-related mCNV loci associated with complex diseases (beta-defensins, CCL3L1/CCL4L1, FCGR3A, FCGR3B and FCGR2C) in 12 traditional Native American populations and found that the population structure parameters inferred for these mCNVs are comparable to but lower than those for single nucleotide polymorphisms studied in the same populations.
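Illustrative sketch (not CNVice's code): the abstract describes a parameter f that quantifies the departure of homozygosity from the Hardy-Weinberg expectation. Under the standard inbreeding-coefficient parameterization, a homozygote for allele i has probability p_i^2 + f p_i (1 - p_i) and a heterozygote i/j has probability 2 p_i p_j (1 - f). The example below applies this to hypothetical copy-number alleles and sums genotype probabilities by total diploid copy number, which is what an assay typically measures. The function name, the allele frequencies, and the restriction to 0 <= f <= 1 are assumptions made for illustration.

```python
from itertools import combinations_with_replacement


def diploid_copy_number_distribution(allele_freqs, f):
    """Distribution of total diploid copy number given allele frequencies and f.

    allele_freqs: dict mapping copies-per-chromosome -> allele frequency.
    f: departure from Hardy-Weinberg (0 gives Hardy-Weinberg proportions).
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    if not 0.0 <= f <= 1.0:
        raise ValueError("this sketch only handles 0 <= f <= 1")
    distribution = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p_a, p_b = allele_freqs[a], allele_freqs[b]
        if a == b:
            prob = p_a * p_a + f * p_a * (1.0 - p_a)  # excess homozygosity
        else:
            prob = 2.0 * p_a * p_b * (1.0 - f)  # heterozygote deficit
        # Different genotypes (e.g. 1+1 and 0+2) share the same total copy number.
        distribution[a + b] = distribution.get(a + b, 0.0) + prob
    return distribution


# Hypothetical locus with 0-, 1-, and 2-copy alleles.
freqs = {0: 0.1, 1: 0.6, 2: 0.3}
hardy_weinberg = diploid_copy_number_distribution(freqs, f=0.0)
structured = diploid_copy_number_distribution(freqs, f=0.1)
```

The merging of genotypes 1+1 and 0+2 into a single observed copy number of 2 illustrates why defining genotypes at such loci is hard, and why a likelihood over allele frequencies and f is needed rather than direct counting.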
Affiliation(s)
- Luciana W Zuccherato
  - Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Silvana Schneider
  - Departamento de Estatística, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Eduardo Tarazona-Santos
  - Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Douglas E Berg
  - Department of Molecular Microbiology, Washington University in Saint Louis School of Medicine, St Louis, MO, USA
  - Department of Medicine, University of California San Diego, CA, USA
- Helen Bogle
  - Department of Genetics, University of Leicester, Leicester, UK
- Mateus H Gouveia
  - Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Lee R Machado
  - Department of Genetics, University of Leicester, Leicester, UK
  - School of Health, University of Northampton, Northampton, UK
- Moara Machado
  - Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Fernanda Rodrigues-Soares
  - Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Giordano B Soares-Souza
  - Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Diego L Togni
  - Departamento de Estatística, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Roxana Zamudio
  - Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Robert H Gilman
  - Johns Hopkins School of Public Health, Johns Hopkins University, Baltimore, MD, USA
  - Asociación Benéfica PRISMA, Lima, Peru
  - Universidade Peruana Cayetano Heredia, Lima, Peru
- Denise Duarte
  - Departamento de Estatística, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Edward J Hollox
  - Department of Genetics, University of Leicester, Leicester, UK
- Maíra R Rodrigues
  - Departamento de Biologia Geral, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
6
da Silva VH, Regitano LCDA, Geistlinger L, Pértille F, Giachetto PF, Brassaloti RA, Morosini NS, Zimmer R, Coutinho LL. Genome-Wide Detection of CNVs and Their Association with Meat Tenderness in Nelore Cattle. PLoS One 2016; 11:e0157711. [PMID: 27348523 PMCID: PMC4922624 DOI: 10.1371/journal.pone.0157711] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 02/11/2015] [Accepted: 06/03/2016] [Indexed: 12/20/2022]
Abstract
Brazil is one of the largest beef producers and exporters in the world with the Nelore breed representing the vast majority of Brazilian cattle (Bos taurus indicus). Despite the great adaptability of the Nelore breed to tropical climate, meat tenderness (MT) remains to be improved. Several factors including genetic composition can influence MT. In this article, we report a genome-wide analysis of copy number variation (CNV) inferred from Illumina® High Density SNP-chip data for a Nelore population of 723 males. We detected >2,600 CNV regions (CNVRs) representing ≈6.5% of the genome. Comparing our results with previous studies revealed an overlap in ≈1400 CNVRs (>50%). A total of 1,155 CNVRs (43.6%) overlapped 2,750 genes. They were enriched for processes involving guanosine triphosphate (GTP), previously reported to influence skeletal muscle physiology and morphology. Nelore CNVRs also overlapped QTLs for MT reported in other breeds (8.9%, 236 CNVRs) and from a previous study with this population (4.1%, 109 CNVRs). Two CNVRs were also proximal to glutathione metabolism genes that were previously associated with MT. Genome-wide association study of CN state with estimated breeding values derived from meat shear force identified 6 regions, including a region on BTA3 that contains genes of the cAMP and cGMP pathway. Ten CNVRs that overlapped regions associated with MT were successfully validated by qPCR. Our results represent the first comprehensive CNV study in Bos taurus indicus cattle and identify regions in which copy number changes are potentially of importance for the MT phenotype.
Affiliation(s)
- Vinicius Henrique da Silva
  - Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
  - * E-mail: (LLC); (VHS)
- Ludwig Geistlinger
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München (LMU), Amalienstrasse 17, 80333, München, Germany
- Fábio Pértille
  - Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
- Ricardo Augusto Brassaloti
  - Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
- Natália Silva Morosini
  - Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
- Ralf Zimmer
  - Institute of Bioinformatics, Department of Informatics, Ludwig-Maximilians-Universität München (LMU), Amalienstrasse 17, 80333, München, Germany
- Luiz Lehmann Coutinho
  - Animal Biotechnology Laboratory, Animal Science Department, University of São Paulo (USP)/Luiz de Queiroz College of Agriculture (ESALQ), Piracicaba, São Paulo, Brazil
  - * E-mail: (LLC); (VHS)
7
Patel A, Edge P, Selvaraj S, Bansal V, Bafna V. InPhaDel: integrative shotgun and proximity-ligation sequencing to phase deletions with single nucleotide polymorphisms. Nucleic Acids Res 2016; 44:e111. [PMID: 27105843 PMCID: PMC4937317 DOI: 10.1093/nar/gkw281] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/31/2015] [Accepted: 04/06/2016] [Indexed: 11/23/2022]
Abstract
Phasing of single nucleotide variants (SNVs) and structural variants (SVs) into chromosome-wide haplotypes in humans has been challenging, and has required either trio sequencing or restricting phasing to population-based haplotypes. Selvaraj et al. demonstrated that single-individual SNV phasing is possible with proximity-ligated (HiC) sequencing. Here, we demonstrate that HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data is noisy, and SV calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes withheld from training were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. Also, eight of eight top-ranked deletions phased by HiC alone were validated using long-range polymerase chain reaction and Sanger sequencing. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/.
Affiliation(s)
- Anand Patel
  - Bioinformatics and Systems Biology Program, University of California, San Diego 9500 Gilman Drive, La Jolla, CA 92093, USA
  - Department of Computer Science and Engineering, University of California, San Diego 9500 Gilman Drive, La Jolla, CA 92093, USA
- Peter Edge
  - Department of Computer Science and Engineering, University of California, San Diego 9500 Gilman Drive, La Jolla, CA 92093, USA
- Siddarth Selvaraj
  - Bioinformatics and Systems Biology Program, University of California, San Diego 9500 Gilman Drive, La Jolla, CA 92093, USA
- Vikas Bansal
  - Department of Pediatrics, School of Medicine, University of California, San Diego 9500 Gilman Drive, La Jolla, CA 92093, USA
- Vineet Bafna
  - Bioinformatics and Systems Biology Program, University of California, San Diego 9500 Gilman Drive, La Jolla, CA 92093, USA
  - Department of Computer Science and Engineering, University of California, San Diego 9500 Gilman Drive, La Jolla, CA 92093, USA
8
Haplotype phasing and inheritance of copy number variants in nuclear families. PLoS One 2015; 10:e0122713. [PMID: 25853576 PMCID: PMC4390228 DOI: 10.1371/journal.pone.0122713] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/09/2014] [Accepted: 02/12/2015] [Indexed: 11/19/2022]
Abstract
DNA copy number variants (CNVs) that alter the copy number of a particular DNA segment in the genome play an important role in human phenotypic variability and disease susceptibility. A number of CNVs overlapping with genes have been shown to confer risk to a variety of human diseases, highlighting the relevance of addressing the variability of CNVs at a higher resolution. So far, it has not been possible to deterministically infer the allelic composition of different haplotypes present within the CNV regions. We have developed a novel computational method, called PiCNV, which resolves the haplotype sequence composition within CNV regions in nuclear families based on SNP genotyping microarray data. The algorithm makes it possible to i) phase normal and CNV-carrying haplotypes in the copy number variable regions, ii) resolve the allelic copies of rearranged DNA sequence within the haplotypes and iii) infer the heritability of identified haplotypes in trios or larger nuclear families. To our knowledge this is the first program available that can deterministically phase null, mono-, di-, tri- and tetraploid genotypes in CNV loci. We applied our method to study the composition and inheritance of haplotypes in CNV regions of 30 HapMap Yoruban trios and 34 Estonian families. For 93.6% of the CNV loci, PiCNV unambiguously phased normal and CNV-carrying haplotypes and followed their transmission in the corresponding families. Furthermore, allelic composition analysis identified the co-occurrence of alternative allelic copies within 66.7% of haplotypes carrying copy number gains. We also observed less frequent transmission of CNV-carrying haplotypes from parents to children compared to normal haplotypes and identified the emergence of several de novo deletions and duplications in the offspring.
9
Zhan Y, Zi X, Hu Z, Peng Y, Wu L, Li X, Jiang M, Liu L, Xie Y, Xia K, Tang B, Zhang R. PMP22-related neuropathies and other clinical manifestations in Chinese Han patients with Charcot-Marie-Tooth disease type 1. Muscle Nerve 2015; 52:69-75. [PMID: 25522693 DOI: 10.1002/mus.24550] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Accepted: 12/15/2014] [Indexed: 11/09/2022]
Abstract
INTRODUCTION: Most cases of Charcot-Marie-Tooth (CMT) disease are caused by mutations in the peripheral myelin protein 22 gene (PMP22), including heterozygous duplications (CMT1A), deletions (HNPP), and point mutations (CMT1E).
METHODS: Single-nucleotide polymorphism (SNP) arrays were used to study PMP22 mutations based on the results of multiplex ligation-dependent probe amplification (MLPA) and polymerase chain reaction-restriction fragment length polymorphism methods in 77 Chinese Han families with CMT1. PMP22 sequencing was performed in MLPA-negative probands. Clinical characteristics were collected for all CMT1A/HNPP probands and their family members.
RESULTS: Twenty-one of 77 CMT1 probands (27.3%) carried duplication/deletion (dup/del) copy number variants. No point mutations were detected. SNP array and MLPA seem to have similar sensitivity. Fifty-seven patients from 19 CMT1A families had the classical CMT phenotype, except for 1 with concomitant CIDP. Two HNPP probands presented with acute ulnar nerve palsy or recurrent sural nerve palsy, respectively.
CONCLUSIONS: The SNP array has wide coverage, high sensitivity, and high resolution and can be used as a screening tool to detect PMP22 dup/del as shown in this Chinese Han population.
Affiliation(s)
- Yajing Zhan
  - Department of Neurology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan Province, People's Republic of China
- Xiaohong Zi
  - Department of Neurology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan Province, People's Republic of China
- Zhengmao Hu
  - National Key Lab of Medical Genetics, Central South University, Changsha, People's Republic of China
- Ying Peng
  - National Key Lab of Medical Genetics, Central South University, Changsha, People's Republic of China
- Lingqian Wu
  - National Key Lab of Medical Genetics, Central South University, Changsha, People's Republic of China
- Xiaobo Li
  - Department of Neurology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan Province, People's Republic of China
- Mingming Jiang
  - Department of Neurology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan Province, People's Republic of China
- Lei Liu
  - Department of Neurology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan Province, People's Republic of China
- Yongzhi Xie
  - Department of Neurology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan Province, People's Republic of China
- Kun Xia
  - National Key Lab of Medical Genetics, Central South University, Changsha, People's Republic of China
- Beisha Tang
  - Department of Neurology, Xiangya Hospital, Central South University, Changsha, People's Republic of China
- Ruxu Zhang
  - Department of Neurology, Third Xiangya Hospital, Central South University, Changsha, 410013, Hunan Province, People's Republic of China
10
Hardy-Weinberg equilibrium revisited for inferences on genotypes featuring allele and copy-number variations. Sci Rep 2015; 5:9066. [PMID: 25765626 PMCID: PMC4357990 DOI: 10.1038/srep09066] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/10/2014] [Accepted: 02/11/2015] [Indexed: 12/22/2022]
Abstract
Copy number variations represent a substantial source of genetic variation and are associated with a plethora of physiological and pathophysiological conditions. Joint copy number and allelic variations (CNAVs) are difficult to analyze and require new strategies to unravel the properties of genotype distributions. We developed a Bayesian hidden Markov model (HMM) approach that allows dissecting intrinsic properties and metastructures of the distribution of CNAVs within populations, in particular haplotype phases of genes with varying copy numbers. As a key feature, this approach incorporates an extension of the Hardy-Weinberg equilibrium, allowing both a comprehensive and parsimonious model design. We demonstrate the quality of performance and applicability of the HMM approach with a real data set describing the Fcγ receptor (FcγR) gene region. Our concept, using a dynamic process to analyze a static distribution, establishes the basis for a novel understanding of complex genomic data sets.
11
Estimating copy numbers of alleles from population-scale high-throughput sequencing data. BMC Bioinformatics 2015; 16 Suppl 1:S4. [PMID: 25707811 PMCID: PMC4331703 DOI: 10.1186/1471-2105-16-s1-s4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Background: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their association with phenotypes and complex traits. In parallel, a number of approaches to predict CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci.
Results: We propose a novel approach to infer copy unit alleles and their numbers in each sample simultaneously from population-scale HTS data by variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, which is a well-studied model for document classification problems. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy number datasets, in which precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data of 1123 samples at the highly variable salivary amylase gene locus and a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring.
Conclusions: Our proposed approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features including human diseases.
12
Large multiallelic copy number variations in humans. Nat Genet 2015; 47:296-303. [PMID: 25621458 PMCID: PMC4405206 DOI: 10.1038/ng.3200] [Citation(s) in RCA: 258] [Impact Index Per Article: 28.7] [Received: 07/28/2014] [Accepted: 12/31/2014] [Indexed: 12/14/2022]
Abstract
Thousands of genomic segments appear to be present in widely varying copy numbers in different human genomes. We developed ways to use increasingly abundant whole-genome sequence data to identify the copy numbers, alleles and haplotypes present at most large multiallelic CNVs (mCNVs). We analyzed 849 genomes sequenced by the 1000 Genomes Project to identify most large (>5-kb) mCNVs, including 3,878 duplications, of which 1,356 appear to have 3 or more segregating alleles. We find that mCNVs give rise to most human variation in gene dosage (seven times the combined contribution of deletions and biallelic duplications) and that this variation in gene dosage generates abundant variation in gene expression. We describe 'runaway duplication haplotypes' in which genes, including HPR and ORM1, have mutated to high copy number on specific haplotypes. We also describe partially successful initial strategies for analyzing mCNVs via imputation and provide an initial data resource to support such analyses.
|
13
|
Wu J, Chen GB, Zhi D, Liu N, Zhang K. A hidden Markov model for haplotype inference for present-absent data of clustered genes using identified haplotypes and haplotype patterns. Front Genet 2014; 5:267. [PMID: 25161663 PMCID: PMC4129397 DOI: 10.3389/fgene.2014.00267] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/22/2014] [Accepted: 07/21/2014] [Indexed: 11/21/2022]
Abstract
The majority of killer cell immunoglobulin-like receptor (KIR) genes are detected as either present or absent using locus-specific genotyping technology. Ambiguity arises from the presence of a specific KIR gene since the exact copy number (one or two) of that gene is unknown. Therefore, haplotype inference for these genes is more challenging because such a large portion of the information is missing. Meanwhile, many haplotypes and partial haplotype patterns have been previously identified owing to tight linkage disequilibrium (LD) among these clustered genes, and these can be incorporated to facilitate haplotype inference. In this paper, we developed a hidden Markov model (HMM) based method that can incorporate identified haplotypes or partial haplotype patterns for haplotype inference from present-absent data of clustered genes (e.g., KIR genes). We compared its performance with a previously developed expectation maximization (EM) based method in terms of haplotype assignments and haplotype frequency estimation through extensive simulations for KIR genes. The simulation results showed that the new HMM based method outperformed the previous method when some incorrect haplotypes were included as identified haplotypes and/or the standard deviation of haplotype frequencies was small. We also compared the performance of our method with two methods that do not use previously identified haplotypes and haplotype patterns: an EM based method, HAPLORE, and an HMM based method, MaCH. Our simulation results showed that the incorporation of identified haplotypes and partial haplotype patterns can improve accuracy for haplotype inference. The new software package HaploHMM is available and can be downloaded at http://www.soph.uab.edu/ssg/files/People/KZhang/HaploHMM/haplohmm-index.html.
Affiliation(s)
- Jihua Wu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Guo-Bo Chen
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA; Queensland Brain Institute, The University of Queensland, St. Lucia, QLD, Australia
- Degui Zhi
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Nianjun Liu
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
- Kui Zhang
- Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
|
14
|
Efficiency of haplotype-based methods to fine-map QTLs and embryonic lethal variants affecting fertility: Illustration with a deletion segregating in Nordic Red cattle. Livest Sci 2014. [DOI: 10.1016/j.livsci.2014.04.030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
|
15
|
Lou H, Li S, Jin W, Fu R, Lu D, Pan X, Zhou H, Ping Y, Jin L, Xu S. Copy number variations and genetic admixtures in three Xinjiang ethnic minority groups. Eur J Hum Genet 2014; 23:536-42. [PMID: 25026903 DOI: 10.1038/ejhg.2014.134] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/28/2014] [Revised: 06/06/2014] [Accepted: 06/12/2014] [Indexed: 11/09/2022]
Abstract
Xinjiang is geographically located in central Asia, and it has played an important historical role in connecting eastern Eurasian (EEA) and western Eurasian (WEA) people. However, human population genomic studies in this region have been largely underrepresented, especially with respect to studies of copy number variations (CNVs). Here we constructed the first CNV map of the three major ethnic minority groups, the Uyghur, Kazakh and Kirgiz, using Affymetrix Genome-Wide Human SNP Array 6.0. We systematically compared the properties of CNVs we identified in the three groups with the data from representatives of EEA and WEA. The analyses indicated a typical genetic admixture pattern in all three groups with ancestries from both EEA and WEA. We also identified several CNV regions showing significant deviation of allele frequency from the expected genome-wide distribution, which might be associated with population-specific phenotypes. Our study provides the first genome-wide perspective on the CNVs of three major Xinjiang ethnic minority groups and has implications for both evolutionary and medical studies.
Affiliation(s)
- Haiyi Lou
- [1] Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China [2] Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Shilin Li
- Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Wenfei Jin
- Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Ruiqing Fu
- Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Dongsheng Lu
- Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Xinwei Pan
- Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Huaigu Zhou
- Key Laboratory of Forensic Evidence and Scene Technology, Ministry of Public Security and Shanghai Key Laboratory of Crime Scene Evidence, Shanghai, China
- Yuan Ping
- Key Laboratory of Forensic Evidence and Scene Technology, Ministry of Public Security and Shanghai Key Laboratory of Crime Scene Evidence, Shanghai, China
- Li Jin
- [1] Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China [2] Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China [3] Key Laboratory of Forensic Evidence and Scene Technology, Ministry of Public Security and Shanghai Key Laboratory of Crime Scene Evidence, Shanghai, China
- Shuhua Xu
- [1] Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China [2] Chinese Academy of Sciences Key Laboratory of Computational Biology, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
16
|
Scharpf RB, Mireles L, Yang Q, Köttgen A, Ruczinski I, Susztak K, Halper-Stromberg E, Tin A, Cristiano S, Chakravarti A, Boerwinkle E, Fox CS, Coresh J, Linda Kao WH. Copy number polymorphisms near SLC2A9 are associated with serum uric acid concentrations. BMC Genet 2014; 15:81. [PMID: 25007794 PMCID: PMC4118309 DOI: 10.1186/1471-2156-15-81] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 03/31/2014] [Accepted: 06/30/2014] [Indexed: 11/10/2022]
Abstract
Background Hyperuricemia is associated with multiple diseases, including gout, cardiovascular disease, and renal disease. Serum urate is highly heritable, yet association studies of single nucleotide polymorphisms (SNPs) and serum uric acid explain a small fraction of the heritability. Whether copy number polymorphisms (CNPs) contribute to uric acid levels is unknown. Results We assessed copy number on a genome-wide scale among 8,411 individuals of European ancestry (EA) who participated in the Atherosclerosis Risk in Communities (ARIC) study. CNPs upstream of the urate transporter SLC2A9 on chromosome 4p16.1 are associated with uric acid (χ², 2 df = 3545, p = 3.19×10⁻²³). Effect sizes, expressed as the percentage change in uric acid per deleted copy, are most pronounced among women (3.97, 4.93, 5.87 [the 2.5th, 50th, and 97.5th percentiles], p = 4.57×10⁻²³) and independent of previously reported SNPs in SLC2A9 as assessed by SNP and CNP regression models and the phasing of SNP and CNP haplotypes (χ², 2 df = 3190, p = 7.23×10⁻⁸). Our finding is replicated in the Framingham Heart Study (FHS), where the effect size estimated from 4,089 women is comparable to ARIC in direction and magnitude (1.41, 4.70, 7.88, p = 5.46×10⁻³). Conclusions This is the first study to characterize CNPs in ARIC and the first genome-wide analysis of CNPs and uric acid. Our findings suggest a novel, non-coding regulatory mechanism for SLC2A9-mediated modulation of serum uric acid, and detail a bioinformatic approach for assessing the contribution of CNPs to heritable traits in large population-based studies where technical sources of variation are substantial.
Affiliation(s)
- Robert B Scharpf
- 550 N, Broadway, Suite 1101, Department of Oncology, Johns Hopkins School of Medicine, Baltimore, Maryland 21205, USA.
|
17
|
Iliadis A, Anastassiou D, Wang X. A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2014; 2014:7. [PMID: 24868199 PMCID: PMC4017783 DOI: 10.1186/1687-4153-2014-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/08/2013] [Accepted: 03/26/2014] [Indexed: 11/25/2022]
Abstract
Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue playing an important role in identifying the etiology of disease phenotypes. As a result of current high-throughput whole-genome single-nucleotide polymorphism (SNP) arrays, we now have datasets that simultaneously contain integer copy numbers in CNV regions and SNP genotypes. At the same time, haplotypes, which have been shown to offer advantages over genotypes in identifying disease traits, are available for SNP genotypes but largely unavailable for CNV/SNP data because of insufficient computational tools. We introduce a new framework for inferring haplotypes in CNV/SNP data using a sequential Monte Carlo sampling scheme, 'Tree-Based Deterministic Sampling CNV' (TDSCNV). We compare our method with polyHap(v2.0), the only currently available software able to perform inference in CNV/SNP genotypes, on datasets with varying numbers of markers. We found that both algorithms show similar accuracy, but TDSCNV is an order of magnitude faster while scaling linearly with the number of markers and the number of individuals, and thus could be the method of choice for haplotype inference in such datasets. Our method is implemented in the TDSCNV package, which is available for download at http://www.ee.columbia.edu/~anastas/tdscnv.
Affiliation(s)
- Alexandros Iliadis
- Department of Electrical Engineering, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
- Dimitris Anastassiou
- Department of Electrical Engineering, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
- Xiaodong Wang
- Department of Electrical Engineering, Center for Computational Biology and Bioinformatics, Columbia University, New York, NY 10027, USA
|
18
|
Ho Jang G, Christie JD, Feng R. A method for calling copy number polymorphism using haplotypes. Front Genet 2013; 4:165. [PMID: 24069028 PMCID: PMC3780619 DOI: 10.3389/fgene.2013.00165] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/29/2013] [Accepted: 08/07/2013] [Indexed: 12/15/2022]
Abstract
Single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) are both widespread characteristics of the human genome, but they are often called separately on common genotyping platforms. To capture integrated SNP and CNV information, methods have been developed for calling allele-specific copy numbers, or so-called copy number polymorphisms (CNPs), using limited inter-marker correlation. In this paper, we propose a haplotype-based maximum likelihood method to call CNPs, which takes advantage of the valuable multi-locus linkage disequilibrium (LD) information in the population. We also developed a computationally efficient algorithm to estimate haplotype frequencies and optimize individual CNP calls iteratively, even in the presence of missing data. Through simulations, we demonstrated that our model is more sensitive and accurate in detecting various CNV regions than commonly used CNV calling methods, including PennCNV, another hidden Markov model (HMM) using CNP, a scan statistic, segCNV, and cnvHap. Our method often performs better in regions with higher LD, in longer CNV regions, and for common CNVs than in the opposite settings. We applied our method to the genotypes of 90 HapMap CEU samples and 23 patients with acute lung injury (ALI). For each ALI patient, the genotyping was performed twice. The CNPs from our method show good consistency and accuracy comparable to others.
Affiliation(s)
- Gun Ho Jang
- Department of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania Philadelphia, PA, USA
|
19
|
Impact of allele copy number of polymorphisms in FCGR3A and FCGR3B genes on susceptibility to ulcerative colitis. Inflamm Bowel Dis 2013; 19:2061-8. [PMID: 23917248 DOI: 10.1097/mib.0b013e318298118e] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/19/2022]
Abstract
BACKGROUND Polymorphisms in the Fcγ receptor genes have been implicated in several autoimmune diseases, including ulcerative colitis (UC). However, most of these reports had not taken into account the effect of copy number variation at this region. METHODS We investigated the combined effect of allele and gene copy number of FCGR3A-158F/V and FCGR3B-NA1/NA2 on susceptibility to UC. Study subjects were composed of a total of 752 Japanese patients with UC and 2062 Japanese control subjects. To estimate allele copy number of the 2 polymorphisms, we integrated the results of PCR-based real-time Invader assay (PCR-RETINA) that measures the allelic ratio and Taqman assay that detects the total copy number. We analyzed the associations of allele copy number with UC using logistic regression model. RESULTS Gene and allele copy numbers of FCGR3A and FCGR3B were successfully determined in more than 99.5% of the study subjects. Allele copy number of FCGR3A-158F/V demonstrated significant association with susceptibility to UC (P = 0.02), although each single-nucleotide polymorphism and copy number variation alone did not show significant association. Although allele copy number of FCGR3B-NA1/NA2 (P = 0.002) also showed significant association with UC susceptibility, this association seemed to reflect the effect of FCGR3B gene copy number. Subsequent haplotype analyses revealed a strong association of a haplotype FCGR2A-131H/R and copy number of FCGR3B gene (P = 6.5 × 10). CONCLUSIONS Allele copy number of FCGR3A-158F/V and FCGR3B gene copy number were associated with UC susceptibility. Our results suggest that organizing handling of immune complex by FCGR3A, FCGR3B, and FCGR2A may play a crucial role in the pathogenesis of UC.
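The integration step described in METHODS (an allelic ratio from one assay combined with a total copy number from another) can be illustrated with a minimal Python sketch. It assumes that the measured allelic ratio is proportional to the ratio of allele copies; the function name `allele_copy_numbers` and the nearest-integer rule are illustrative assumptions, not the authors' published procedure.

```python
def allele_copy_numbers(total_copies: int, allelic_ratio: float) -> tuple[int, int]:
    """Split a total gene copy number into per-allele copy numbers.

    total_copies  : total copies of the gene (e.g. from a copy-number assay).
    allelic_ratio : finite, non-negative signal ratio allele_1 / allele_2
                    (assumed proportional to the copy ratio).
    Returns (copies_of_allele_1, copies_of_allele_2).
    """
    if total_copies < 0 or allelic_ratio < 0:
        raise ValueError("total_copies and allelic_ratio must be non-negative")

    # Expected (possibly fractional) number of allele-1 copies.
    expected_1 = total_copies * allelic_ratio / (1.0 + allelic_ratio)

    # Choose the integer split closest to the expectation.
    copies_1 = min(range(total_copies + 1), key=lambda k: abs(k - expected_1))
    return copies_1, total_copies - copies_1


# Hypothetical example: three gene copies with a roughly 2:1 allelic signal.
print(allele_copy_numbers(3, 2.0))
```

In this sketch a sample with three copies and a 2:1 ratio is assigned two copies of the first allele and one of the second, a state that neither the SNP genotype nor the total copy number captures on its own.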
|
20
|
Bánlaki Z, Szabó JA, Szilágyi Á, Patócs A, Prohászka Z, Füst G, Doleschall M. Intraspecific evolution of human RCCX copy number variation traced by haplotypes of the CYP21A2 gene. Genome Biol Evol 2013; 5:98-112. [PMID: 23241443 PMCID: PMC3595039 DOI: 10.1093/gbe/evs121] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]
Abstract
The RCCX region is a complex, multiallelic, tandem copy number variation (CNV). Two complete genes, complement component 4 (C4) and steroid 21-hydroxylase (CYP21A2, formerly CYP21B), reside in its variable region. RCCX is prone to nonallelic homologous recombination (NAHR) such as unequal crossover, generating duplications and deletions of RCCX modules, and gene conversion. A series of allele-specific long-range polymerase chain reaction coupled to the whole-gene sequencing of CYP21A2 was developed for molecular haplotyping. By means of the developed techniques, 35 different kinds of CYP21A2 haplotype variant were experimentally determined from 112 unrelated European subjects. The number of the resolved CYP21A2 haplotype variants was increased to 61 by bioinformatic haplotype reconstruction. The CYP21A2 haplotype variants could be assigned to the haplotypic RCCX CNV structures (the copy number of RCCX modules) in most cases. The genealogy network constructed from the CYP21A2 haplotype variants delineated the origin of RCCX structures. The different RCCX structures were located in tight groups. The minority of groups with identical RCCX structure occurred once in the network, implying monophyletic origin, but the majority of groups occurred several times and in different locations, indicating polyphyletic origin. The monophyletic groups were often created by single unequal crossover, whereas recurrent unequal crossover events generated some of the polyphyletic groups. As a result of recurrent NAHR events, more CYP21A2 haplotype variants with different allele patterns belonged to the same RCCX structure. The intraspecific evolution of RCCX CNV described here has provided a reasonable expectation for that of complex, multiallelic, tandem CNVs in humans.
Affiliation(s)
- Zsófia Bánlaki
- 3rd Department of Internal Medicine, Semmelweis University, Budapest, Hungary
|
21
|
Pandey RV, Franssen SU, Futschik A, Schlötterer C. Allelic imbalance metre (Allim), a new tool for measuring allele-specific gene expression with RNA-seq data. Mol Ecol Resour 2013; 13:740-5. [PMID: 23615333 PMCID: PMC3739924 DOI: 10.1111/1755-0998.12110] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 01/22/2013] [Revised: 03/18/2013] [Accepted: 03/22/2013] [Indexed: 11/29/2022]
Abstract
Estimating differences in gene expression among alleles is of high interest for many areas in biology and medicine. Here, we present a user-friendly software tool, Allim, to estimate allele-specific gene expression. Because mapping bias is a major problem for reliable estimates of allele-specific gene expression using RNA-seq, Allim combines two different strategies to account for the mapping biases. In order to reduce the mapping bias, Allim first generates a polymorphism-aware reference genome that accounts for the sequence variation between the alleles. Then, a sequence-specific simulation tool estimates the residual mapping bias. Statistical tests for allelic imbalance are provided that can be used with the bias corrected RNA-seq data.
Affiliation(s)
- Ram Vinay Pandey
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
|
22
|
Scharpf RB, Beaty TH, Schwender H, Younkin SG, Scott AF, Ruczinski I. Fast detection of de novo copy number variants from SNP arrays for case-parent trios. BMC Bioinformatics 2012; 13:330. [PMID: 23234608 PMCID: PMC3576329 DOI: 10.1186/1471-2105-13-330] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 12/15/2011] [Accepted: 12/07/2012] [Indexed: 11/10/2022]
Abstract
Background In studies of case-parent trios, we define copy number variants (CNVs) in the offspring that differ from the parental copy numbers as de novo and of interest for their potential functional role in disease. Among the leading array-based methods for discovery of de novo CNVs in case-parent trios is the joint hidden Markov model (HMM) implemented in the PennCNV software. However, the computational demands of the joint HMM are substantial and the extent to which false positive identifications occur in case-parent trios has not been well described. We evaluate these issues in a study of oral cleft case-parent trios. Results Our analysis of the oral cleft trios reveals that genomic waves represent a substantial source of false positive identifications in the joint HMM, despite a wave-correction implementation in PennCNV. In addition, the noise of low-level summaries of relative copy number (log R ratios) is strongly associated with batch and correlated with the frequency of de novo CNV calls. Exploiting the trio design, we propose a univariate statistic for relative copy number referred to as the minimum distance that can reduce technical variation from probe effects and genomic waves. We use circular binary segmentation to segment the minimum distance and maximum a posteriori estimation to infer de novo CNVs from the segmented genome. Compared to PennCNV on simulated data, MinimumDistance identifies fewer false positives on average and is comparable to PennCNV with respect to false negatives. Genomic waves contribute to discordance of PennCNV and MinimumDistance for high coverage de novo calls, while highly concordant calls on chromosome 22 were validated by quantitative PCR. Computationally, MinimumDistance provides a nearly 8-fold increase in speed relative to the joint HMM in a study of oral cleft trios. 
Conclusions Our results indicate that batch effects and genomic waves are important considerations for case-parent studies of de novo CNV, and that the minimum distance is an effective statistic for reducing technical variation contributing to false de novo discoveries. Coupled with segmentation and maximum a posteriori estimation, our algorithm compares favorably to the joint HMM with MinimumDistance being much faster.
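The abstract does not spell out how the minimum distance is computed. The Python sketch below assumes one plausible definition: at each marker, take the signed offspring-minus-parent difference in log R ratio that has the smaller absolute value. The function name and the example values are hypothetical, and the segmentation and maximum a posteriori steps are not shown.

```python
def minimum_distance(offspring_lrr, father_lrr, mother_lrr):
    """Per-marker signed minimum distance for one case-parent trio.

    Assumed definition: of the two differences (offspring - father) and
    (offspring - mother), keep the one closest to zero. A copy number state
    shared with either parent then gives a value near 0, while a de novo
    change gives a value far from 0 against both parents.
    """
    distances = []
    for o, f, m in zip(offspring_lrr, father_lrr, mother_lrr):
        d_father = o - f
        d_mother = o - m
        distances.append(d_father if abs(d_father) <= abs(d_mother) else d_mother)
    return distances


# Hypothetical log R ratios at three markers.
offspring = [0.0, -0.6, -0.5]
father = [0.1, 0.0, -0.5]
mother = [-0.05, 0.05, 0.0]

# Marker 2: offspring differs from both parents (candidate de novo deletion).
# Marker 3: offspring matches the father (inherited state, distance 0).
print(minimum_distance(offspring, father, mother))
```

Because probe effects and genomic waves tend to affect all three trio members at the same marker, differencing against the parents is one way such shared technical variation could cancel.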
Affiliation(s)
- Robert B Scharpf
- Department of Oncology, Johns Hopkins University, Baltimore, MD, USA.
|
23
|
Liao B, Li X, Zhu W, Cao Z. A novel method to select informative SNPs and their application in genetic association studies. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:1529-1534. [PMID: 22585142 DOI: 10.1109/tcbb.2012.70] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]
Abstract
Association studies between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes have recently received great attention. However, these studies are limited by the cost of genotyping all SNPs. Therefore, it is essential to find a small subset of tag SNPs representing the rest of the SNPs. The presence of linkage disequilibrium between tag SNPs and the disease variant (genotyped or not) may allow fine-mapping studies. In this paper, we combine a nearest-means classifier (NMC) and an ant colony algorithm to select tags. Results show that our method (ACO/NMC) achieves prediction accuracy similar to that of BPSO/SVM and better than that of BPSO/STAMPA for small data sets. For large data sets, although the prediction accuracy of our method is lower than that of BPSO/SVM, ACO/NMC can reach a high accuracy (>99 percent) in a relatively short time. When the number of tags increases, the running time of NMC grows nearly linearly. To assess the ability of tags to locate a disease locus, we simulate a case-control study and use two-locus haplotype analysis to quantitatively assess the power. The results showed that tags selected by NMC, comprising 20 percent of all SNPs, have about 10 percent higher power than random tags, on average.
Affiliation(s)
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China.
|
24
|
Abstract
Determination of haplotype phase is becoming increasingly important as we enter the era of large-scale sequencing because many of its applications, such as imputing low-frequency variants and characterizing the relationship between genetic variation and disease susceptibility, are particularly relevant to sequence data. Haplotype phase can be generated through laboratory-based experimental methods, or it can be estimated using computational approaches. We assess the haplotype phasing methods that are available, focusing in particular on statistical methods, and we discuss the practical aspects of their application. We also describe recent developments that may transform this field, particularly the use of identity-by-descent for computational phasing.
|
25
|
Trewick AL, Moustafa JSES, de Smith AJ, Froguel P, Greve G, Njølstad PR, Coin LJM, Blakemore AIF. Accurate single-nucleotide polymorphism allele assignment in trisomic or duplicated regions by using a single base-extension assay with MALDI-TOF mass spectrometry. Clin Chem 2011; 57:1188-95. [PMID: 21677093 DOI: 10.1373/clinchem.2010.159558] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 01/03/2023]
Abstract
BACKGROUND The accurate assignment of alleles embedded within trisomic or duplicated regions is an essential prerequisite for assessing the combined effects of single-nucleotide polymorphisms (SNPs) and genomic copy number. Such an integrated analysis is challenging because heterozygotes for such a SNP may be one of 2 genotypes, AAB or ABB. Established methods for SNP genotyping, however, can have difficulty discriminating between the 2 heterozygous trisomic genotypes. We developed a method for assigning heterozygous trisomic genotypes that uses the ratio of the height of the 2 allele peaks obtained by mass spectrometry after a single-base extension assay. METHODS Eighteen COL6A2 (collagen, type VI, alpha 2) SNPs were analyzed in euploid and trisomic individuals by means of a multiplexed single-base extension assay that generated allele-specific oligonucleotides of differing M(r) values for detection by MALDI-TOF mass spectrometry. Reference data (mean and SD) for the allele peak height ratios were determined from heterozygous euploid samples. The heterozygous trisomic genotypes were assigned by calculating the z score for each trisomic allele peak height ratio and by considering the sign (+/-) of the z score. RESULTS Heterozygous trisomic genotypes were assigned in 96.1% (range, 89.9%-100%) of the samples for each SNP analyzed. The genotypes obtained were reproduced in 95 (97.5%) of 97 loci retested in a second assay. Subsequently, the origin of nondisjunction was determined in 108 (82%) of 132 family trios with a Down syndrome child. CONCLUSIONS This approach enabled reliable genotyping of heterozygous trisomic samples and the determination of the origin of nondisjunction in Down syndrome family trios.
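The z-score assignment described in METHODS can be sketched in a few lines of Python. The sketch assumes the peak height ratio is defined as allele A over allele B, so that a positive z score indicates an extra A copy; that orientation, the function name, and the reference values are illustrative assumptions rather than details given in the abstract.

```python
from statistics import mean, stdev


def assign_trisomic_genotype(ratio, euploid_het_ratios):
    """Assign AAB or ABB from an allele peak height ratio (A / B, assumed).

    euploid_het_ratios : peak height ratios from heterozygous euploid (AB)
                         reference samples for the same SNP.
    Returns (genotype, z); genotype is None when z is exactly 0.
    """
    mu = mean(euploid_het_ratios)
    sd = stdev(euploid_het_ratios)  # sample SD; needs at least 2 references
    if sd == 0:
        raise ValueError("reference ratios have zero spread")

    z = (ratio - mu) / sd
    if z > 0:
        return "AAB", z  # more A signal than a euploid heterozygote
    if z < 0:
        return "ABB", z  # more B signal than a euploid heterozygote
    return None, z


# Hypothetical reference ratios from euploid heterozygotes for one SNP.
reference = [0.95, 1.0, 1.05, 1.0]
print(assign_trisomic_genotype(1.9, reference)[0])
print(assign_trisomic_genotype(0.52, reference)[0])
```

Using a per-SNP euploid reference rather than a fixed ratio of 1 lets the rule absorb assay-specific bias in the two allele peaks.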
Affiliation(s)
- Anne L Trewick
- Department of Genomics of Common Disease, School of Public Health, Imperial College London, London, UK
|
26
|
Inferring haplotypes of copy number variations from high-throughput data with uncertainty. G3-GENES GENOMES GENETICS 2011; 1:35-42. [PMID: 22384316 PMCID: PMC3276117 DOI: 10.1534/g3.111.000174] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/31/2010] [Accepted: 03/14/2011] [Indexed: 11/18/2022]
Abstract
Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals' diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1-2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12-18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
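The contrast between probabilistic and deterministic inference can be illustrated with a simplified single-locus sketch in Python. Instead of committing to one called copy number per individual, each individual contributes a likelihood over every possible total copy number, and an expectation-maximization (EM) loop estimates haplotype copy-number frequencies. The model (one locus, random mating, a fixed iteration count) and all names are assumptions for illustration; the published algorithm handles multiple loci and is not reproduced here.

```python
def em_copy_number_haplotype_frequencies(likelihoods, max_hap_copies, n_iter=200):
    """Estimate frequencies of haplotypes carrying 0..max_hap_copies copies.

    likelihoods : one list per individual; entry t is P(signal | total copy
                  number t) for t = 0 .. 2 * max_hap_copies.
    Returns a list of haplotype frequencies indexed by copies per haplotype.
    """
    n_states = max_hap_copies + 1
    for lik in likelihoods:
        if len(lik) != 2 * max_hap_copies + 1:
            raise ValueError("each likelihood vector needs 2*max_hap_copies+1 entries")

    freqs = [1.0 / n_states] * n_states  # uniform starting point
    for _ in range(n_iter):
        expected_counts = [0.0] * n_states
        for lik in likelihoods:
            # E-step: posterior weight of every ordered diplotype (a, b).
            weights = {
                (a, b): freqs[a] * freqs[b] * lik[a + b]
                for a in range(n_states)
                for b in range(n_states)
            }
            total = sum(weights.values())
            if total == 0:
                raise ValueError("an individual has zero likelihood under the model")
            for (a, b), w in weights.items():
                expected_counts[a] += w / total
                expected_counts[b] += w / total
        # M-step: each individual carries two haplotypes.
        freqs = [c / (2 * len(likelihoods)) for c in expected_counts]
    return freqs


# Hypothetical noisy data: two individuals, at most one copy per haplotype.
noisy = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
print(em_copy_number_haplotype_frequencies(noisy, max_hap_copies=1))
```

A deterministic approach would correspond to replacing each likelihood vector with a one-hot vector at the called copy number, which discards the information that some calls are much less certain than others.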
|
27
|
Huang YT, Wu MH. Inference of chromosome-specific copy numbers using population haplotypes. BMC Bioinformatics 2011; 12:194. [PMID: 21605463 PMCID: PMC3128032 DOI: 10.1186/1471-2105-12-194] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/31/2010] [Accepted: 05/24/2011] [Indexed: 12/28/2022] Open
Abstract
Background: Using microarray and sequencing platforms, a large number of copy number variations (CNVs) have been identified in humans. In practice, because our human genome is a diploid, these platforms are limited to or more accurate for detecting total copy numbers rather than chromosome-specific copy numbers at each of the two homologous chromosomes. Nevertheless, the analysis of linkage disequilibrium (LD) between CNVs and SNPs indicates that distinct copy numbers often sit on their own background haplotypes. Results: We propose new computational models for inferring chromosome-specific copy numbers by distinguishing background haplotypes of each copy number. The formulated problems are shown to be NP-hard and approximation/heuristic algorithms are developed. Simulation indicates that our method is accurate and outperforms the existing approach. By testing the program in 60 parent-offspring trios, the inferred chromosome-specific copy numbers are highly consistent with the law of Mendelian inheritance. The distributions of copy numbers at chromosomal level are provided for 270 individuals in three HapMap panels. Conclusions: The estimation of chromosome-specific copy numbers using microarray or sequencing platforms was often confounded by a number of factors. This study showed that the integration of background haplotypes is able to improve the accuracies of copy number estimation at chromosome level, especially for the CNVs having strong LD with SNPs in proximity.
Affiliation(s)
- Yao-Ting Huang
- Department of Computer Science and Information Engineering, National Chung Cheng University, Chia-Yi, Taiwan.
|
28
|
Wineinger NE, Pajewski NM, Tiwari HK. A Method to Assess Linkage Disequilibrium between CNVs and SNPs Inside Copy Number Variable Regions. Front Genet 2011; 2:17. [PMID: 21660233 PMCID: PMC3109359 DOI: 10.3389/fgene.2011.00017] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/20/2011] [Accepted: 03/31/2011] [Indexed: 11/23/2022] Open
Abstract
Since the discovery of the ubiquitous contribution of copy number variation to genetic variability, researchers have commonly used metrics such as r² to quantify linkage disequilibrium (LD) between copy number variants (CNVs) and single nucleotide polymorphisms (SNPs). However, these reports have been restricted to SNPs outside copy number variable regions (CNVR) as current methods have not been adapted to account for SNPs displaying variable copy number. We show that traditional LD metrics inappropriately quantify SNP/CNV covariance when SNPs lie within CNVR. We derive a new method for measuring LD that solves this issue, and defaults to traditional metrics otherwise. Finally, we present a procedure to estimate CNV–SNP allele frequencies from unphased CNV–SNP genotypes. Our method allows researchers to include all SNPs in SNP/CNV LD measurements, regardless of copy number.
Affiliation(s)
- Nathan E Wineinger
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA
|
29
|
Turro E, Su SY, Gonçalves Â, Coin LJM, Richardson S, Lewin A. Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. Genome Biol 2011; 12:R13. [PMID: 21310039 PMCID: PMC3188795 DOI: 10.1186/gb-2011-12-2-r13] [Citation(s) in RCA: 172] [Impact Index Per Article: 13.2] [Received: 09/19/2010] [Revised: 11/17/2010] [Accepted: 02/10/2011] [Indexed: 11/11/2022] Open
Abstract
We present a novel pipeline and methodology for simultaneously estimating isoform expression and allelic imbalance in diploid organisms using RNA-seq data. We achieve this by modeling the expression of haplotype-specific isoforms. If unknown, the two parental isoform sequences can be individually reconstructed. A new statistical method, MMSEQ, deconvolves the mapping of reads to multiple transcripts (isoforms or haplotype-specific isoforms). Our software can take into account non-uniform read generation and works with paired-end reads.
Affiliation(s)
- Ernest Turro
- Department of Epidemiology and Biostatistics, Imperial College London, Norfolk Place, London, W2 1PG, UK.
|
30
|
Whole-genome molecular haplotyping of single cells. Nat Biotechnol 2010; 29:51-7. [PMID: 21170043 DOI: 10.1038/nbt.1739] [Citation(s) in RCA: 270] [Impact Index Per Article: 19.3] [Received: 10/05/2010] [Accepted: 11/24/2010] [Indexed: 01/22/2023]
Abstract
Conventional experimental methods of studying the human genome are limited by the inability to independently study the combination of alleles, or haplotype, on each of the homologous copies of the chromosomes. We developed a microfluidic device capable of separating and amplifying homologous copies of each chromosome from a single human metaphase cell. Single-nucleotide polymorphism (SNP) array analysis of amplified DNA enabled us to achieve completely deterministic, whole-genome, personal haplotypes of four individuals, including a HapMap trio with European ancestry (CEU) and an unrelated European individual. The phases of alleles were determined at ∼99.8% accuracy for up to ∼96% of all assayed SNPs. We demonstrate several practical applications, including direct observation of recombination events in a family trio, deterministic phasing of deletions in individuals and direct measurement of the human leukocyte antigen haplotypes of an individual. Our approach has potential applications in personal genomics, single-cell genomics and statistical genetics.
|