1
Mbo Nkoulou LF, Ngalle HB, Cros D, Adje COA, Fassinou NVH, Bell J, Achigan-Dako EG. Perspective for genomic-enabled prediction against black sigatoka disease and drought stress in polyploid species. Frontiers in Plant Science 2022; 13:953133. [PMID: 36388523 PMCID: PMC9650417 DOI: 10.3389/fpls.2022.953133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Accepted: 09/28/2022] [Indexed: 06/16/2023]
Abstract
Genomic selection (GS) is being explored in plant breeding as a promising tool for addressing biotic and abiotic threats. Polyploid plants such as bananas (Musa spp.) face drought and black sigatoka disease (BSD), both of which restrict production. Conventional plant breeding is hampered in particular by phenotyping costs and long generation intervals. GS is explored as an alternative with great potential for reducing the cost and duration of the selection process. So far, GS has been less successful in polyploid plants than in diploid plants because of the complexity of polyploid genomes. In this review, we present the main constraints on the application of GS in polyploid plants and the prospects for overcoming them. Particular emphasis is placed on breeding against BSD and drought, two major threats to banana production, with banana used in this review as a model polyploid plant. The first challenge for GS in polyploid plants is the difficulty of obtaining good-quality markers, because the main tools were developed for diploid species. A further major challenge is accounting for genetic interactions such as dominance and epistasis effects, as well as genotype-by-environment interaction, all of which are very common in polyploid plants. To address these challenges, we present bioinformatics tools and artificial intelligence approaches, including machine learning. We also propose a scheme for applying GS to banana for BSD and drought. This review is of particular relevance to breeding programs that seek to shorten the selection cycle of polyploids despite the complexity of their genomes.
Affiliation(s)
- Luther Fort Mbo Nkoulou
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Institute of Agricultural Research for Development, Centre de Recherche Agricole de Mbalmayo (CRAM), Mbalmayo, Cameroon
- Hermine Bille Ngalle
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- David Cros
- Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, Montpellier, France
- Unité Mixte de Recherche (UMR) Amélioration Génétique et Adaptation des Plantes méditerranéennes et tropicales (AGAP) Institut, University of Montpellier, Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD), Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Institut Agro, Montpellier, France
- Charlotte O. A. Adje
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Nicodeme V. H. Fassinou
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
- Joseph Bell
- Unit of Genetics and Plant Breeding (UGAP), Department of Plant Biology, Faculty of Sciences, University of Yaoundé 1, Yaoundé, Cameroon
- Enoch G. Achigan-Dako
- Genetics, Biotechnology, and Seed Science Unit (GBioS), Department of Plant Sciences, Faculty of Agronomic Sciences, University of Abomey Calavi, Cotonou, Benin
2
Voorrips RE, Tumino G. PolyHaplotyper: haplotyping in polyploids based on bi-allelic marker dosage data. BMC Bioinformatics 2022; 23:442. [PMID: 36274121 PMCID: PMC9590153 DOI: 10.1186/s12859-022-04989-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/12/2021] [Accepted: 10/16/2022] [Indexed: 11/18/2022]
Abstract
Background For genetic analyses, multi-allelic markers have an advantage over bi-allelic markers like SNPs (single nucleotide polymorphisms) in that they carry more information about the genetic constitution of individuals. This is especially the case in polyploids, where individuals carry more than two alleles at each locus. Haploblocks are multi-allelic markers that can be derived by phasing sets of closely-linked SNP markers. Phased haploblocks, similarly to other multi-allelic markers, will therefore be advantageous in genetic tasks like linkage mapping, QTL mapping and genome-wide association studies. Results We present a new method to reconstruct haplotypes from SNP dosages derived from genotyping arrays, which is applicable to polyploids. This method is implemented in the software package PolyHaplotyper. In contrast to existing packages for polyploids it makes use of full-sib families among the samples to guide the haplotyping process. We show that in this situation it is much more accurate than other available software, using experimental hexaploid data and simulated tetraploid data. Conclusions Our method and the software package PolyHaplotyper in which it is implemented extend the available tools for haplotyping in polyploids. They perform especially well in situations where one or more full-sib families are present. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04989-0.
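The link between SNP dosages and haploblock alleles described in this abstract can be illustrated with a small sketch. It is not PolyHaplotyper's algorithm; it only enumerates, by brute force, which sets of haplotypes reproduce an observed dosage vector. The function name and the toy dosages are assumptions made for illustration.

```python
from itertools import combinations_with_replacement, product

def compatible_haplotype_sets(dosages, ploidy):
    """Return every multiset of `ploidy` haplotypes whose per-SNP counts of
    the alternative allele equal the observed dosages (hypothetical helper)."""
    n_snps = len(dosages)
    # All 2**n_snps haplotypes over bi-allelic SNPs (0 = reference, 1 = alternative).
    all_haps = list(product((0, 1), repeat=n_snps))
    solutions = []
    for combo in combinations_with_replacement(all_haps, ploidy):
        sums = tuple(sum(h[i] for h in combo) for i in range(n_snps))
        if sums == tuple(dosages):
            solutions.append(combo)
    return solutions

# Toy tetraploid individual: dosage 1 at each of two linked SNPs.
sols = compatible_haplotype_sets([1, 1], ploidy=4)
for combo in sols:
    print(combo)
```

In this toy case two different haplotype sets fit the same dosages, which is the kind of ambiguity that extra information, such as full-sib family structure, can help resolve.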
3
Saada OA, Friedrich A, Schacherer J. Towards accurate, contiguous and complete alignment-based polyploid phasing algorithms. Genomics 2022; 114:110369. [PMID: 35483655 DOI: 10.1016/j.ygeno.2022.110369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Revised: 03/09/2022] [Accepted: 04/11/2022] [Indexed: 01/14/2023]
Abstract
Phasing, and polyploid phasing in particular, has been a challenging problem, held back by the limited read length of high-throughput short-read sequencing, which cannot span the distance between heterozygous sites, and by the labor and high cost of alternative methods such as the physical separation of chromosomes. Recently developed single-molecule long-read sequencing methods provide much longer reads, which overcome this limitation. Here we review the alignment-based methods of polyploid phasing, which rely on four main strategies: population inference methods, which leverage the genetic information of several individuals to phase a sample; objective function minimization methods, which minimize a function such as the Minimum Error Correction (MEC); graph partitioning methods, which represent the read data as a graph and split it into k haplotype subgraphs; and cluster building methods, which iteratively grow clusters of similar reads into a final set of clusters that represent the haplotypes. We discuss the advantages and limitations of these methods and the metrics used to assess their performance, proposing that accuracy and contiguity are the most meaningful metrics. Finally, we propose that the field of alignment-based polyploid phasing would greatly benefit from a well-designed benchmarking dataset with appropriate evaluation metrics. We consider that significant improvements can still be achieved to obtain more accurate and contiguous polyploid phasing results that reflect the complexity of polyploid genome architectures.
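The Minimum Error Correction (MEC) objective named in this abstract can be sketched in a few lines. This is a minimal illustration under assumptions, not code from any of the reviewed tools: reads are represented as hypothetical dictionaries mapping a variant position to the observed allele, and the score counts the allele corrections needed when each read is assigned to its closest haplotype.

```python
def mec_score(reads, haplotypes):
    """Sum, over reads, of the mismatches against the best-matching haplotype.

    reads: list of dicts {position: allele} covering only some positions.
    haplotypes: list of equal-length tuples of alleles (k candidate haplotypes).
    """
    total = 0
    for read in reads:
        total += min(
            sum(1 for pos, allele in read.items() if hap[pos] != allele)
            for hap in haplotypes
        )
    return total

# Toy example: three candidate haplotypes over three variant positions.
reads = [{0: 0, 1: 0}, {1: 1, 2: 1}, {0: 1, 1: 1, 2: 0}, {1: 0, 2: 0}]
haps = [(0, 0, 0), (1, 1, 1), (0, 1, 1)]
print(mec_score(reads, haps))
```

A phasing method of the objective-minimization type searches for the set of k haplotypes that makes this score as small as possible; the sketch only evaluates the score for a given candidate set.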
Affiliation(s)
- Omar Abou Saada
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Anne Friedrich
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
- Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France; Institut Universitaire de France (IUF), Paris, France
4
Huang K, Huber G, Ritland K, Dunn DW, Li B. Performing parentage analysis for polysomic inheritances based on allelic phenotypes. G3: Genes, Genomes, Genetics 2021; 11:6080682. [PMID: 33585871 PMCID: PMC8022955 DOI: 10.1093/g3journal/jkaa064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2020] [Accepted: 11/09/2020] [Indexed: 11/26/2022]
Abstract
Polyploidy poses several problems for parentage analysis. We present a new polysomic inheritance model for parentage analysis based on genotypes or allelic phenotypes to solve these problems. The effects of five factors are simultaneously accommodated in this model: (1) double-reduction, (2) null alleles, (3) negative amplification, (4) genotyping errors and (5) self-fertilization. To solve genotyping ambiguity (unknown allele dosage), we developed a new method to establish the likelihood formulas for allelic phenotype data and to simultaneously include the effects of our five chosen factors. We then evaluated and compared the performance of our new method with three established methods by using both simulated data and empirical data from the cultivated blueberry (Vaccinium corymbosum). We also developed and compared the performance of two additional estimators to estimate the genotyping error rate and the sample rate. We make our new methods freely available in the software package polygene, at http://github.com/huangkang1987/polygene.
Affiliation(s)
- Kang Huang
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an 710069, China; Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Gwendolyn Huber
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Kermit Ritland
- Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V6T1Z4, Canada
- Derek W Dunn
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an 710069, China
- Baoguo Li
- Shaanxi Key Laboratory for Animal Conservation, College of Life Sciences, Northwest University, Xi'an 710069, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China
5
Mazrouee S, Wang W. PolyCluster: Minimum Fragment Disagreement Clustering for Polyploid Phasing. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:264-277. [PMID: 30040655 DOI: 10.1109/tcbb.2018.2858803] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/08/2023]
Abstract
Phasing is an emerging area in computational biology with important applications in clinical decision making and biomedical sciences. While machine learning techniques have shown tremendous potential in many biomedical applications, their utility in phasing is not yet fully understood. In this paper, we investigate the development of clustering-based techniques for phasing in polyploid organisms, where more than two copies of each chromosome exist in the cells of the organism under study. We develop a novel framework, called PolyCluster, based on the concept of correlation clustering followed by an effective cluster merging mechanism to minimize the amount of disagreement among short reads residing in each cluster. We first introduce a graph model to quantify the amount of similarity between each pair of DNA reads. We then present a combination of linear programming, rounding, region-growing, and cluster merging to group similar reads and reconstruct haplotypes. Our extensive analysis demonstrates the effectiveness of PolyCluster in accurate and scalable phasing. In particular, we show that PolyCluster reduces the switching error of H-PoP, HapColor, and HapTree by 44.4, 51.2, and 48.3 percent, respectively. Also, the running time of PolyCluster is several orders of magnitude less than that of HapTree, while remaining comparable to that of the other algorithms.
6
Malikic S, Mehrabadi FR, Ciccolella S, Rahman MK, Ricketts C, Haghshenas E, Seidman D, Hach F, Hajirasouliha I, Sahinalp SC. PhISCS: a combinatorial approach for subperfect tumor phylogeny reconstruction via integrative use of single-cell and bulk sequencing data. Genome Res 2019; 29:1860-1877. [PMID: 31628256 PMCID: PMC6836735 DOI: 10.1101/gr.234435.118] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 08/15/2018] [Accepted: 09/11/2019] [Indexed: 12/29/2022]
Abstract
Available computational methods for tumor phylogeny inference via single-cell sequencing (SCS) data typically aim to identify the most likely perfect phylogeny tree satisfying the infinite sites assumption (ISA). However, the limitations of SCS technologies including frequent allele dropout and variable sequence coverage may prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions, and convergent evolution. In order to address such limitations, we introduce the optimal subperfect phylogeny problem which asks to integrate SCS data with matching bulk sequencing data by minimizing a linear combination of potential false negatives (due to allele dropout or variance in sequence coverage), false positives (due to read errors) among mutation calls, and the number of mutations that violate ISA (real or because of incorrect copy number estimation). We then describe a combinatorial formulation to solve this problem which ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and—as a first in tumor phylogeny reconstruction—a Boolean constraint satisfaction problem (CSP) and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data while accounting for ISA violating mutations. In contrast to the alternative methods, typically based on probabilistic approaches, PhISCS provides a guarantee of optimality in reported solutions. Using simulated and real data sets, we demonstrate that PhISCS is more general and accurate than all available approaches.
Affiliation(s)
- Salem Malikic
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Farid Rashidi Mehrabadi
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA; Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Simone Ciccolella
- Department of Computer Systems and Communication, University of Milano-Bicocca, 20136 Milan, Italy; Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA
- Md Khaledur Rahman
- Department of Computer Science, Indiana University, Bloomington, Indiana 47408, USA
- Camir Ricketts
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA; Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
- Ehsan Haghshenas
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
- Daniel Seidman
- Tri-I Computational Biology and Medicine Graduate Program, Cornell University, New York, New York 10065, USA
- Faraz Hach
- School of Computing Science, Simon Fraser University, Burnaby, BC V5A 1S6, Canada; Department of Urologic Sciences, University of British Columbia, Vancouver, BC V5Z 1M9, Canada; Vancouver Prostate Centre, Vancouver, BC V6H 3Z6, Canada
- Iman Hajirasouliha
- Institute for Computational Biomedicine, Weill Cornell Medicine, New York, New York 10065, USA; Department of Physiology and Biophysics, Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, New York 10065, USA
- S Cenk Sahinalp
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
7
Bourke PM, Voorrips RE, Visser RGF, Maliepaard C. Tools for Genetic Studies in Experimental Populations of Polyploids. Frontiers in Plant Science 2018; 9:513. [PMID: 29720992 PMCID: PMC5915555 DOI: 10.3389/fpls.2018.00513] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Received: 01/25/2018] [Accepted: 04/04/2018] [Indexed: 05/19/2023]
Abstract
Polyploid organisms carry more than two copies of each chromosome, a condition rarely tolerated in animals but which occurs relatively frequently in the plant kingdom. One of the principal challenges faced by polyploid organisms is to evolve stable meiotic mechanisms to faithfully transmit genetic information to the next generation upon which the study of inheritance is based. In this review we look at the tools available to the research community to better understand polyploid inheritance, many of which have only recently been developed. Most of these tools are intended for experimental populations (rather than natural populations), facilitating genomics-assisted crop improvement and plant breeding. This is hardly surprising given that a large proportion of domesticated plant species are polyploid. We focus on three main areas: (1) polyploid genotyping; (2) genetic and physical mapping; and (3) quantitative trait analysis and genomic selection. We also briefly review some miscellaneous topics such as the mode of inheritance and the availability of polyploid simulation software. The current polyploid analytic toolbox includes software for assigning marker genotypes (and in particular, estimating the dosage of marker alleles in the heterozygous condition), establishing chromosome-scale linkage phase among marker alleles, constructing (short-range) haplotypes, generating linkage maps, performing genome-wide association studies (GWAS) and quantitative trait locus (QTL) analyses, and simulating polyploid populations. These tools can also help elucidate the mode of inheritance (disomic, polysomic or a mixture of both as in segmental allopolyploids) or reveal whether double reduction and multivalent chromosomal pairing occur. An increasing number of polyploids (or associated diploids) are being sequenced, leading to publicly available reference genome assemblies. Much work remains in order to keep pace with developments in genomic technologies. However, such technologies also offer the promise of understanding polyploid genomes at a level which hitherto has remained elusive.
Affiliation(s)
- Chris Maliepaard
- Plant Breeding, Wageningen University & Research, Wageningen, Netherlands
8
Muiruri KS, Britt A, Amugune NO, Nguu E, Chan S, Tripathi L. Dominant Allele Phylogeny and Constitutive Subgenome Haplotype Inference in Bananas Using Mitochondrial and Nuclear Markers. Genome Biol Evol 2017; 9:2510-2521. [PMID: 28992303 PMCID: PMC5629815 DOI: 10.1093/gbe/evx167] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 08/27/2017] [Indexed: 12/22/2022]
Abstract
Cultivated bananas (Musa spp.) have undergone domestication involving crosses of wild progenitors followed by long periods of clonal propagation. The majority of cultivated bananas are polyploids with different constitutive subgenomes, and knowledge of their phylogenetic relationships to their progenitors at the species and subspecies levels is essential. Here, the mitochondrial (NAD1) and nuclear (CENH3) markers were used to phylogenetically position cultivated banana genotypes relative to diploid progenitors. The CENH3 nuclear marker was used to identify a minimum representative haplotype number in polyploid and diploid bananas based on single nucleotide polymorphisms. The mitochondrial marker NAD1 was found to be ideal for differentiating bananas of different genomic constitutions based on amplicon size as well as sequence. The genotypes segregated phylogenetically according to the dominant genome: AAB genotypes grouped with AA and AAA, and ABB genotypes with BB. Both markers differentiated banana sections but could not differentiate subspecies within the A genomic group. On the basis of the CENH3 marker, a total of 13 haplotypes (five in both diploids and triploids, three in diploids, and the rest unique to triploids) were identified from the genotypes tested. The presence of haplotypes common to diploids and triploids suggests a possible shared ancestry among the genotypes involved in this study. Furthermore, the presence of multiple haplotypes in some diploid bananas indicates that they are heterozygous. The haplotypes identified in this study are important because they can be used to check the level of homozygosity in breeding lines as well as to track segregation in progenies.
Affiliation(s)
- Kariuki Samwel Muiruri
- International Institute of Tropical Agriculture (IITA), Nairobi, Kenya
- School of Biological Sciences, University of Nairobi, Kenya
- Anne Britt
- Department of Plant Biology, University of California, Davis
- Edward Nguu
- Department of Biochemistry, University of Nairobi, Kenya
- Simon Chan
- Department of Plant Biology, University of California, Davis
- Leena Tripathi
- International Institute of Tropical Agriculture (IITA), Nairobi, Kenya
9
Shen J, Li Z, Chen J, Song Z, Zhou Z, Shi Y. SHEsisPlus, a toolset for genetic studies on polyploid species. Sci Rep 2016; 6:24095. [PMID: 27048905 PMCID: PMC4822172 DOI: 10.1038/srep24095] [Citation(s) in RCA: 65] [Impact Index Per Article: 8.1] [Received: 10/28/2015] [Accepted: 03/17/2016] [Indexed: 11/09/2022]
Abstract
Currently, algorithms and software for genetic analysis of diploid organisms with bi-allelic markers are well established, while those for polyploids are limited. Here, we present SHEsisPlus, an online algorithm toolset for genetic analysis of both dichotomous and quantitative traits in polyploid species (it is compatible with haploids and diploids, too). SHEsisPlus is also optimized for handling multiple-allele datasets. It is free, open source, and designed to perform a range of analyses, including haplotype inference, linkage disequilibrium analysis, epistasis detection, Hardy-Weinberg equilibrium tests and single-locus association tests. We also developed an accurate and efficient haplotype inference algorithm for polyploids and proposed an entropy-based algorithm to detect epistasis in the context of quantitative traits. A study of both simulated and real datasets showed that our haplotype inference algorithm was much faster and more accurate than existing ones. Our epistasis detection algorithm is the first attempt to apply information theory to characterizing gene interactions in quantitative trait datasets. Results showed that its statistical power was significantly higher than that of conventional approaches. SHEsisPlus is freely available on the web at http://shesisplus.bio-x.cn/. Source code is freely available for download at https://github.com/celaoforever/SHEsisPlus.
Affiliation(s)
- Jiawei Shen
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; School of Bio-medical Engineering, Shanghai Jiao Tong University, Shanghai 200230, P.R. China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Zhiqiang Li
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Jianhua Chen
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Zhijian Song
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University, Shanghai 200240, P.R. China
- Zhaowei Zhou
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; Shandong Provincial Key Laboratory of Metabolic Disease, the Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao 266003, China; Institute of Clinical Research, the Affiliated Hospital of Qingdao University, 16 Jiangsu Road, Qingdao 266003, China
- Yongyong Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education) and the Collaborative Innovation Center for Brain Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China; School of Bio-medical Engineering, Shanghai Jiao Tong University, Shanghai 200230, P.R. China; Shanghai Changning Mental Health Center, Shanghai 200042, P.R. China; Department of Psychiatry, the First Teaching Hospital of Xinjiang Medical University, Urumqi 830054, P.R. China
10
Homolka A, Eder T, Kopecky D, Berenyi M, Burg K, Fluch S. Allele discovery of ten candidate drought-response genes in Austrian oak using a systematic bioinformatics approach based on 454 amplicon sequencing. BMC Res Notes 2012; 5:175. [PMID: 22472016 PMCID: PMC3420255 DOI: 10.1186/1756-0500-5-175] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/26/2011] [Accepted: 04/03/2012] [Indexed: 12/01/2022]
Abstract
Background Rising temperatures and decreasing water availability as a result of predicted climate change will impose significant pressure on long-lived forest tree species. Discovering the allelic variation present in drought-related genes of two Austrian oak species can be the key to understanding mechanisms of natural selection and can provide forestry with key tools to cope with future challenges. Results In the present study we used Roche 454 sequencing and developed a bioinformatic pipeline to process multiplexed tagged amplicons in order to identify single nucleotide polymorphisms and allelic sequences of ten candidate genes related to drought/osmotic stress from pedunculate oak (Quercus robur) and sessile oak (Q. petraea) individuals. Of these, eight genes were detected in 336 oak individuals growing in Austria, with a total of 158 polymorphic sites. Allele numbers ranged from ten to 52, with observed heterozygosity ranging from 0.115 to 0.640. All loci deviated from Hardy-Weinberg equilibrium, and linkage disequilibrium was found among six combinations of loci. Conclusions We have characterized 183 alleles of drought-related genes from oak species and detected first evidence of natural selection. Besides the potential for marker development, we have created an expandable bioinformatic pipeline for the analysis of next-generation sequencing data.
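The observed heterozygosity and Hardy-Weinberg comparison reported in this abstract can be illustrated with a short sketch. It is not the authors' pipeline: the function name and the toy diploid genotypes are assumptions, and the expected value is the standard Hardy-Weinberg expectation (one minus the sum of squared allele frequencies) without small-sample correction.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele, allele) tuples, one per individual.
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies pooled over both gene copies of every individual.
    counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected

# Toy data, not taken from the study.
genos = [("A1", "A1"), ("A1", "A2"), ("A2", "A2"), ("A1", "A1")]
ho, he = heterozygosity(genos)
print(ho, he)
```

A large gap between the two values at a locus is one sign of deviation from Hardy-Weinberg equilibrium; a formal test would be needed to judge significance.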
Affiliation(s)
- Andreas Homolka
- Health and Environment Department, AIT Austrian Institute of Technology, Tulln, A-3430, Austria.
11
Han Y, Khu DM, Monteros MJ. High-resolution melting analysis for SNP genotyping and mapping in tetraploid alfalfa (Medicago sativa L.). Molecular Breeding: New Strategies in Plant Improvement 2012; 29:489-501. [PMID: 22363202 PMCID: PMC3275744 DOI: 10.1007/s11032-011-9566-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Received: 12/16/2010] [Accepted: 03/12/2011] [Indexed: 05/22/2023]
Abstract
Single nucleotide polymorphisms (SNPs) represent the most abundant type of genetic polymorphism in plant genomes. SNP markers are valuable tools for genetic analysis of complex traits of agronomic importance, linkage and association mapping, genome-wide selection, map-based cloning, and marker-assisted selection. Current challenges for SNP genotyping in polyploid outcrossing species include multiple alleles per locus and a lack of high-throughput methods suitable for variant detection. In this study, we report on a high-resolution melting (HRM) analysis system for SNP genotyping and mapping in outcrossing tetraploid genotypes. The sensitivity and utility of this technology are demonstrated by identification of the parental genotypes and segregating progeny in six alfalfa populations based on unique melting curve profiles due to differences in allelic composition at one or multiple loci. HRM using a 384-well format is a fast, consistent, and efficient approach for SNP discovery and genotyping, useful in polyploid species with uncharacterized genomes. Possible applications of this method include variation discovery, analysis of candidate genes, genotyping for comparative and association mapping, and integration of genome-wide selection in breeding programs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11032-011-9566-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yuanhong Han
- Forage Improvement Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401 USA
- Dong-Man Khu
- Forage Improvement Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401 USA
- Maria J. Monteros
- Forage Improvement Division, The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore, OK 73401 USA
12
Irurozki E, Calvo B, Lozano JA. A preprocessing procedure for haplotype inference by pure parsimony. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1183-1195. [PMID: 21116044 DOI: 10.1109/tcbb.2010.125] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Haplotype data are especially important in the study of complex diseases since they contain more information than genotype data. However, obtaining haplotype data is technically difficult and costly. Computational methods have proved to be an effective way of inferring haplotype data from genotype data. One of these methods, the haplotype inference by pure parsimony approach (HIPP), casts the problem as an optimization problem and as such has been proved to be NP-hard. We have designed and developed a new preprocessing procedure for this problem. Our proposed algorithm works with groups of haplotypes rather than individual haplotypes. It iteratively searches for and deletes haplotypes that are not needed to find the optimal solution. This preprocess can be coupled with any of the current solvers for the HIPP that need to preprocess the genotype data. In order to test it, we have used two state-of-the-art solvers, RTIP and GAHAP, and simulated and real HapMap data. Due to the computational time and memory reduction caused by our preprocess, problem instances that were previously unaffordable can now be efficiently solved.
Affiliation(s)
- Ekhine Irurozki
- Department of Computer Science and Artificial Intelligence, University of the Basque Country, Manuel de Lardizabal, 1 - 20018 Donostia, Gipuzkoa, Spain.
|
13
|
Inferring haplotypes of copy number variations from high-throughput data with uncertainty. G3-GENES GENOMES GENETICS 2011; 1:35-42. [PMID: 22384316 PMCID: PMC3276117 DOI: 10.1534/g3.111.000174] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/31/2010] [Accepted: 03/14/2011] [Indexed: 11/18/2022]
Abstract
Accurate information on haplotypes and diplotypes (haplotype pairs) is required for population-genetic analyses; however, microarrays do not provide data on a haplotype or diplotype at a copy number variation (CNV) locus; they only provide data on the total number of copies over a diplotype or an unphased sequence genotype (e.g., AAB, unlike AB of single nucleotide polymorphism). Moreover, such copy numbers or genotypes are often incorrectly determined when microarray signal intensities derived from different copy numbers or genotypes are not clearly separated due to noise. Here we report an algorithm to infer CNV haplotypes and individuals' diplotypes at multiple loci from noisy microarray data, utilizing the probability that a signal intensity may be derived from different underlying copy numbers or genotypes. Performing simulation studies based on known diplotypes and an error model obtained from real microarray data, we demonstrate that this probabilistic approach succeeds in accurate inference (error rate: 1-2%) from noisy data, whereas previous deterministic approaches failed (error rate: 12-18%). Applying this algorithm to real microarray data, we estimated haplotype frequencies and diplotypes in 1486 CNV regions for 100 individuals. Our algorithm will facilitate accurate population-genetic analyses and powerful disease association studies of CNVs.
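The contrast between a deterministic call and a probabilistic one can be illustrated with a minimal single-locus sketch. This is not the authors' algorithm: the Gaussian noise model, the cluster means, the standard deviation, the priors, and the function names are all assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed noise model)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def copy_number_posterior(intensity, means, sd, priors):
    """Posterior probability of each copy number given one signal intensity."""
    weights = {c: priors[c] * normal_pdf(intensity, means[c], sd) for c in means}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def hard_call(intensity, means):
    """Deterministic call: the copy number whose cluster mean is nearest."""
    return min(means, key=lambda c: abs(intensity - means[c]))


# Hypothetical cluster means and population priors for copy numbers 1 to 3.
means = {1: 1.0, 2: 2.0, 3: 3.0}
priors = {1: 0.1, 2: 0.8, 3: 0.1}

posterior = copy_number_posterior(2.6, means, sd=0.5, priors=priors)
print(hard_call(2.6, means), posterior)
```

With these assumed values the nearest-mean call and the most probable copy number disagree, which is the kind of ambiguity a probabilistic method can carry forward into haplotype inference instead of discarding.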
|
14
|
Draffehn AM, Meller S, Li L, Gebhardt C. Natural diversity of potato (Solanum tuberosum) invertases. BMC PLANT BIOLOGY 2010; 10:271. [PMID: 21143910 PMCID: PMC3012049 DOI: 10.1186/1471-2229-10-271] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 09/02/2010] [Accepted: 12/09/2010] [Indexed: 05/18/2023]
Abstract
BACKGROUND Invertases are ubiquitous enzymes that irreversibly cleave sucrose into fructose and glucose. Plant invertases play important roles in carbohydrate metabolism, plant development, and biotic and abiotic stress responses. In potato (Solanum tuberosum), invertases are involved in 'cold-induced sweetening' of tubers, an adaptive response to cold stress, which negatively affects the quality of potato chips and French fries. Linkage and association studies have identified quantitative trait loci (QTL) for tuber sugar content and chip quality that colocalize with three independent potato invertase loci, which together encode five invertase genes. The role of natural allelic variation of these genes in controlling the variation of tuber sugar content in different genotypes is unknown. RESULTS For functional studies on natural variants of five potato invertase genes we cloned and sequenced 193 full-length cDNAs from six heterozygous individuals (three tetraploid and three diploid). Eleven, thirteen, ten, twelve and nine different cDNA alleles were obtained for the genes Pain-1, InvGE, InvGF, InvCD141 and InvCD111, respectively. Allelic cDNA sequences differed from each other by 4 to 9%, and most were genotype specific. Additional variation was identified by single nucleotide polymorphism (SNP) analysis in an association-mapping population of 219 tetraploid individuals. Haplotype modeling revealed two to three major haplotypes besides a larger number of minor frequency haplotypes. cDNA alleles associated with chip quality, tuber starch content and starch yield were identified. CONCLUSIONS Very high natural allelic variation was uncovered in a set of five potato invertase genes. This variability is a consequence of the cultivated potato's reproductive biology. Some of the structural variation found might underlie functional variation that influences important agronomic traits such as tuber sugar content. The associations found between specific invertase alleles and chip quality, tuber starch content and starch yield will facilitate the selection of superior potato genotypes in breeding programs.
Affiliation(s)
- Astrid M Draffehn
- Max-Planck Institute for Plant Breeding Research, Carl von Linné Weg 10, 50829 Köln, Germany
- Sebastian Meller
- Max-Planck Institute for Plant Breeding Research, Carl von Linné Weg 10, 50829 Köln, Germany
- Li Li
- Max-Planck Institute for Plant Breeding Research, Carl von Linné Weg 10, 50829 Köln, Germany
- Christiane Gebhardt
- Max-Planck Institute for Plant Breeding Research, Carl von Linné Weg 10, 50829 Köln, Germany
|
15
|
Graça A, Lynce I, Marques-Silva J, Oliveira AL. Haplotype inference by Pure Parsimony: a survey. J Comput Biol 2010; 17:969-92. [PMID: 20726791 DOI: 10.1089/cmb.2009.0101] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
Given a set of genotypes from a population, the process of recovering the haplotypes that explain the genotypes is called haplotype inference. The haplotype inference problem under the assumption of pure parsimony consists in finding the smallest number of haplotypes that explain a given set of genotypes. This problem is NP-hard. The original formulations for solving the Haplotype Inference by Pure Parsimony (HIPP) problem were based on integer linear programming and branch-and-bound techniques. More recently, solutions based on Boolean satisfiability, pseudo-Boolean optimization, and answer set programming have been shown to be remarkably more efficient. HIPP can now be regarded as a feasible approach for haplotype inference, which can be competitive with other different approaches. This article provides an overview of the methods for solving the HIPP problem, including preprocessing, bounding techniques, and heuristic approaches. The article also presents an empirical evaluation of exact HIPP solvers on a comprehensive set of synthetic and real problem instances. Moreover, the bounding techniques to the exact problem are evaluated. The final section compares and discusses the HIPP approach with a well-established statistical method that represents the reference algorithm for this problem.
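The pure parsimony objective defined above can be made concrete with a small brute-force sketch for diploid data. It is exponential in the number of heterozygous sites and is meant only to illustrate the problem statement, not any of the solvers surveyed. The genotype encoding ("0" and "1" for the two homozygous states, "2" for heterozygous) and all function names are assumptions.

```python
from itertools import combinations, product


def compatible_haplotypes(genotype):
    """All haplotypes that could take part in explaining one genotype."""
    options = [("0", "1") if site == "2" else (site,) for site in genotype]
    return {"".join(h) for h in product(*options)}


def explains(h1, h2, genotype):
    """True if the haplotype pair (h1, h2) yields the genotype."""
    for a, b, site in zip(h1, h2, genotype):
        if site == "2":
            if a == b:  # heterozygous site needs two different alleles
                return False
        elif not (a == b == site):  # homozygous site needs both alleles equal
            return False
    return True


def pure_parsimony(genotypes):
    """Smallest haplotype set explaining every genotype (exhaustive search)."""
    candidates = sorted(set().union(*(compatible_haplotypes(g) for g in genotypes)))
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(
                any(explains(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return set(subset)
    return set()


solution = pure_parsimony(["02", "20", "22"])
print(sorted(solution))
```

In this toy instance the genotype "22" could be explained by two different haplotype pairs, and parsimony selects the pair whose haplotypes are already required by the other two genotypes.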
Affiliation(s)
- Ana Graça
- Instituto Superior Técnico (IST), Technical University of Lisbon, INESC-ID Lisboa, Lisbon, Portugal.
|
16
|
Su SY, Asher JE, Jarvelin MR, Froguel P, Blakemore AIF, Balding DJ, Coin LJM. Inferring combined CNV/SNP haplotypes from genotype data. Bioinformatics 2010; 26:1437-45. [PMID: 20406911 DOI: 10.1093/bioinformatics/btq157] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 01/17/2023]
Abstract
MOTIVATION Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase and none of these are applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and for inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, the haplotypic phase of CNVs and SNPs is inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate. RESULTS We generated diploid phase-known CNV-SNP genotype datasets by pairing male X chromosome CNV-SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset, a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets. AVAILABILITY Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin.
Affiliation(s)
- Shu-Yi Su
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, London W2 1PG, UK
|
17
|
Achenbach U, Paulo J, Ilarionova E, Lübeck J, Strahwald J, Tacke E, Hofferbert HR, Gebhardt C. Using SNP markers to dissect linkage disequilibrium at a major quantitative trait locus for resistance to the potato cyst nematode Globodera pallida on potato chromosome V. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2009; 118:619-29. [PMID: 19020852 DOI: 10.1007/s00122-008-0925-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 06/26/2008] [Accepted: 10/24/2008] [Indexed: 05/18/2023]
Abstract
The damage caused by the parasitic root cyst nematode Globodera pallida is a major yield-limiting factor in potato cultivation. Breeding for resistance is facilitated by the PCR-based marker 'HC', which is diagnostic for an allele conferring high resistance against G. pallida pathotype Pa2/3 that has been introgressed from the wild potato species Solanum vernei into the Solanum tuberosum tetraploid breeding pool. The major quantitative trait locus (QTL) controlling this nematode resistance maps on potato chromosome V in a hot spot for resistance to various pathogens including nematodes and the oomycete Phytophthora infestans. An unstructured sample of 79 tetraploid, highly heterozygous varieties and breeding clones was selected based on presence (41 genotypes) or absence (38 genotypes) of the HC marker. Testing the clones for resistance to G. pallida confirmed the diagnostic power of the HC marker. The 79 individuals were genotyped for 100 single nucleotide polymorphisms (SNPs) at 10 loci distributed over 38 cM on chromosome V. Forty-five SNPs at six loci spanning 2 cM in the interval between markers GP21-GP179 were associated with resistance to G. pallida. Based on linkage disequilibrium (LD) between SNP markers, six LD groups comprising between 2 and 18 SNPs were identified. The LD groups indicated the existence of multiple alleles at a single resistance locus or at several, physically linked resistance loci. LD group C comprising 18 SNPs corresponded to the 'HC' marker. LD group E included 16 SNPs and showed an association peak, which positioned one nematode resistance locus physically close to the R1 gene family.
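One common way to quantify pairwise LD from unphased tetraploid data is the squared correlation between allele dosages (0 to 4 copies of the alternative allele per individual). The sketch below shows that calculation only. Whether the study used this statistic is not stated in the abstract, so the measure, the dosage coding, and the function name are assumptions.

```python
def dosage_r_squared(dosage_a, dosage_b):
    """Squared Pearson correlation between two allele-dosage vectors."""
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_a * var_b)


# Hypothetical dosages for two SNPs scored in five tetraploid clones.
snp_1 = [0, 1, 2, 3, 4]
snp_2 = [4, 3, 2, 1, 0]
print(dosage_r_squared(snp_1, snp_2))
```

SNPs whose pairwise values are consistently high could then be clustered into groups, which is the general idea behind defining LD groups from marker data.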
Affiliation(s)
- Ute Achenbach
- Department of Plant Breeding and Genetics, Max Planck Institute for Plant Breeding Research, 50829, Cologne, Germany
|
18
|
Su SY, White J, Balding DJ, Coin LJM. Inference of haplotypic phase and missing genotypes in polyploid organisms and variable copy number genomic regions. BMC Bioinformatics 2008; 9:513. [PMID: 19046436 PMCID: PMC2647950 DOI: 10.1186/1471-2105-9-513] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 08/26/2008] [Accepted: 12/01/2008] [Indexed: 12/18/2022]
Abstract
Background The power of haplotype-based methods for association studies, identification of regions under selection, and ancestral inference, is well-established for diploid organisms. For polyploids, however, the difficulty of determining phase has limited such approaches. Polyploidy is common in plants and is also observed in animals. Partial polyploidy is sometimes observed in humans (e.g. trisomy 21; Down's syndrome), and it arises more frequently in some human tissues. Local changes in ploidy, known as copy number variations (CNV), arise throughout the genome. Here we present a method, implemented in the software polyHap, for the inference of haplotype phase and missing observations from polyploid genotypes. PolyHap allows each individual to have a different ploidy, but ploidy cannot vary over the genomic region analysed. It employs a hidden Markov model (HMM) and a sampling algorithm to infer haplotypes jointly in multiple individuals and to obtain a measure of uncertainty in its inferences. Results In the simulation study, we combine real haplotype data to create artificial diploid, triploid, and tetraploid genotypes, and use these to demonstrate that polyHap performs well, in terms of both switch error rate in recovering phase and imputation error rate for missing genotypes. To our knowledge, there is no comparable software for phasing a large, densely genotyped region of chromosome from triploids and tetraploids, while for diploids we found polyHap to be more accurate than fastPhase. We also compare the results of polyHap to SATlotyper on an experimentally haplotyped tetraploid dataset of 12 SNPs, and show that polyHap is more accurate. Conclusion With the availability of large SNP data in polyploids and CNV regions, we believe that polyHap, our proposed method for inferring haplotypic phase from genotype data, will be useful in enabling researchers analysing such data to exploit the power of haplotype-based analyses.
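The switch error rate mentioned above can be sketched for the simplest, diploid case. The version below counts how often the inferred phase flips between consecutive heterozygous sites. It assumes the inferred haplotypes are consistent with the true genotype, it does not cover the triploid and tetraploid cases the paper addresses, and the function name is hypothetical.

```python
def switch_error_rate(true_h1, true_h2, inferred_h1):
    """Fraction of consecutive heterozygous site pairs where the phase flips."""
    het_sites = [i for i, (a, b) in enumerate(zip(true_h1, true_h2)) if a != b]
    if len(het_sites) < 2:
        return 0.0  # phase is undefined with fewer than two heterozygous sites
    # True where the inferred haplotype follows true_h1, False where it follows true_h2.
    follows_h1 = [inferred_h1[i] == true_h1[i] for i in het_sites]
    switches = sum(1 for prev, cur in zip(follows_h1, follows_h1[1:]) if prev != cur)
    return switches / (len(het_sites) - 1)


# Hypothetical example: the inferred haplotype switches strand once, after the second site.
rate = switch_error_rate("0101", "1010", "0110")
print(rate)
```

Reporting the inferred haplotype as the complement of `true_h1` gives a rate of zero, because haplotype labels are arbitrary and only changes of phase along the chromosome are counted.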
Affiliation(s)
- Shu-Yi Su
- Department of Epidemiology and Public Health, Imperial College, London, W2 1PG, UK.
|
19
|
Riaño-Pachón DM, Nagel A, Neigenfind J, Wagner R, Basekow R, Weber E, Mueller-Roeber B, Diehl S, Kersten B. GabiPD: the GABI primary database--a plant integrative 'omics' database. Nucleic Acids Res 2008; 37:D954-9. [PMID: 18812395 PMCID: PMC2686513 DOI: 10.1093/nar/gkn611] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 12/02/2022]
Abstract
The GABI Primary Database, GabiPD (http://www.gabipd.org/), was established in the frame of the German initiative for Genome Analysis of the Plant Biological System (GABI). The goal of GabiPD is to collect, integrate, analyze and visualize primary information from GABI projects. GabiPD constitutes a repository and analysis platform for a wide array of heterogeneous data from high-throughput experiments in several plant species. Data from different ‘omics’ fronts are incorporated (i.e. genomics, transcriptomics, proteomics and metabolomics), originating from 14 different model or crop species. We have developed the concept of GreenCards for text-based retrieval of all data types in GabiPD (e.g. clones, genes, mutant lines). All data types point to a central Gene GreenCard, where gene information is integrated from genome projects or NCBI UniGene sets. The centralized Gene GreenCard allows visualizing ESTs aligned to annotated transcripts as well as displaying identified protein domains and gene structure. Moreover, GabiPD makes available interactive genetic maps from potato and barley, and protein 2DE gels from Arabidopsis thaliana and Brassica napus. Gene expression and metabolic-profiling data can be visualized through MapManWeb. By the integration of complex data in a framework of existing knowledge, GabiPD provides new insights and allows for new interpretations of the data.
Affiliation(s)
- Diego Mauricio Riaño-Pachón
- GabiPD team, Bioinformatics group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, Berlin, Germany
|