1. Yan M, Li M, Wang Y, Wang X, Moeinzadeh MH, Quispe-Huamanquispe DG, Fan W, Fang Y, Wang Y, Nie H, Wang Z, Tanaka A, Heider B, Kreuze JF, Gheysen G, Wang H, Vingron M, Bock R, Yang J. Haplotype-based phylogenetic analysis and population genomics uncover the origin and domestication of sweetpotato. Molecular Plant 2024; 17:277-296. [PMID: 38155570 DOI: 10.1016/j.molp.2023.12.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 11/10/2023] [Accepted: 12/25/2023] [Indexed: 12/30/2023]
Abstract
The hexaploid sweetpotato (Ipomoea batatas) is one of the most important root crops worldwide. However, its genetic origin remains controversial, and its domestication history remains unknown. In this study, we used a range of genetic evidence and a newly developed haplotype-based phylogenetic analysis to identify two probable progenitors of sweetpotato. The diploid progenitor was likely closely related to Ipomoea aequatoriensis and contributed the B1 subgenome, IbT-DNA2, and the lineage 1 type of chloroplast genome to sweetpotato. The tetraploid progenitor of sweetpotato was most likely I. batatas 4x, which donated the B2 subgenome, IbT-DNA1, and the lineage 2 type of chloroplast genome. Sweetpotato most likely originated from reciprocal crosses between the diploid and tetraploid progenitors, followed by a subsequent whole-genome duplication. In addition, we detected biased gene exchanges between the subgenomes; the rate of B1 to B2 subgenome conversions was nearly three times higher than that of B2 to B1 subgenome conversions. Our analyses revealed that genes involved in storage root formation, maintenance of genome stability, biotic resistance, sugar transport, and potassium uptake were selected during the speciation and domestication of sweetpotato. This study sheds light on the evolution of sweetpotato and paves the way for improvement of this crop.
Affiliation(s)
- Mengxiao Yan
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Ming Li
- College of Life Sciences, Chongqing Normal University, Chongqing 401331, China; Biotechnology and Nuclear Technology Research Institute, Sichuan Academy of Agricultural Sciences, Chengdu 610061, China
- Yunze Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Xinyi Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- M-Hossein Moeinzadeh
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany
- Weijuan Fan
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Yijie Fang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Yuqin Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Haozhen Nie
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Zhangying Wang
- Guangdong Provincial Key Laboratory of Crops Genetics and Improvement, Crop Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou 510640, China
- Aiko Tanaka
- Graduate School of Bioagricultural Sciences, Nagoya University, Chikusa, Nagoya 464-8601, Japan
- Hongxia Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; CAS Center for Excellence of Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai 200233, China.
- Martin Vingron
- Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Ihnestraße 63-73, 14195 Berlin, Germany.
- Ralph Bock
- Max-Planck-Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany.
- Jun Yang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; CAS Center for Excellence of Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Chinese Academy of Sciences, Shanghai 200233, China.
2. Chen H, Pelizzola M, Futschik A. Haplotype based testing for a better understanding of the selective architecture. BMC Bioinformatics 2023; 24:322. [PMID: 37633901 PMCID: PMC10463365 DOI: 10.1186/s12859-023-05437-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 08/03/2023] [Indexed: 08/28/2023] Open
Abstract
BACKGROUND: The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies. RESULTS: Using simulated data, we show that, compared to SNP-based tests, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods to combine them into a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application to a real data set further illustrates the performance benefits. CONCLUSIONS: Because it requires less multiple-testing correction and reduces noise, haplotype-based testing outperforms SNP-based tests in terms of power in most scenarios.
Affiliation(s)
- Haoyu Chen
- University of Veterinary Medicine Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, Vienna, Austria
3. Ruiz JL, Reimering S, Escobar-Prieto JD, Brancucci NMB, Echeverry DF, Abdi AI, Marti M, Gómez-Díaz E, Otto TD. From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA). Brief Bioinform 2023; 24:bbad248. [PMID: 37406192 PMCID: PMC10359078 DOI: 10.1093/bib/bbad248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/24/2023] [Accepted: 06/16/2023] [Indexed: 07/07/2023] Open
Abstract
Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.
Affiliation(s)
- José Luis Ruiz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Susanne Reimering
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Nicolas M B Brancucci
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4123 Allschwil, Switzerland
- University of Basel, 4001 Basel, Switzerland
- Diego F Echeverry
- Centro Internacional de Entrenamiento e Investigaciones Médicas (CIDEIM), Cali, Colombia
- Departamento de Microbiología, Facultad de Salud, Universidad del Valle, Cali, Colombia
- Matthias Marti
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Elena Gómez-Díaz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Thomas D Otto
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
4. Leal JL, Milesi P, Salojärvi J, Lascoux M. Phylogenetic Analysis of Allotetraploid Species Using Polarized Genomic Sequences. Syst Biol 2023; 72:372-390. [PMID: 36932679 PMCID: PMC10275558 DOI: 10.1093/sysbio/syad009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 10/14/2022] [Accepted: 03/10/2023] [Indexed: 03/19/2023] Open
Abstract
Phylogenetic analysis of polyploid hybrid species has long posed a formidable challenge as it requires the ability to distinguish between alleles of different ancestral origins in order to disentangle their individual evolutionary history. This problem has been previously addressed by conceiving phylogenies as reticulate networks, using a two-step phasing strategy that first identifies and segregates homoeologous loci and then, during a second phasing step, assigns each gene copy to one of the subgenomes of an allopolyploid species. Here, we propose an alternative approach, one that preserves the core idea behind phasing (to produce separate nucleotide sequences that capture the reticulate evolutionary history of a polyploid) while vastly simplifying its implementation by reducing a complex multistage procedure to a single phasing step. While most current methods used for phylogenetic reconstruction of polyploid species require sequencing reads to be pre-phased using experimental or computational methods (usually an expensive, complex, and/or time-consuming endeavor), phasing executed using our algorithm is performed directly on the multiple-sequence alignment (MSA), a key change that allows for the simultaneous segregation and sorting of gene copies. We introduce the concept of genomic polarization that, when applied to an allopolyploid species, produces nucleotide sequences that capture the fraction of a polyploid genome that deviates from that of a reference sequence, usually one of the other species present in the MSA. We show that if the reference sequence is one of the parental species, the polarized polyploid sequence has a close resemblance (high pairwise sequence identity) to the second parental species.
This knowledge is harnessed to build a new heuristic algorithm where, by replacing the allopolyploid genomic sequence in the MSA by its polarized version, it is possible to identify the phylogenetic position of the polyploid's ancestral parents in an iterative process. The proposed methodology can be used with long-read and short-read high-throughput sequencing data and requires only one representative individual for each species to be included in the phylogenetic analysis. In its current form, it can be used in the analysis of phylogenies containing tetraploid and diploid species. We test the newly developed method extensively using simulated data in order to evaluate its accuracy. We show empirically that the use of polarized genomic sequences allows for the correct identification of both parental species of an allotetraploid with up to 97% certainty in phylogenies with moderate levels of incomplete lineage sorting (ILS) and 87% in phylogenies containing high levels of ILS. We then apply the polarization protocol to reconstruct the reticulate histories of Arabidopsis kamchatica and Arabidopsis suecica, two allopolyploids whose ancestry has been well documented. [Allopolyploidy; Arabidopsis; genomic polarization; homoeologs; incomplete lineage sorting; phasing; polyploid phylogenetics; reticulate evolution.].
Affiliation(s)
- J Luis Leal
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Pascal Milesi
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala University, 75237 Uppsala, Sweden
- Jarkko Salojärvi
- Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, and Viikki Plant Science Centre, University of Helsinki, P.O. Box 65 (Viikinkaari 1), 00014 Helsinki, Finland
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore 637551, Singapore
- Martin Lascoux
- Plant Ecology and Evolution, Department of Ecology and Genetics, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
- Science for Life Laboratory (SciLifeLab), Uppsala University, 75237 Uppsala, Sweden
5. Kong W, Wang Y, Zhang S, Yu J, Zhang X. Recent Advances in Assembly of Complex Plant Genomes. Genomics, Proteomics & Bioinformatics 2023; 21:427-439. [PMID: 37100237 PMCID: PMC10787022 DOI: 10.1016/j.gpb.2023.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Revised: 03/18/2023] [Accepted: 04/07/2023] [Indexed: 04/28/2023]
Abstract
Over the past 20 years, tremendous advances in sequencing technologies and computational algorithms have spurred plant genomic research into a thriving era with hundreds of genomes decoded already, ranging from those of nonvascular plants to those of flowering plants. However, complex plant genome assembly is still challenging and remains difficult to fully resolve with conventional sequencing and assembly methods due to high heterozygosity, highly repetitive sequences, or high ploidy characteristics of complex genomes. Herein, we summarize the challenges of and advances in complex plant genome assembly, including feasible experimental strategies, upgrades to sequencing technology, existing assembly methods, and different phasing algorithms. Moreover, we list actual cases of complex genome projects for readers to refer to and draw upon to solve future problems related to complex genomes. Finally, we expect that the accurate, gapless, telomere-to-telomere, and fully phased assembly of complex plant genomes could soon become routine.
Affiliation(s)
- Weilong Kong
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Yibin Wang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Shengcheng Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Jiaxin Yu
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Xingtan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China.
6. Wang Y, Yu J, Jiang M, Lei W, Zhang X, Tang H. Sequencing and Assembly of Polyploid Genomes. Methods Mol Biol 2023; 2545:429-458. [PMID: 36720827 DOI: 10.1007/978-1-0716-2561-3_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Polyploidy has been observed throughout major eukaryotic clades and has played a vital role in the evolution of angiosperms. Recent polyploidizations often result in highly complex genome structures, posing challenges to genome assembly and phasing. Recent advances in sequencing technologies and genome assembly algorithms have enabled high-quality, near-complete chromosome-level assemblies of polyploid genomes. Advances in novel sequencing technologies include highly accurate single-molecule sequencing with HiFi reads, chromosome conformation capture with Hi-C technique, and linked reads sequencing. Additionally, new computational approaches have also significantly improved the precision and reliability of polyploid genome assembly and phasing, such as HiCanu, hifiasm, ALLHiC, and PolyGembler. Herein, we review recently published polyploid genomes and compare the various sequencing, assembly, and phasing approaches that are utilized in these genome studies. Finally, we anticipate that accurate and telomere-to-telomere chromosome-level assembly of polyploid genomes could ultimately become a routine procedure in the near future.
Affiliation(s)
- Yibin Wang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Jiaxin Yu
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Mengwei Jiang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Wenlong Lei
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
- Xingtan Zhang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Haibao Tang
- Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Key Laboratory of Genetics, Breeding and Multiple Utilization of Crops, Ministry of Education, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou, China
7. Shirali Hossein Zade R, Urhan A, Assis de Souza A, Singh A, Abeel T. HAT: haplotype assembly tool using short and error-prone long reads. Bioinformatics 2022; 38:5352-5359. [PMID: 36308461 PMCID: PMC9750119 DOI: 10.1093/bioinformatics/btac702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/16/2022] [Accepted: 10/25/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Haplotypes are the set of alleles co-occurring on a single chromosome and inherited together to the next generation. Because a monoploid reference genome loses this co-occurrence information, it has limited use in associating phenotypes with allelic combinations of genotypes. Therefore, methods to reconstruct the complete haplotypes from DNA sequencing data are crucial. Recently, several attempts have been made at haplotype reconstructions, but significant limitations remain. High-quality continuous haplotypes cannot be created reliably, particularly when there are few differences between the homologous chromosomes. RESULTS: Here, we introduce HAT, a haplotype assembly tool that exploits short and long reads along with a reference genome to reconstruct haplotypes. HAT tries to take advantage of the accuracy of short reads and the length of the long reads to reconstruct haplotypes. We tested HAT on the aneuploid yeast strain Saccharomyces pastorianus CBS1483 and multiple simulated polyploid datasets of the same strain, showing that it outperforms existing tools. AVAILABILITY AND IMPLEMENTATION: https://github.com/AbeelLab/hat/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aysun Urhan
- Delft Bioinformatics Lab, Delft University of Technology Van Mourik, 2628 XE Delft, The Netherlands
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Alvaro Assis de Souza
- Delft Bioinformatics Lab, Delft University of Technology Van Mourik, 2628 XE Delft, The Netherlands
- Akash Singh
- Delft Bioinformatics Lab, Delft University of Technology Van Mourik, 2628 XE Delft, The Netherlands
8. Yan M, Nie H, Wang Y, Wang X, Jarret R, Zhao J, Wang H, Yang J. Exploring and exploiting genetics and genomics for sweetpotato improvement: Status and perspectives. Plant Communications 2022; 3:100332. [PMID: 35643086 PMCID: PMC9482988 DOI: 10.1016/j.xplc.2022.100332] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 10/26/2021] [Revised: 04/17/2022] [Accepted: 05/02/2022] [Indexed: 05/14/2023]
Abstract
Sweetpotato (Ipomoea batatas (L.) Lam.) is one of the most important root crops cultivated worldwide. Because of its adaptability, high yield potential, and nutritional value, sweetpotato has become an important food crop, particularly in developing countries. To ensure adequate crop yields to meet increasing demand, it is essential to enhance the tolerance of sweetpotato to environmental stresses and other yield-limiting factors. The highly heterozygous hexaploid genome of I. batatas complicates genetic studies and limits improvement of sweetpotato through traditional breeding. However, application of next-generation sequencing and high-throughput genotyping and phenotyping technologies to sweetpotato genetics and genomics research has provided new tools and resources for crop improvement. In this review, we discuss the genomics resources that are available for sweetpotato, including the current reference genome, databases, and available bioinformatics tools. We systematically review the current state of knowledge on the polyploid genetics of sweetpotato, including studies of its origin and germplasm diversity and the associated mapping of important agricultural traits. We then outline the conventional and molecular breeding approaches that have been applied to sweetpotato. Finally, we discuss future goals for genetic studies of sweetpotato and crop improvement via breeding in combination with state-of-the-art multi-omics approaches such as genomic selection and gene editing. These approaches will advance and accelerate genetic improvement of this important root crop and facilitate its sustainable global production.
Affiliation(s)
- Mengxiao Yan
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Haozhen Nie
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Yunze Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Xinyi Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Jiamin Zhao
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Hongxia Wang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China.
- Jun Yang
- Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China; National Key Laboratory of Plant Molecular Genetics, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai 200032, China.
9. Schrinner S, Serra Mari R, Finkers R, Arens P, Usadel B, Marschall T, Klau GW. Genetic polyploid phasing from low-depth progeny samples. iScience 2022; 25:104461. [PMID: 35692633 PMCID: PMC9184567 DOI: 10.1016/j.isci.2022.104461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Revised: 04/20/2022] [Accepted: 05/16/2022] [Indexed: 11/08/2022] Open
Abstract
An important challenge in genome assembly is haplotype phasing, that is, to reconstruct the different haplotype sequences of an individual genome. Phasing becomes considerably more difficult with increasing ploidy, which makes polyploid phasing a notoriously hard computational problem. We present a novel genetic phasing method for plant breeding with the aim to phase two deep-sequenced parental samples with the help of a large number of progeny samples sequenced at low depth. The key ideas underlying our approach are to (i) integrate the individually weak Mendelian progeny signals with a Bayesian log-likelihood model, (ii) cluster alleles according to their likelihood of co-occurrence, and (iii) assign them to haplotypes via an interval scheduling approach. We show on two deep-sequenced parental and 193 low-depth progeny potato samples that our approach computes high-quality sparse phasings and that it scales to whole genomes.
Highlights
- Allows phasing of autopolyploid species through genetic information of progenies
- High number of low-depth progeny samples yields significant markers for phasing
- Informative variant types (simplex-nulliplex) phasable with high confidence
- Continuity not limited by read connectivity, but rather by the recombination rate
Affiliation(s)
- Sven Schrinner
- Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Rebecca Serra Mari
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Richard Finkers
- Plant Breeding, Wageningen University & Research, Wageningen, the Netherlands; Gennovation B.V., Agro Business Park 10, 6708 PW, Wageningen, The Netherlands
- Paul Arens
- Plant Breeding, Wageningen University & Research, Wageningen, the Netherlands
- Björn Usadel
- Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Forschungszentrum Jülich, Institute of Bio and Geosciences, Bioinformatics (IBG-4), Jülich, Germany; Bioeconomy Science Center, c/o Forschungszentrum, Jülich, Germany; Biological Data Science, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Gunnar W Klau
- Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
10. Saada OA, Friedrich A, Schacherer J. Towards accurate, contiguous and complete alignment-based polyploid phasing algorithms. Genomics 2022; 114:110369. [PMID: 35483655 DOI: 10.1016/j.ygeno.2022.110369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2021] [Revised: 03/09/2022] [Accepted: 04/11/2022] [Indexed: 01/14/2023]
Abstract
Phasing, and polyploid phasing in particular, has long been a challenging problem, held back both by the limited read length of high-throughput short-read sequencing methods, which cannot span the distance between heterozygous sites, and by the labor and high cost of alternative methods such as the physical separation of chromosomes. Recently developed single-molecule long-read sequencing methods provide much longer reads, which overcome this limitation. Here we review the alignment-based methods of polyploid phasing, which rely on four main strategies: population inference methods, which leverage the genetic information of several individuals to phase a sample; objective function minimization methods, which minimize a function such as the Minimum Error Correction (MEC); graph partitioning methods, which represent the read data as a graph and split it into k haplotype subgraphs; and cluster building methods, which iteratively grow clusters of similar reads into a final set of clusters that represent the haplotypes. We discuss the advantages and limitations of these methods and the metrics used to assess their performance, proposing that accuracy and contiguity are the most meaningful metrics. Finally, we propose that the field of alignment-based polyploid phasing would greatly benefit from a well-designed benchmarking dataset with appropriate evaluation metrics. We consider that significant improvements can still be achieved to obtain more accurate and contiguous polyploid phasing results that reflect the complexity of polyploid genome architectures.
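As a rough illustration of the objective-function strategy named in this abstract, the MEC score of a candidate phasing can be read as the number of allele calls in the reads that would have to be changed so that every read agrees with its closest haplotype. The Python sketch below is only a minimal example under stated assumptions: reads and haplotypes are equal-length strings over `0`/`1`, `-` marks a site a read does not cover, and the function names and toy data are hypothetical rather than taken from the review.

```python
def read_distance(read: str, haplotype: str) -> int:
    """Count mismatches at the sites the read covers ('-' marks an uncovered site)."""
    return sum(1 for r, h in zip(read, haplotype) if r != "-" and r != h)


def mec_score(reads: list[str], haplotypes: list[str]) -> int:
    """Sum, over all reads, of the distance to the closest candidate haplotype."""
    return sum(min(read_distance(read, hap) for hap in haplotypes) for read in reads)


# Hypothetical toy data: three haplotypes (a triploid) over five variant sites.
haplotypes = ["00110", "01011", "11100"]
reads = ["001--", "-1011", "1110-", "0-111"]

# Only the last read disagrees with its best haplotype, at a single site.
print(mec_score(reads, haplotypes))
```

A phasing tool in this family would search over candidate haplotype sets for one that makes this score small; the sketch only evaluates the score for a given set.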
Affiliation(s)
- Omar Abou Saada
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Anne Friedrich
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
  - Institut Universitaire de France (IUF), Paris, France
11
|
Liu Y, Li J. Hamming-shifting graph of genomic short reads: Efficient construction and its application for compression. PLoS Comput Biol 2021; 17:e1009229. [PMID: 34280186 PMCID: PMC8321399 DOI: 10.1371/journal.pcbi.1009229] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/06/2021] [Revised: 07/29/2021] [Accepted: 06/30/2021] [Indexed: 11/21/2022]
Abstract
Graphs such as de Bruijn graphs and OLC (overlap-layout-consensus) graphs have been widely adopted for the de novo assembly of genomic short reads. This work studies another important problem in the field: how graphs can be used for high-performance compression of large-scale sequencing data. We present a novel graph definition named Hamming-Shifting graph to address this problem. The definition originates from the technological characteristics of next-generation sequencing machines, aiming to link all pairs of distinct reads that have a small Hamming distance or a small shifting offset or both. We compute multiple lexicographically minimal k-mers to index the reads for an efficient search of the weight-lightest edges, and we prove a very high probability of successfully detecting these edges. The resulting graph creates a full mutual reference of the reads to cascade a code-minimized transfer of every child-read for an optimal compression. We conducted compression experiments on the minimum spanning forest of this extremely sparse graph, and achieved a 10–30% greater file size reduction compared to the best compression results using existing algorithms. As future work, the separation and connectivity degrees of these giant graphs can be used as economical measurements or protocols for quick quality assessment of wet-lab machines, for sufficiency control of genomic library preparation, and for accurate de novo genome assembly.

We present a novel graph-based algorithm to compress next-generation short sequencing reads. The novelty of the algorithm is attributed to a new definition of genomic sequence graph named Hamming-Shifting graph. It consists of edges between distinct reads that have a small Hamming distance or a small shifting offset or both. Efficient construction of Hamming-Shifting graphs is challenging. We introduce a heuristic technique to detect the weight-lightest edges through multiple minimizers from each read, then search the minimum spanning trees and forests of the Hamming-Shifting graph for a high-performance compression of the reads. Our method achieves an additional 10–30% file size reduction compared to contemporary compression techniques.
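The two ingredients this abstract names, a small Hamming distance under a small shifting offset and a lexicographically minimal k-mer used as an index key, can be sketched as follows. This is an illustrative assumption rather than the paper's construction: it assumes equal-length reads, uses a single minimizer per read (the paper uses several), and the function names `hamming`, `best_shift` and `minimizer` are hypothetical.

```python
from typing import Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_shift(a: str, b: str, max_shift: int) -> Tuple[int, int]:
    """Return (shift, mismatches) for the best alignment of b against a.

    Read b is assumed to start `shift` bases after read a, so only the
    overlap a[shift:] versus b[:len(a) - shift] is compared. Assumes
    equal-length reads and max_shift < len(a).
    """
    candidates = [
        (hamming(a[s:], b[: len(a) - s]), s) for s in range(max_shift + 1)
    ]
    mism, shift = min(candidates)  # fewest mismatches, then smallest shift
    return shift, mism


def minimizer(read: str, k: int) -> str:
    """Lexicographically smallest k-mer of the read."""
    return min(read[i : i + k] for i in range(len(read) - k + 1))


a = "ACGTACGTAC"
b = "GTACGTACTT"  # b overlaps a perfectly when shifted by 2 bases
print(best_shift(a, b, max_shift=3))
print(minimizer(a, 4), minimizer(b, 4))
```

Reads that share a minimizer land in the same index bucket, so only those pairs need the more expensive shift-and-compare step; a single minimizer can miss true neighbours, which is presumably why several are computed per read.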
Affiliation(s)
- Yuansheng Liu
  - Data Science Institute, University of Technology Sydney, Sydney, Australia
- Jinyan Li
  - Data Science Institute, University of Technology Sydney, Sydney, Australia
12
|
Abou Saada O, Tsouris A, Eberlein C, Friedrich A, Schacherer J. nPhase: an accurate and contiguous phasing method for polyploids. Genome Biol 2021; 22:126. [PMID: 33926549 PMCID: PMC8082856 DOI: 10.1186/s13059-021-02342-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/31/2020] [Accepted: 04/08/2021] [Indexed: 01/06/2023]
Abstract
While genome sequencing and assembly are now routine, we do not have a full, precise picture of polyploid genomes. No existing polyploid phasing method provides accurate and contiguous haplotype predictions. We developed nPhase, a ploidy-agnostic tool that leverages long reads and accurate short reads to solve alignment-based phasing for samples of unspecified ploidy (https://github.com/OmarOakheart/nPhase). nPhase is validated by tests on simulated and real polyploids. On average, nPhase obtains over 95% accuracy and a contiguous 1.25 haplotigs per haplotype to cover more than 90% of each chromosome (heterozygosity rate ≥ 0.5%). nPhase allows population genomics and hybrid studies of polyploids.
Affiliation(s)
- Omar Abou Saada
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Andreas Tsouris
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Chris Eberlein
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Anne Friedrich
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
- Joseph Schacherer
  - Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France
  - Institut Universitaire de France (IUF), Paris, France
13
|
Zhang W, Luo C, Scossa F, Zhang Q, Usadel B, Fernie AR, Mei H, Wen W. A phased genome based on single sperm sequencing reveals crossover pattern and complex relatedness in tea plants. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2021; 105:197-208. [PMID: 33118252 DOI: 10.1111/tpj.15051] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/14/2020] [Revised: 10/19/2020] [Accepted: 10/22/2020] [Indexed: 05/27/2023]
Abstract
For diploid organisms that are highly heterozygous, a phased haploid genome can greatly aid in functional genomic, population genetic and breeding studies. Based on the genome sequencing of 135 single sperm cells of the elite tea cultivar 'Fudingdabai', we herein phased the genome of Camellia sinensis, one of the most popular beverage crops worldwide. High-resolution genetic and recombination maps of Fudingdabai were constructed, which revealed that crossover (CO) positions were frequently located in the 5' and 3' ends of annotated genes, while CO distributions across the genome were random. The low CO frequency in tea can be explained by strong CO interference, and CO simulation revealed that the proportion of interference-insensitive COs ranged from 5.2% to 11.7%. We furthermore developed a method to infer the relatedness between tea accessions and detected complex kinship and genetic signatures of 106 tea accessions. Among them, 59 accessions were closely related to Fudingdabai and 31 of them were first-degree relatives. We additionally identified genes displaying allele-specific expression patterns between the two haplotypes of Fudingdabai and genes displaying significantly differential expression levels between Fudingdabai and other haplotypes. These results lay the foundation for further investigation of genetic and epigenetic factors underpinning the regulation of gene expression and provide insights into the evolution of tea plants as well as a valuable genetic resource for future breeding efforts.
Affiliation(s)
- Weiyi Zhang
  - Key Laboratory of Horticultural Plant Biology (MOE), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
- Cheng Luo
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Federico Scossa
  - Max-Planck-Institute of Molecular Plant Physiology, Am Muehlenberg 1, Potsdam-Golm, 14476, Germany
  - Council for Agricultural Research and Economics, Research Center for Genomics and Bioinformatics, Via Ardeatina 546, Rome, 00178, Italy
- Qinghua Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Björn Usadel
  - Institute for Biological Data Science, Heinrich Heine University, Düsseldorf, Germany
  - Institute of Bio- and Geosciences, IBG-4: Bioinformatics, CEPLAS, Forschungszentrum Jülich, Leo-Brandt-Straße, Jülich, 52425, Germany
- Alisdair R Fernie
  - Max-Planck-Institute of Molecular Plant Physiology, Am Muehlenberg 1, Potsdam-Golm, 14476, Germany
  - Center of Plant Systems Biology and Biotechnology, Plovdiv, 4000, Bulgaria
- Hanwei Mei
  - Shanghai Agrobiological Gene Center, Shanghai, 201106, China
- Weiwei Wen
  - Key Laboratory of Horticultural Plant Biology (MOE), College of Horticulture and Forestry Sciences, Huazhong Agricultural University, Wuhan, 430070, China
14
|
Nicholls SM, Aubrey W, De Grave K, Schietgat L, Creevey CJ, Clare A. On the complexity of haplotyping a microbial community. Bioinformatics 2020; 37:1360-1366. [PMID: 33444437 PMCID: PMC8208737 DOI: 10.1093/bioinformatics/btaa977] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 08/17/2020] [Revised: 11/04/2020] [Accepted: 11/09/2020] [Indexed: 12/14/2022]
Abstract
MOTIVATION: Population-level genetic variation enables competitiveness and niche specialization in microbial communities. Despite the difficulty in culturing many microbes from an environment, we can still study these communities by isolating and sequencing DNA directly from an environment (metagenomics). Recovering the genomic sequences of all isoforms of a given gene across all organisms in a metagenomic sample would aid evolutionary and ecological insights into microbial ecosystems with potential benefits for medicine and biotechnology. A significant obstacle to this goal arises from the lack of a computationally tractable solution that can recover these sequences from sequenced read fragments. This poses a problem analogous to reconstructing the two sequences that make up the genome of a diploid organism (i.e. haplotypes), but for an unknown number of individuals and haplotypes.

RESULTS: The problem of single individual haplotyping (SIH) was first formalised by Lancia et al. in 2001. Now, nearly two decades later, we discuss the complexity of "haplotyping" metagenomic samples, with a new formalisation of Lancia et al.'s data structure that allows us to effectively extend the single individual haplotype problem to microbial communities. This work describes and formalizes the problem of recovering genes (and other genomic subsequences) from all individuals within a complex community sample, which we term the metagenomic individual haplotyping (MIH) problem. We also provide software implementations for a pairwise single nucleotide variant (SNV) co-occurrence matrix and greedy graph traversal algorithm.

AVAILABILITY AND IMPLEMENTATION: Our reference implementation of the described pairwise SNV matrix (Hansel) and greedy haplotype path traversal algorithm (Gretel) are open source, MIT licensed and freely available online at github.com/samstudio8/hansel and github.com/samstudio8/gretel, respectively.
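To make the idea of a pairwise SNV co-occurrence matrix with a greedy path traversal concrete, here is a deliberately simplified sketch. It is not the Hansel or Gretel API and does not reproduce their algorithms: it assumes reads are mappings from SNV-site index to observed base, counts co-occurrences within each read, and extends a single path by always taking the base most often seen together with the previously chosen one. The names `cooccurrence` and `greedy_path` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

# Assumption: a read is a mapping {SNV-site index: observed base}.
Read = Dict[int, str]


def cooccurrence(reads: List[Read]) -> Counter:
    """Count how often base a at site i and base b at site j share a read."""
    counts: Counter = Counter()
    for read in reads:
        for (i, a), (j, b) in combinations(sorted(read.items()), 2):
            counts[(i, a, j, b)] += 1
    return counts


def greedy_path(counts: Counter, n_sites: int, alleles: str = "ACGT") -> str:
    """Greedily pick one base per site, conditioning only on the previous site.

    Requires n_sites >= 2. Ties are broken by the order of `alleles`.
    """
    # Start with the site-0 base that has the most support from site-1 pairs.
    first = max(alleles, key=lambda a: sum(counts[(0, a, 1, b)] for b in alleles))
    path = [first]
    for j in range(1, n_sites):
        prev = path[-1]
        path.append(max(alleles, key=lambda b: counts[(j - 1, prev, j, b)]))
    return "".join(path)


reads = [
    {0: "A", 1: "C"},
    {0: "A", 1: "C"},
    {1: "C", 2: "G"},
    {1: "C", 2: "G"},
    {0: "T", 1: "G"},
    {1: "G", 2: "A"},
    {1: "C", 2: "T"},
]
counts = cooccurrence(reads)
print(greedy_path(counts, n_sites=3))
```

Recovering several haplotypes from a community sample would additionally need some way to discount evidence already used by a recovered path before traversing again; that step is left out of this sketch.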
Affiliation(s)
- Samuel M Nicholls
  - Department of Computer Science, Aberystwyth University, Aberystwyth, UK
  - Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
  - Institute of Biological, Rural and Environmental Sciences, Aberystwyth University, Aberystwyth, UK
  - Institute of Microbiology and Infection, School of Biosciences, University of Birmingham, Birmingham, UK
- Wayne Aubrey
  - Department of Computer Science, Aberystwyth University, Aberystwyth, UK
- Kurt De Grave
  - Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
  - Flanders Make, Lommel, Belgium
- Leander Schietgat
  - Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
  - Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium
- Christopher J Creevey
  - Institute of Biological, Rural and Environmental Sciences, Aberystwyth University, Aberystwyth, UK
  - Institute of Global Food Security, School of Biological Sciences, Queen's University, Belfast, UK
- Amanda Clare
  - Department of Computer Science, Aberystwyth University, Aberystwyth, UK