1
Xiao X, Ran Z, Yan C, Gu W, Li Z. Mitochondrial genome assembly of the Chinese endemic species of Camellia luteoflora and revealing its repetitive sequence mediated recombination, codon preferences and MTPTs. BMC Plant Biology 2025; 25:435. [PMID: 40186100 PMCID: PMC11971748 DOI: 10.1186/s12870-025-06461-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2025] [Accepted: 03/25/2025] [Indexed: 04/07/2025]
Abstract
Camellia luteoflora Y.K. Li ex Hung T. Chang & F.A. Zeng belongs to the genus Camellia L. (Theaceae Mirb.). As an endemic, rare, and critically endangered species in China, it holds significant ornamental and economic value and has attracted global attention because of its ecological rarity. Despite its conservation importance, genomic investigations of this species remain limited, particularly in organelle genomics, which hinders progress in phylogenetic classification and population identification. In this study, we employed high-throughput sequencing to assemble the first complete mitochondrial genome of C. luteoflora and reannotated its chloroplast genome. Through integrated bioinformatics analyses, we systematically characterized the mitochondrial genome's structural organization, gene content, interorganellar DNA transfer, sequence variation, and evolutionary relationships. Key findings revealed a circular mitochondrial genome spanning 587,847 bp with a GC content of 44.63%. The genome harbors 70 unique functional genes, including 40 protein-coding genes (PCGs), 27 tRNA genes, and 3 rRNA genes. Notably, 9 PCGs contained 22 intronic regions. Codon usage analysis demonstrated a pronounced A/U bias in synonymous codon selection. Structural features included 506 dispersed repeats and 240 simple sequence repeats. Comparative genomics identified 19 chloroplast-derived transfer events, contributing 29,534 bp (3.77% of total mitochondrial DNA). RNA editing prediction revealed 539 C-to-T conversion events across PCGs. Phylogenetic reconstruction using mitochondrial PCGs positioned C. luteoflora in closest evolutionary proximity to Camellia sinensis var. sinensis. Selection pressure analysis (Ka/Ks ratios < 1 for 11 PCGs) and nucleotide diversity assessment (Pi values: 0-0.00711) indicated strong purifying selection and low sequence divergence. This study provides the first comprehensive mitochondrial genomic resource for C. luteoflora, offering critical insights for germplasm conservation, comparative organelle genomics, phylogenetic resolution, and evolutionary adaptation studies in Camellia species.
Affiliation(s)
- Xu Xiao
- College of Forestry, Guizhou University, Guiyang, 550025, China
- Zhaohui Ran
- College of Forestry, Guizhou University, Guiyang, 550025, China
- Chao Yan
- College of Forestry, Guizhou University, Guiyang, 550025, China
- Weihao Gu
- College of Forestry, Guizhou University, Guiyang, 550025, China
- Zhi Li
- College of Forestry, Guizhou University, Guiyang, 550025, China
2
Nesterenko L, Blassel L, Veber P, Boussau B, Jacob L. Phyloformer: Fast, Accurate, and Versatile Phylogenetic Reconstruction with Deep Neural Networks. Mol Biol Evol 2025; 42:msaf051. [PMID: 40066802 PMCID: PMC11965795 DOI: 10.1093/molbev/msaf051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 01/16/2025] [Accepted: 01/27/2025] [Indexed: 04/04/2025]
Abstract
Phylogenetic inference aims at reconstructing the tree describing the evolution of a set of sequences descending from a common ancestor. The high computational cost of state-of-the-art maximum likelihood and Bayesian inference methods limits their usability under realistic evolutionary models. Harnessing recent advances in likelihood-free inference and geometric deep learning, we introduce Phyloformer, a fast and accurate method for evolutionary distance estimation and phylogenetic reconstruction. Sampling many trees and sequences under an evolutionary model, we train the network to learn a function that enables predicting a tree from a multiple sequence alignment. On simulated data, we compare Phyloformer to FastME (a distance method) and two maximum likelihood methods: FastTree and IQTree. Under a commonly used model of protein sequence evolution and exploiting graphics processing unit (GPU) acceleration, Phyloformer outpaces all other approaches and exceeds their accuracy in the Kuhner-Felsenstein metric that accounts for both the topology and branch lengths. In terms of topological accuracy alone, Phyloformer outperforms FastME, but falls behind maximum likelihood approaches, especially as the number of sequences increases. When a model of sequence evolution that includes dependencies between sites is used, Phyloformer outperforms all other methods across all metrics on alignments with fewer than 80 sequences. On 3,801 empirical gene alignments from five different datasets, Phyloformer matches the topological accuracy of the two maximum likelihood implementations. Our results pave the way for the adoption of sophisticated realistic models for phylogenetic inference.
Affiliation(s)
- Luca Nesterenko
- Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, Villeurbanne, France
- Luc Blassel
- Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, Villeurbanne, France
- Philippe Veber
- Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, Villeurbanne, France
- Bastien Boussau
- Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, Villeurbanne, France
- Laurent Jacob
- Laboratory of Computational and Quantitative Biology, Sorbonne Université, Paris, France
3
Ren L, Tu X, Luo M, Liu Q, Cui J, Gao X, Zhang H, Tai Y, Zeng Y, Li M, Wu C, Li W, Wang J, Wu D, Liu S. Genomes reveal pervasive distant hybridization in nature among cyprinid fishes. Gigascience 2025; 14:giae117. [PMID: 39880407 PMCID: PMC11779505 DOI: 10.1093/gigascience/giae117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2024] [Revised: 10/12/2024] [Accepted: 12/09/2024] [Indexed: 01/31/2025]
Abstract
BACKGROUND Genomic data have unveiled a fascinating aspect of the evolutionary past, showing that the mingling of different species through hybridization has left its mark on the histories of numerous life forms. However, the relationship between hybridization events and the origins of cyprinid fishes remains unclear. RESULTS In this study, we generated de novo assembled genomes of 8 cyprinid fishes and conducted phylogenetic analyses on 24 species. Widespread allele sharing across species boundaries was observed within 7 subfamilies of cyprinid fishes. Based on a systematic analysis of multiple tissues, we found that the testis exhibited a conserved pattern of divergence between the herbivorous Megalobrama amblycephala and the carnivorous Culter alburnus, suggesting a potential link to incomplete reproductive isolation. Significant differences in the expression of 4 genes (dpp2, ctrl, psb7, and ppce) in the liver and intestine, accompanied by variations in enzyme activities, indicated swift divergence in digestive enzyme secretion. Moreover, we identified introgressed genes linked to organ development in sympatric fishes with analogous feeding habits within the Cultrinae and Leuciscinae subfamilies. CONCLUSIONS Our findings highlight the significant role played by incomplete reproductive isolation and frequent gene flow events, particularly those associated with the development of digestive organs, in driving speciation among cyprinid fishes in diverse freshwater ecosystems.
Affiliation(s)
- Li Ren
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Xiaolong Tu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650201, China
- Kunming College of Life Science, University of the Chinese Academy of Sciences, Kunming 650204, China
- Mengxue Luo
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Qizhi Liu
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Jialin Cui
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Xin Gao
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Hong Zhang
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Yakui Tai
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Yiyan Zeng
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Mengdan Li
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Chang Wu
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Wuhui Li
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Jing Wang
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
- Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650201, China
- Kunming Natural History Museum of Zoology, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Shaojun Liu
- State Key Laboratory of Developmental Biology of Freshwater Fish, Engineering Research Center of Polyploid Fish Reproduction and Breeding of the State Education Ministry, College of Life Sciences, Hunan Normal University, Changsha 410081, China
4
Chernomor O, Elgert C, von Haeseler A. Gentrius: Generating Trees Compatible With a Set of Unrooted Subtrees and its Application to Phylogenetic Terraces. Mol Biol Evol 2024; 41:msae219. [PMID: 39431557 PMCID: PMC11536181 DOI: 10.1093/molbev/msae219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 09/30/2024] [Accepted: 10/11/2024] [Indexed: 10/22/2024]
Abstract
For a set of binary unrooted subtrees, generating all binary unrooted trees compatible with them, i.e., generating their stand, is one of the classical problems in phylogenetics. Here, we introduce Gentrius, an efficient algorithm to tackle this task. The algorithm has a direct practical application: Gentrius generates phylogenetic terraces, which are sets of topologically distinct trees that score equally because of missing data. Despite stand generation being computationally intractable, we showed on simulated and biological datasets that Gentrius generates stands with millions of trees in feasible time. We show by example that, depending on the distribution of missing data across species and loci and on the inferred phylogeny, the number of equally optimal terrace trees varies tremendously. The strict consensus tree computed from them displays all the branches unaffected by the pattern of missing data. Thus, by solving the problem of stand generation, Gentrius provides in practice an important systematic assessment of phylogenetic trees inferred from incomplete data. Furthermore, Gentrius can aid theoretical research by fostering understanding of the tree space structure imposed by missing data.
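The strict consensus mentioned in this abstract keeps only the branches (splits) shared by every tree in a set. The following is a minimal sketch of that idea, not Gentrius itself: it assumes each unrooted tree is already given as a collection of non-trivial splits, and the helper names `canonical` and `strict_consensus` are hypothetical.

```python
def canonical(side, taxa, ref):
    """Return the side of a split that does NOT contain the reference taxon,
    so the same split is always represented by the same frozenset."""
    side = frozenset(side)
    return frozenset(taxa) - side if ref in side else side


def strict_consensus(trees):
    """Splits present in every tree; each tree is an iterable of canonical splits."""
    trees = [set(splits) for splits in trees]
    if not trees:
        raise ValueError("need at least one tree")
    common = trees[0]
    for splits in trees[1:]:
        common = common & splits
    return common


taxa = {"A", "B", "C", "D", "E"}
# Tree 1: ((A,B),C,(D,E))   Tree 2: ((A,B),D,(C,E))
tree1 = {canonical({"A", "B"}, taxa, "A"), canonical({"D", "E"}, taxa, "A")}
tree2 = {canonical({"A", "B"}, taxa, "A"), canonical({"C", "E"}, taxa, "A")}
shared = strict_consensus([tree1, tree2])
```

In this toy case only the split separating A and B from the other taxa survives, which corresponds to the single branch present in both trees.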
Affiliation(s)
- Olga Chernomor
- Center for Integrative Bioinformatics Vienna (CIBIV), Max Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna Bio Center (VBC), Vienna, Austria
- Christiane Elgert
- Center for Integrative Bioinformatics Vienna (CIBIV), Max Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna Bio Center (VBC), Vienna, Austria
- Arndt von Haeseler
- Center for Integrative Bioinformatics Vienna (CIBIV), Max Perutz Laboratories, University of Vienna and Medical University of Vienna, Vienna Bio Center (VBC), Vienna, Austria
- Department of Computer Science, University of Vienna, Vienna, Austria
- Ludwig Boltzmann Institute for Network Medicine, University of Vienna, Vienna, Austria
5
Liu C, Zhou X, Li Y, Hittinger CT, Pan R, Huang J, Chen XX, Rokas A, Chen Y, Shen XX. The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics. Syst Biol 2024; 73:807-822. [PMID: 38940001 DOI: 10.1093/sysbio/syae031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 06/20/2024] [Accepted: 06/26/2024] [Indexed: 06/29/2024]
Abstract
Maximum likelihood (ML) phylogenetic inference is widely used in phylogenomics. As heuristic searches most likely find suboptimal trees, it is recommended to conduct multiple (e.g., 10) tree searches in phylogenetic analyses. However, beyond its positive role, how and to what extent multiple tree searches aid ML phylogenetic inference remains poorly explored. Here, we found that a random starting tree was not as effective as the BioNJ and parsimony starting trees in inferring the ML gene tree and that RAxML-NG and PhyML were less sensitive to different starting trees than IQ-TREE. We then examined the effect of the number of tree searches on ML tree inference with IQ-TREE and RAxML-NG, by running 100 tree searches on 19,414 gene alignments from 15 animal, plant, and fungal phylogenomic datasets. We found that the number of tree searches substantially impacted the recovery of the best-of-100 ML gene tree topology among 100 searches for a given ML program. In addition, all of the concatenation-based trees were topologically identical if the number of tree searches was ≥10. Quartet-based ASTRAL trees inferred from 1 to 80 tree searches differed topologically from those inferred from 100 tree searches for 6/15 phylogenomic datasets. Finally, our simulations showed that gene alignments with lower difficulty scores had a higher chance of finding the best-of-100 gene tree topology and were more likely to yield the correct trees.
Affiliation(s)
- Chao Liu
- Department of Plant Protection, Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou 310058, China
- Centre for Evolutionary & Organismal Biology, Zhejiang University, Hangzhou 310058, China
- Xiaofan Zhou
- Guangdong Province Key Laboratory of Microbial Signals and Disease Control, Integrative Microbiology Research Centre, South China Agricultural University, Guangzhou 510642, China
- Yuanning Li
- Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
- Department of Biological Sciences and Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Chris Todd Hittinger
- Laboratory of Genetics, Wisconsin Energy Institute, Center for Genomic Science Innovation, DOE Great Lakes Bioenergy Research Center, J. F. Crow Institute for the Study of Evolution, University of Wisconsin-Madison, Madison, WI 53706, USA
- Ronghui Pan
- ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 310027, China
- Jinyan Huang
- Zhejiang Provincial Key Laboratory of Pancreatic Disease, Zhejiang University School of Medicine First Affiliated Hospital, Hangzhou 310003, China
- Xue-Xin Chen
- Department of Plant Protection, Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou 310058, China
- Antonis Rokas
- Department of Biological Sciences and Evolutionary Studies Initiative, Vanderbilt University, Nashville, TN 37235, USA
- Yun Chen
- Department of Plant Protection, Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou 310058, China
- Xing-Xing Shen
- Department of Plant Protection, Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou 310058, China
- Centre for Evolutionary & Organismal Biology, Zhejiang University, Hangzhou 310058, China
6
Ecker N, Huchon D, Mansour Y, Mayrose I, Pupko T. A machine-learning-based alternative to phylogenetic bootstrap. Bioinformatics 2024; 40:i208-i217. [PMID: 38940166 PMCID: PMC11211842 DOI: 10.1093/bioinformatics/btae255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
MOTIVATION Currently used methods for estimating branch support in phylogenetic analyses often rely on the classic Felsenstein's bootstrap, parametric tests, or their approximations. As these branch support scores are widely used in phylogenetic analyses, having accurate, fast, and interpretable scores is of high importance. RESULTS Here, we employed a data-driven approach to estimate branch support values with a probabilistic interpretation. To this end, we simulated thousands of realistic phylogenetic trees and the corresponding multiple sequence alignments. Each of the obtained alignments was used to infer the phylogeny using state-of-the-art phylogenetic inference software, which was then compared to the true tree. Using these extensive data, we trained machine-learning algorithms to estimate branch support values for each bipartition within the maximum-likelihood trees obtained by each software. Our results demonstrate that our model provides fast and more accurate probability-based branch support values than commonly used procedures. We demonstrate the applicability of our approach on empirical datasets. AVAILABILITY AND IMPLEMENTATION The data supporting this work are available in the Figshare repository at https://doi.org/10.6084/m9.figshare.25050554.v1, and the underlying code is accessible via GitHub at https://github.com/noaeker/bootstrap_repo.
Affiliation(s)
- Noa Ecker
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Dorothée Huchon
- School of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- The Steinhardt Museum of Natural History and National Research Center, Tel Aviv University, Tel Aviv 6997801, Israel
- Yishay Mansour
- The Blavatnik School of Computer Science, Raymond & Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 6997801, Israel
7
Togkousidis A, Kozlov OM, Haag J, Höhler D, Stamatakis A. Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty. Mol Biol Evol 2023; 40:msad227. [PMID: 37804116 PMCID: PMC10584362 DOI: 10.1093/molbev/msad227] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 09/06/2023] [Accepted: 09/26/2023] [Indexed: 10/08/2023]
Abstract
Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets (53% of total MSAs), where we observe average speedups of more than 10×. Further, approximately 94% of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).
Affiliation(s)
- Anastasis Togkousidis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Oleksiy M Kozlov
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Julia Haag
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Dimitri Höhler
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
- Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
- Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology - Hellas, GR - 711 10 Heraklion, Crete, Greece
8
Wang H, Wu Y, He Y, Li G, Ma L, Li S, Huang J, Yang G. High-quality chromosome-level de novo assembly of the Trifolium repens. BMC Genomics 2023; 24:326. [PMID: 37312068 DOI: 10.1186/s12864-023-09437-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/27/2023] [Accepted: 06/08/2023] [Indexed: 06/15/2023]
Abstract
BACKGROUND White clover (Trifolium repens L.), an excellent perennial legume forage, is an allotetraploid native to southeastern Europe and southern Asia. It has high nutritional, ecological, genetic breeding, and medicinal values and exhibits excellent resistance to cold, drought, trampling, and weed infestation. Thus, white clover is widely planted in Europe, America, and China; however, the lack of a reference genome limits its breeding and cultivation. This study generated a white clover de novo genome assembly at the chromosomal level and annotated its components. RESULTS PacBio third-generation HiFi sequencing and assembly generated a 1096 Mb genome assembly of T. repens, with a contig N50 of 14 Mb, a scaffold N50 of 65 Mb, and a BUSCO value of 98.5%. The newly assembled genome has better continuity and integrity than the previously reported white clover reference genome and thus provides important resources for the molecular breeding and evolution of white clover and other forage. Additionally, we annotated 90,128 high-confidence gene models from the genome. White clover was closely related to Trifolium pratense and Trifolium medium but distantly related to Glycine max, Vigna radiata, Medicago truncatula, and Cicer arietinum. The expansion, contraction, and GO functional enrichment analysis of the gene families showed that T. repens gene families were associated with biological processes, molecular function, cellular components, and environmental resistance, which explained its excellent agronomic traits. CONCLUSIONS This study reports a high-quality de novo assembly of the white clover genome obtained at the chromosomal level using PacBio HiFi sequencing, a third-generation sequencing technology. The generated high-quality genome assembly of white clover provides a key basis for accelerating the research and molecular breeding of this important forage crop. The genome is also valuable for future studies on legume forage biology, evolution, and genome-wide mapping of quantitative trait loci associated with the relevant agronomic traits.
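The contig and scaffold N50 values quoted in this abstract follow the usual assembly-statistics definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation is shown here; the function name `n50` and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must contain at least one positive value")


# Toy lengths in arbitrary units (hypothetical, for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)
```

With the toy input, the two longest contigs (80 and 70) already cover more than half of the total of 290, so the N50 is 70.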
Affiliation(s)
- Hongjie Wang
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration On Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Yongqiang Wu
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration On Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Yong He
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration On Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Guoyu Li
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Lichao Ma
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration On Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Shuo Li
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration On Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Guofeng Yang
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration On Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
9
Data on the solution and processing time reached when constructing a phylogenetic tree using a quantum-inspired computer. Data Brief 2023; 47:108970. [PMID: 36875213 PMCID: PMC9978462 DOI: 10.1016/j.dib.2023.108970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 01/30/2023] [Accepted: 02/06/2023] [Indexed: 02/16/2023]
Abstract
Phylogenetic trees provide insight into the evolutionary trajectories of species and molecules. However, because (2n-5)!! phylogenetic trees can be constructed from a dataset containing n sequences, determining the optimal tree by brute force is impractical owing to combinatorial explosion. Therefore, we developed a method for constructing a phylogenetic tree using a Fujitsu Digital Annealer, a quantum-inspired computer that solves combinatorial optimization problems at high speed. Specifically, phylogenetic trees are generated by repeating the process of partitioning a set of sequences into two parts (i.e., the graph-cut problem). Here, the optimality of the solution (normalized cut value) obtained by the proposed method was compared with that of existing methods using simulated and real data. The simulation dataset contained 32-3200 sequences, and the average branch length according to a normal distribution or the Yule model ranged from 0.125 to 0.750, covering a wide range of sequence diversity. In addition, the statistical information of the dataset is described in terms of two indices: transitivity and average p-distance. As phylogenetic tree construction methods are expected to continue to improve, we believe that this dataset can be used as a reference for comparison and confirmation of the validity of the results. Further interpretation of these analyses is given in W. Onodera, N. Hara, S. Aoki, T. Asahi, N. Sawamura, Phylogenetic tree reconstruction via graph cut presented using a quantum-inspired computer, Mol. Phylogenet. Evol. 178 (2023) 107636.
10
Onodera W, Hara N, Aoki S, Asahi T, Sawamura N. Phylogenetic tree reconstruction via graph cut presented using a quantum-inspired computer. Mol Phylogenet Evol 2023; 178:107636. [PMID: 36208695 DOI: 10.1016/j.ympev.2022.107636] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/30/2022] [Revised: 09/05/2022] [Accepted: 09/29/2022] [Indexed: 11/06/2022]
Abstract
Phylogenetic trees are essential tools in evolutionary biology that present information on evolutionary events among organisms and molecules. For a dataset of n sequences, (2n-5)!! possible tree topologies exist, and determining the optimum topology by brute force is infeasible. Recently, a recursive graph cut on a graph-represented similarity matrix has proven accurate in reconstructing phylogenetic trees containing distantly related sequences. However, identifying the optimum graph cut is challenging, and approximate solutions are currently used. Here, a phylogenetic tree was reconstructed with an improved graph cut using a quantum-inspired computer, the Fujitsu Digital Annealer (DA), and the algorithm was named the "Normalized-Minimum cut by Digital Annealer (NMcutDA) method". First, a criterion for the graph cut, the normalized cut value, was compared with existing clustering methods. Based on the cut, we verified that the simulated phylogenetic tree could be reconstructed with the highest accuracy when sequences were diverged. Moreover, for some actual data from the structure-based protein classification database, only NMcutDA could cluster sequences into the correct superfamilies. In conclusion, NMcutDA reconstructed better phylogenetic trees than other methods by optimizing the graph cut. We anticipate that when the diversity of sequences is sufficiently high, NMcutDA can be used with high efficiency.
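To give a sense of the scale of (2n-5)!!, the following minimal sketch counts unrooted, fully bifurcating, labeled tree topologies for n sequences. The function name is hypothetical and is not taken from the cited work.

```python
def num_unrooted_topologies(n):
    """Return (2n-5)!!, the number of unrooted binary trees on n labeled tips."""
    if n < 3:
        raise ValueError("at least 3 sequences are needed")
    count = 1
    # Double factorial: product of the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in (4, 5, 10, 20):
    print(n, num_unrooted_topologies(n))
```

Ten sequences already allow 2,027,025 topologies, so exhaustive evaluation becomes impractical well before datasets reach the sizes discussed in the abstract.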
Affiliation(s)
- Wataru Onodera
- Faculty of Science and Engineering, Waseda University, TWIns, 2-2 Wakamatsu, Shinjuku, Tokyo 162-8480, Japan
- Shiho Aoki
- Faculty of Science and Engineering, Waseda University, TWIns, 2-2 Wakamatsu, Shinjuku, Tokyo 162-8480, Japan
- Toru Asahi
- Faculty of Science and Engineering, Waseda University, TWIns, 2-2 Wakamatsu, Shinjuku, Tokyo 162-8480, Japan; Research Organization for Nano & Life Innovation, Waseda University, Japan
- Naoya Sawamura
- Research Organization for Nano & Life Innovation, Waseda University, Japan; Green Computing Systems Research Organization, Waseda University, Japan
11
Haag J, Höhler D, Bettisworth B, Stamatakis A. From Easy to Hopeless-Predicting the Difficulty of Phylogenetic Analyses. Mol Biol Evol 2022; 39:6832260. [PMID: 36395091 PMCID: PMC9728795 DOI: 10.1093/molbev/msac254] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 11/18/2022]
Abstract
Phylogenetic analyses under the Maximum-Likelihood (ML) model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies; on others, they converge to multiple, topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty of analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets.
Affiliation(s)
- Dimitri Höhler
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Ben Bettisworth
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany; Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
12
Yan Z, Sang L, Ma Y, He Y, Sun J, Ma L, Li S, Miao F, Zhang Z, Huang J, Wang Z, Yang G. A de novo assembled high-quality chromosome-scale Trifolium pratense genome and fine-scale phylogenetic analysis. BMC PLANT BIOLOGY 2022; 22:332. [PMID: 35820796 PMCID: PMC9277957 DOI: 10.1186/s12870-022-03707-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/22/2022] [Accepted: 06/20/2022] [Indexed: 05/12/2023]
Abstract
BACKGROUND: Red clover (Trifolium pratense L.) is a diploid perennial temperate legume with 14 chromosomes (2n = 14) native to Europe and West Asia, with high nutritional and economic value. It is a very important forage grass and is widely grown in marine climates, such as the United States and Sweden. Genetic research and molecular breeding are limited by the lack of high-quality reference genomes. In this study, we used Illumina, PacBio HiFi, and Hi-C to obtain a high-quality chromosome-scale red clover genome and used genome annotation results to analyze evolutionary relationships among related species. RESULTS: The red clover genome assembled from PacBio HiFi sequencing was 423 Mb. The assembly quality was the highest among legume genome assemblies published to date. The contig N50 was 13 Mb, the scaffold N50 was 55 Mb, BUSCO completeness was 97.9%, and the assembly accounted for 92.8% of the predicted genome. Genome annotation revealed 44,588 high-confidence gene models and 52.81% repetitive elements in the red clover genome. Based on a comparison of genome annotation results, red clover was closely related to Trifolium medium and distantly related to Glycine max, Vigna radiata, Medicago truncatula, and Cicer arietinum among legumes. Analyses of gene family expansions and contractions and forward gene selection revealed gene families and genes related to environmental stress resistance and energy metabolism. CONCLUSIONS: We report a high-quality de novo genome assembly for red clover at the chromosome level, with a substantial improvement in assembly quality over previously published red clover genomes. These annotated gene models can provide an important resource for molecular genetic breeding and legume evolution studies. Furthermore, we analyzed the evolutionary relationships among red clover and closely related species, providing a basis for evolutionary studies of clover leaf and legumes, genomic analyses of forage grass, and the improvement of agronomic traits.
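For readers unfamiliar with the N50 statistic reported above, the sketch below shows how it is commonly computed: N50 is the length of the shortest sequence among the longest ones that together cover at least half of the total assembly length. The function name and the contig lengths are hypothetical and are not taken from the red clover assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")


# Hypothetical contig lengths (arbitrary units), total = 54.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

In this toy example the three longest contigs (10, 9, 8) sum to 27, exactly half of the total, so the N50 is 8.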
Affiliation(s)
- Zhenfei Yan
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Lijun Sang
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Yue Ma
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Yong He
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Juan Sun
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Lichao Ma
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Shuo Li
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Fuhong Miao
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Zixin Zhang
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Zengyu Wang
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
- Guofeng Yang
- College of Grassland Science, Qingdao Agricultural University, Qingdao, 266109, China
- Key Laboratory of National Forestry and Grassland Administration on Grassland Resources and Ecology in the Yellow River Delta, Qingdao, 266109, China
13
Zhou W, Zhang X, Wang A, Yang L, Gan Q, Yi L, Summons RE, Volkman JK, Lu Y. Widespread Sterol Methyltransferase Participates in the Biosynthesis of Both C4α- and C4β-Methyl Sterols. J Am Chem Soc 2022; 144:9023-9032. [PMID: 35561259 PMCID: PMC9136925 DOI: 10.1021/jacs.2c01401] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
Abstract
The 4-methyl steranes serve as molecular fossils and are used for studying both eukaryotic evolution and geological history. The occurrence of 4α-methyl steranes in sediments has long been considered evidence of products of partial demethylation mediated by sterol methyl oxidases (SMOs), while 4β-methyl steranes are attributed entirely to diagenetic generation from 4α-methyl steroids since possible biological sources of their precursor 4β-methyl sterols are unknown. Here, we report a previously unknown C4-methyl sterol biosynthetic pathway involving a sterol methyltransferase rather than the SMOs. We show that both C4α- and C4β-methyl sterols are end products of the sterol biosynthetic pathway in an endosymbiont of reef corals, Breviolum minutum, while this mechanism exists not only in dinoflagellates but also in eukaryotes from alveolates, haptophytes, and aschelminthes. Our discovery provides a previously untapped route for the generation of C4-methyl steranes and overturns the paradigm that all 4β-methyl steranes are diagenetically generated from the 4α isomers. This may facilitate the interpretation of molecular fossils and understanding of the evolution of eukaryotic life in general.
Affiliation(s)
- Wenxu Zhou
- State Key Laboratory of Marine Resource Utilization in South China Sea, College of Oceanology, Hainan University, Haikou 570228, China
- Xu Zhang
- State Key Laboratory of Marine Resource Utilization in South China Sea, College of Oceanology, Hainan University, Haikou 570228, China
- Aoqi Wang
- State Key Laboratory of Marine Resource Utilization in South China Sea, College of Oceanology, Hainan University, Haikou 570228, China
- Lin Yang
- State Key Laboratory of Marine Resource Utilization in South China Sea, College of Oceanology, Hainan University, Haikou 570228, China
- Qinhua Gan
- State Key Laboratory of Marine Resource Utilization in South China Sea, College of Oceanology, Hainan University, Haikou 570228, China
- Liang Yi
- State Key Laboratory of Marine Geology, Tongji University, Shanghai 200092, China
- Roger E Summons
- Department of Earth, Atmospheric and Planetary Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- John K Volkman
- CSIRO Oceans and Atmosphere, GPO Box 1538, Hobart, Tasmania 7001, Australia
- Yandu Lu
- State Key Laboratory of Marine Resource Utilization in South China Sea, College of Oceanology, Hainan University, Haikou 570228, China