1
Hjelmen CE. Genome size and chromosome number are critical metrics for accurate genome assembly assessment in Eukaryota. Genetics 2024; 227:iyae099. [PMID: 38869251 DOI: 10.1093/genetics/iyae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Revised: 04/02/2024] [Accepted: 06/06/2024] [Indexed: 06/14/2024]
Abstract
The number of genome assemblies has rapidly increased in recent history, with NCBI databases reaching over 41,000 eukaryotic genome assemblies across about 2,300 species. Increases in read length and improvements in assembly algorithms have led to increased contiguity and larger genome assemblies. While this number of assemblies is impressive, only about a third of these assemblies have corresponding genome size estimations for their respective species on publicly available databases. In this paper, genome assemblies are assessed regarding their total size compared to their respective publicly available genome size estimations. These deviations in size are assessed related to genome size, kingdom, sequencing platform, and standard assembly metrics, such as N50 and BUSCO values. A large proportion of assemblies deviate from their estimated genome size by more than 10%, with increasing deviations in size with increased genome size, suggesting nonprotein coding and structural DNA may be to blame. Modest differences in performance of sequencing platforms are noted as well. While standard metrics of genome assessment are more likely to indicate an assembly approaching the estimated genome size, much of the variation in this deviation in size is not explained with these raw metrics. A new, proportional N50 metric is proposed, in which N50 values are made relative to the average chromosome size of each species. This new metric has a stronger relationship with complete genome assemblies and, due to its proportional nature, allows for a more direct comparison across assemblies for genomes with variation in sizes and architectures.
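As a rough illustration of the two quantities discussed in this abstract, the sketch below computes the percentage deviation of an assembly from its estimated genome size and a proportional N50. It assumes that the average chromosome size is the estimated genome size divided by the haploid chromosome number; the function names and example values are hypothetical and are not taken from the paper.

```python
def size_deviation_pct(assembly_size_bp: int, estimated_size_bp: int) -> float:
    """Signed deviation of the assembly size from the estimated genome size, in percent."""
    if estimated_size_bp <= 0:
        raise ValueError("estimated genome size must be positive")
    return 100.0 * (assembly_size_bp - estimated_size_bp) / estimated_size_bp


def proportional_n50(n50_bp: int, estimated_size_bp: int, haploid_chromosomes: int) -> float:
    """N50 expressed as a fraction of the average chromosome size.

    Assumption (not confirmed by the abstract): average chromosome size is
    the estimated genome size divided by the haploid chromosome number.
    """
    if estimated_size_bp <= 0 or haploid_chromosomes <= 0:
        raise ValueError("genome size and chromosome number must be positive")
    average_chromosome_bp = estimated_size_bp / haploid_chromosomes
    return n50_bp / average_chromosome_bp


# Hypothetical species: 1 Gb estimated genome, 20 haploid chromosomes,
# assembled to 900 Mb with an N50 of 25 Mb.
deviation = size_deviation_pct(900_000_000, 1_000_000_000)
relative_n50 = proportional_n50(25_000_000, 1_000_000_000, 20)
print(f"deviation: {deviation:.1f}%  proportional N50: {relative_n50:.2f}")
```

Under this assumption, a proportional N50 near 1 would indicate that the N50 sequence is about as long as an average chromosome, which is what makes the value comparable across genomes of different sizes.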
Affiliation(s)
- Carl E Hjelmen
  - Department of Biology, Utah Valley University, 800 W. University Parkway, Orem, UT 84058, USA
2
Gable SM, Bushroe N, Mendez J, Wilson A, Pinto B, Gamble T, Tollis M. Differential Conservation and Loss of CR1 Retrotransposons in Squamates Reveals Lineage-Specific Genome Dynamics across Reptiles. bioRxiv 2024:2024.02.09.579686. [PMID: 38405926 PMCID: PMC10888918 DOI: 10.1101/2024.02.09.579686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
Transposable elements (TEs) are repetitive DNA sequences which create mutations and generate genetic diversity across the tree of life. In amniotic vertebrates, TEs have been mainly studied in mammals and birds, whose genomes generally display low TE diversity. Squamates (Order Squamata; ~11,000 extant species of lizards and snakes) show as much variation in TE abundance and activity as they do in species and phenotypes. Despite this high TE activity, squamate genomes are remarkably uniform in size. We hypothesize that novel, lineage-specific dynamics have evolved over the course of squamate evolution to constrain genome size across the order. Thus, squamates may represent a prime model for investigations into TE diversity and evolution. To understand the interplay between TEs and host genomes, we analyzed the evolutionary history of the CR1 retrotransposon, a TE family found in most tetrapod genomes. We compared 113 squamate genomes to the genomes of turtles, crocodilians, and birds, and used ancestral state reconstruction to identify shifts in the rate of CR1 copy number evolution across reptiles. We analyzed the repeat landscapes of CR1 in squamate genomes and determined that shifts in the rate of CR1 copy number evolution are associated with lineage-specific variation in CR1 activity. We then used phylogenetic reconstruction of CR1 subfamilies across amniotes to reveal both recent and ancient CR1 subclades across the squamate tree of life. The patterns of CR1 evolution in squamates contrast other amniotes, suggesting key differences in how TEs interact with different host genomes and at different points across evolutionary history.
Affiliation(s)
- Simone M. Gable
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA
- Nicholas Bushroe
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA
- Jasmine Mendez
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA
- Adam Wilson
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA
- Brendan Pinto
  - Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
  - Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, USA
- Tony Gamble
  - Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, USA
  - Department of Biological Sciences, Marquette University, Milwaukee, WI, USA
  - Bell Museum of Natural History, University of Minnesota, St. Paul, MN, USA
- Marc Tollis
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA
3
Pei Y, Leng L, Sun W, Liu B, Feng X, Li X, Chen S. Whole-genome sequencing in medicinal plants: current progress and prospect. Science China Life Sciences 2024; 67:258-273. [PMID: 37837531 DOI: 10.1007/s11427-022-2375-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/12/2023] [Accepted: 05/23/2023] [Indexed: 10/16/2023]
Abstract
Advancements in genomics have dramatically accelerated the research on medicinal plants, and the development of herbgenomics has promoted the "Project of 1K Medicinal Plant Genome" to decipher their genetic code. However, it is difficult to obtain their high-quality whole genomes because of the prevalence of polyploidy and/or high genomic heterozygosity. Whole genomes of 123 medicinal plants were published until September 2022. These published genome sequences were investigated in this review, covering their classification, research teams, ploidy, medicinal functions, and sequencing strategies. More than 1,000 institutes or universities around the world and 50 countries are conducting research on medicinal plant genomes. Diploid species account for a majority of sequenced medicinal plants. The whole genomes of plants in the Poaceae family are the most studied. Almost 40% of the published papers studied species with tonifying, replenishing, and heat-cleaning medicinal effects. Medicinal plants are still in the process of domestication as compared with crops, thereby resulting in unclear genetic backgrounds and the lack of pure lines, thus making their genomes more difficult to complete. In addition, there is still no clear routine framework for a medicinal plant to obtain a high-quality whole genome. Herein, a clear and complete strategy has been originally proposed for creating a high-quality whole genome of medicinal plants. Moreover, whole genome-based biological studies of medicinal plants, including breeding and biosynthesis, were reviewed. We also advocate that a research platform of model medicinal plants should be established to promote the genomics research of medicinal plants.
Affiliation(s)
- Yifei Pei
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Liang Leng
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
- Wei Sun
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Baocai Liu
  - Institute of Agricultural Bioresource, Fujian Academy of Agricultural Sciences, Fuzhou, 350003, China
- Xue Feng
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Xiwen Li
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Shilin Chen
  - Institute of Herbgenomics, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, China
4
Gable SM, Mendez JM, Bushroe NA, Wilson A, Byars MI, Tollis M. The State of Squamate Genomics: Past, Present, and Future of Genome Research in the Most Speciose Terrestrial Vertebrate Order. Genes (Basel) 2023; 14:1387. [PMID: 37510292 PMCID: PMC10379679 DOI: 10.3390/genes14071387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 06/28/2023] [Accepted: 06/29/2023] [Indexed: 07/30/2023]
Abstract
Squamates include more than 11,000 extant species of lizards, snakes, and amphisbaenians, and display a dazzling diversity of phenotypes across their over 200-million-year evolutionary history on Earth. Here, we introduce and define squamates (Order Squamata) and review the history and promise of genomic investigations into the patterns and processes governing squamate evolution, given recent technological advances in DNA sequencing, genome assembly, and evolutionary analysis. We survey the most recently available whole genome assemblies for squamates, including the taxonomic distribution of available squamate genomes, and assess their quality metrics and usefulness for research. We then focus on disagreements in squamate phylogenetic inference, how methods of high-throughput phylogenomics affect these inferences, and demonstrate the promise of whole genomes to settle or sustain persistent phylogenetic arguments for squamates. We review the role transposable elements play in vertebrate evolution, methods of transposable element annotation and analysis, and further demonstrate that through the understanding of the diversity, abundance, and activity of transposable elements in squamate genomes, squamates can be an ideal model for the evolution of genome size and structure in vertebrates. We discuss how squamate genomes can contribute to other areas of biological research such as venom systems, studies of phenotypic evolution, and sex determination. Because they represent more than 30% of the living species of amniote, squamates deserve a genome consortium on par with recent efforts for other amniotes (i.e., mammals and birds) that aim to sequence most of the extant families in a clade.
Affiliation(s)
- Simone M Gable
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Jasmine M Mendez
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Nicholas A Bushroe
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Adam Wilson
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Michael I Byars
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Marc Tollis
  - School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
5
Mokhtar MM, Abd-Elhalim HM, El Allali A. A large-scale assessment of the quality of plant genome assemblies using the LTR assembly index. AoB Plants 2023; 15:plad015. [PMID: 37197714 PMCID: PMC10184434 DOI: 10.1093/aobpla/plad015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2022] [Accepted: 04/01/2023] [Indexed: 05/19/2023]
Abstract
Recent advances in genome sequencing have led to an increase in the number of sequenced genomes. However, the presence of repetitive sequences complicates the assembly of plant genomes. The LTR assembly index (LAI) has recently been widely used to assess the quality of genome assembly, as a higher LAI is associated with a higher quality of assembly. Here, we assessed the quality of assembled genomes of 1664 plant and algal genomes using LAI and reported the results as data repository called PlantLAI (https://bioinformatics.um6p.ma/PlantLAI). A number of 55 117 586 pseudomolecules/scaffolds with a total length of 988.11 gigabase-pairs were examined using the LAI workflow. A total of 46 583 551 accurate LTR-RTs were discovered, including 2 263 188 Copia, 2 933 052 Gypsy, and 1 387 311 unknown superfamilies. Consequently, only 1136 plant genomes are suitable for LAI calculation, with values ranging from 0 to 31.59. Based on the quality classification system, 476 diploid genomes were classified as draft, 472 as reference, and 135 as gold genomes. We also provide a free webtool to calculate the LAI of newly assembled genomes and the ability to save the result in the repository. The data repository is designed to fill in the gaps in the reported LAI of existing genomes, while the webtool is designed to help researchers calculate the LAI of their newly sequenced genomes.
Affiliation(s)
- Haytham M Abd-Elhalim
  - Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza 12619, Egypt
6
Hassan SU, Chua EG, Paz EA, Tay CY, Greeff JC, Palmer DG, Dudchenko O, Aiden EL, Martin GB, Kaur P. Chromosome-length genome assembly of Teladorsagia circumcincta - a globally important helminth parasite in livestock. BMC Genomics 2023; 24:74. [PMID: 36792983 PMCID: PMC9933375 DOI: 10.1186/s12864-023-09172-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/27/2022] [Accepted: 02/08/2023] [Indexed: 02/17/2023]
Abstract
BACKGROUND Gastrointestinal (GIT) helminthiasis is a global problem that affects livestock health, especially in small ruminants. One of the major helminth parasites of sheep and goats, Teladorsagia circumcincta, infects the abomasum and causes production losses, reductions in weight gain, diarrhoea and, in some cases, death in young animals. Control strategies have relied heavily on the use of anthelmintic medication but, unfortunately, T. circumcincta has developed resistance, as have many helminths. Vaccination offers a sustainable and practical solution, but there is no commercially available vaccine to prevent Teladorsagiosis. The discovery of new strategies for controlling T. circumcincta, such as novel vaccine targets and drug candidates, would be greatly accelerated by the availability of better quality, chromosome-length, genome assembly because it would allow the identification of key genetic determinants of the pathophysiology of infection and host-parasite interaction. The available draft genome assembly of T. circumcincta (GCA_002352805.1) is highly fragmented and thus impedes large-scale investigations of population and functional genomics. RESULTS We have constructed a high-quality reference genome, with chromosome-length scaffolds, by purging alternative haplotypes from the existing draft genome assembly and scaffolding the result using chromosome conformation, capture-based, in situ Hi-C technique. The improved (Hi-C) assembly resulted in six chromosome-length scaffolds with length ranging from 66.6 Mbp to 49.6 Mbp, 35% fewer sequences and reduction in size. Substantial improvements were also achieved in both the values for N50 (57.1 Mbp) and L50 (5 Mbp). A higher and comparable level of genome and proteome completeness was achieved for Hi-C assembly on BUSCO parameters. The Hi-C assembly had a greater synteny and number of orthologs with a closely related nematode, Haemonchus contortus. 
CONCLUSION This improved genomic resource is suitable as a foundation for the identification of potential targets for vaccine and drug development.
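For readers less familiar with the contiguity statistics quoted in the results, the sketch below shows one common way to compute them. N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly, and L50 is the number of sequences in that set. The function name and the toy scaffold lengths are hypothetical and unrelated to the T. circumcincta assembly.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # `length` is the N50; `count` sequences were needed, so it is the L50.
            return length, count
    raise AssertionError("unreachable for non-empty input")


# Toy example: five scaffolds totalling 200 units.
n50, l50 = n50_l50([20, 60, 30, 50, 40])
print(f"N50 = {n50}, L50 = {l50}")
```

In this toy case the two longest scaffolds (60 and 50) already cover more than half of the total, so N50 is 50 and L50 is 2.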
Affiliation(s)
- Shamshad Ul Hassan
  - UWA School of Agriculture and Environment, The University of Western Australia, 6009, Crawley, WA, Australia
  - Helicobacter Research Laboratory, The Marshall Centre for Infectious Disease Research and Training, School of Biomedical Sciences, University of Western Australia, Perth, WA, Australia
- Eng Guan Chua
  - Helicobacter Research Laboratory, The Marshall Centre for Infectious Disease Research and Training, School of Biomedical Sciences, University of Western Australia, Perth, WA, Australia
- Erwin A Paz
  - UWA School of Agriculture and Environment, The University of Western Australia, 6009, Crawley, WA, Australia
  - Helicobacter Research Laboratory, The Marshall Centre for Infectious Disease Research and Training, School of Biomedical Sciences, University of Western Australia, Perth, WA, Australia
- Chin Yen Tay
  - Helicobacter Research Laboratory, The Marshall Centre for Infectious Disease Research and Training, School of Biomedical Sciences, University of Western Australia, Perth, WA, Australia
- Johan C Greeff
  - Department of Primary Industries and Regional Development, Western Australia 3 Baron Hay Court, South Perth, 6151, WA, Australia
- Dieter G Palmer
  - Department of Primary Industries and Regional Development, Western Australia 3 Baron Hay Court, South Perth, 6151, WA, Australia
- Olga Dudchenko
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, 77030, Houston, TX, USA
  - Center for Theoretical Biological Physics, Rice University, 77005, Houston, TX, USA
- Erez Lieberman Aiden
  - The Center for Genome Architecture, Department of Molecular and Human Genetics, Baylor College of Medicine, 77030, Houston, TX, USA
  - Center for Theoretical Biological Physics, Rice University, 77005, Houston, TX, USA
  - Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech, Pudong, China
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Graeme B Martin
  - UWA School of Agriculture and Environment, The University of Western Australia, 6009, Crawley, WA, Australia
- Parwinder Kaur
  - UWA School of Agriculture and Environment, The University of Western Australia, 6009, Crawley, WA, Australia
7
Pathak RK, Kim JM. Vetinformatics from functional genomics to drug discovery: Insights into decoding complex molecular mechanisms of livestock systems in veterinary science. Front Vet Sci 2022; 9:1008728. [PMID: 36439342 PMCID: PMC9691653 DOI: 10.3389/fvets.2022.1008728] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Accepted: 10/31/2022] [Indexed: 09/28/2023]
Abstract
Having played important roles in human growth and development, livestock animals are regarded as integral parts of society. However, industrialization has depleted natural resources and exacerbated climate change worldwide, spurring the emergence of various diseases that reduce livestock productivity. Meanwhile, a growing human population demands sufficient food to meet their needs, necessitating innovations in veterinary sciences that increase productivity both quantitatively and qualitatively. We have been able to address various challenges facing veterinary and farm systems with new scientific and technological advances, which might open new opportunities for research. Recent breakthroughs in multi-omics platforms have produced a wealth of genetic and genomic data for livestock that must be converted into knowledge for breeding, disease prevention and management, productivity, and sustainability. Vetinformatics is regarded as a new bioinformatics research concept or approach that is revolutionizing the field of veterinary science. It employs an interdisciplinary approach to understand the complex molecular mechanisms of animal systems in order to expedite veterinary research, ensuring food and nutritional security. This review article highlights the background, recent advances, challenges, opportunities, and application of vetinformatics for quality veterinary services.
Affiliation(s)
- Jun-Mo Kim
  - Department of Animal Science and Technology, Chung-Ang University, Anseong-si, South Korea
8
Mashanov V, Machado DJ, Reid R, Brouwer C, Kofsky J, Janies DA. Twinkle twinkle brittle star: the draft genome of Ophioderma brevispinum (Echinodermata: Ophiuroidea) as a resource for regeneration research. BMC Genomics 2022; 23:574. [PMID: 35953768 PMCID: PMC9367165 DOI: 10.1186/s12864-022-08750-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/16/2021] [Accepted: 07/08/2022] [Indexed: 12/13/2022]
Abstract
Background Echinoderms are established models in experimental and developmental biology, however genomic resources are still lacking for many species. Here, we present the draft genome of Ophioderma brevispinum, an emerging model organism in the field of regenerative biology. This new genomic resource provides a reference for experimental studies of regenerative mechanisms. Results We report a de novo nuclear genome assembly for the brittle star O. brevispinum and annotation facilitated by the transcriptome assembly. The final assembly is 2.68 Gb in length and contains 146,703 predicted protein-coding gene models. We also report a mitochondrial genome for this species, which is 15,831 bp in length, and contains 13 protein-coding, 22 tRNAs, and 2 rRNAs genes, respectively. In addition, 29 genes of the Notch signaling pathway are identified to illustrate the practical utility of the assembly for studies of regeneration. Conclusions The sequenced and annotated genome of O. brevispinum presented here provides the first such resource for an ophiuroid model species. Considering the remarkable regenerative capacity of this species, this genome will be an essential resource in future research efforts on molecular mechanisms regulating regeneration.
Affiliation(s)
- Vladimir Mashanov
  - Wake Forest Institute for Regenerative Medicine, 391 Technology Way, Winston-Salem, 27101, NC, USA
  - University of North Florida, Department of Biology, 1 UNF Drive, Jacksonville, 32224, FL, USA
- Denis Jacob Machado
  - University of North Carolina at Charlotte, College of Computing and Informatics, Department of Bioinformatics and Genomics, 9201 University City Blvd, Charlotte, 28223, NC, USA
- Robert Reid
  - University of North Carolina at Charlotte, College of Computing and Informatics, North Carolina Research Campus, 150 Research Campus Drive, Kannapolis, 28081, NC, USA
- Cory Brouwer
  - University of North Carolina at Charlotte, College of Computing and Informatics, North Carolina Research Campus, 150 Research Campus Drive, Kannapolis, 28081, NC, USA
- Janice Kofsky
  - University of North Carolina at Charlotte, College of Computing and Informatics, Department of Bioinformatics and Genomics, 9201 University City Blvd, Charlotte, 28223, NC, USA
- Daniel A Janies
  - University of North Carolina at Charlotte, College of Computing and Informatics, Department of Bioinformatics and Genomics, 9201 University City Blvd, Charlotte, 28223, NC, USA
9
Guo R, Papanicolaou A, Fritz ML. Validation of reference-assisted assembly using existing and novel Heliothine genomes. Genomics 2022; 114:110441. [PMID: 35931274 DOI: 10.1016/j.ygeno.2022.110441] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/29/2021] [Revised: 07/19/2022] [Accepted: 07/29/2022] [Indexed: 11/16/2022]
Abstract
Chloridea subflexa and Chloridea virescens are a pair of closely related noctuid species exhibiting pheromone-based sexual isolation and divergent host plant preferences. We produced a novel Illumina short read C. subflexa genome assembly and an improved C. virescens genome assembly, which offer opportunities to study the genomic basis for evolutionarily important traits in this lepidopteran family with few genomic resources. We then examined the feasibility of reference-assisted assembly, an approach that leverages existing high quality genomic resources for genome improvement in closely related taxa and applied it to our Heliothine genomes. Our work demonstrates that reference-assisted assembly has the potential to enhance contiguity and completeness of existing insect genomic resources with minimal additional laboratory costs. We conclude by discussing both the potential and pitfalls of reference-assisted assembly according to the intended downstream assembly application.
Affiliation(s)
- Rong Guo
  - Department of Entomology, University of Maryland, College Park, MD 20742, USA
  - Computational Biology, Bioinformatics and Genomics Program, Department of Biological Sciences, University of Maryland, College Park, MD 20742, USA
- Alexie Papanicolaou
  - Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW 2753, Australia
- Megan L Fritz
  - Department of Entomology, University of Maryland, College Park, MD 20742, USA
  - Computational Biology, Bioinformatics and Genomics Program, Department of Biological Sciences, University of Maryland, College Park, MD 20742, USA
10
Feron R, Waterhouse RM. Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes. Gigascience 2022; 11:6537158. [PMID: 35217859 PMCID: PMC8881204 DOI: 10.1093/gigascience/giac006] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/27/2021] [Revised: 12/12/2021] [Accepted: 01/13/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. To guide forthcoming genome generation efforts and promote efficient prioritization of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data. FINDINGS Here we present an automated analysis workflow that surveys genome assemblies from the United States NCBI, assesses their completeness using the relevant BUSCO datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, examine how key assembly metrics relate to gene content completeness, and compare results from using different BUSCO lineage datasets. CONCLUSIONS These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritizations for ongoing and future sampling, sequencing, and genome generation initiatives.
Affiliation(s)
- Romain Feron
  - Department of Ecology and Evolution, Le Biophore UNIL-Sorge, University of Lausanne, Lausanne 1015, Switzerland
  - Evolutionary-Functional Genomics Group, L'Amphipole UNIL-Sorge, Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Robert M Waterhouse
  - Department of Ecology and Evolution, Le Biophore UNIL-Sorge, University of Lausanne, Lausanne 1015, Switzerland
  - Evolutionary-Functional Genomics Group, L'Amphipole UNIL-Sorge, Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
11
Liu HL, Harris AJ, Wang ZF, Chen HF, Li ZA, Wei X. The genome of the Paleogene relic tree Bretschneidera sinensis: insights into trade-offs in gene family evolution, demographic history, and adaptive SNPs. DNA Res 2022; 29:6523039. [PMID: 35137004 PMCID: PMC8825261 DOI: 10.1093/dnares/dsac003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/10/2021] [Indexed: 11/13/2022]
Abstract
Among relic species, genomic information may provide the key to inferring their long-term survival. Therefore, in this study, we investigated the genome of the Paleogene relic tree species, Bretschneidera sinensis, which is a rare endemic species within southeastern Asia. Specifically, we assembled a high-quality genome for B. sinensis using PacBio high-fidelity and high-throughput chromosome conformation capture reads and annotated it with long and short RNA sequencing reads. Using the genome, we then detected a trade-off between active and passive disease defences among the gene families. Gene families involved in salicylic acid and MAPK signalling pathways expanded as active defence mechanisms against disease, but families involved in terpene synthase activity as passive defences contracted. When inferring the long evolutionary history of B. sinensis, we detected population declines corresponding to historical climate change around the Eocene–Oligocene transition and to climatic fluctuations in the Quaternary. Additionally, based on this genome, we identified 388 single nucleotide polymorphisms (SNPs) that were likely under selection, and showed diverse functions in growth and stress responses. Among them, we further found 41 climate-associated SNPs. The genome of B. sinensis and the SNP dataset will be important resources for understanding extinction/diversification processes using comparative genomics in different lineages.
Collapse
Affiliation(s)
- Hai-Lin Liu
  - Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China; University of Chinese Academy of Sciences, Beijing, 100049, China; Environmental Horticulture Research Institute, Guangdong Academy of Agricultural Sciences, Guangzhou, 510640, China; Key Laboratory of Ornamental Plant Germplasm Innovation and Utilization, Guangzhou, 510640, China
- A J Harris
  - Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China; Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Zheng-Feng Wang
  - Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China; Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, 511458, China; Center of Plant Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, 510650, China; Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Hong-Feng Chen
  - Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China; Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Zhi-An Li
  - Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China; Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, 511458, China; Center of Plant Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, 510650, China; Key Laboratory of Vegetation Restoration and Management of Degraded Ecosystems, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, 510650, China
- Xiao Wei
  - Guangxi Institute of Botany, Chinese Academy of Sciences, Guilin, 541006, China
12
Yamaguchi K, Kadota M, Nishimura O, Ohishi Y, Naito Y, Kuraku S. Technical considerations in Hi-C scaffolding and evaluation of chromosome-scale genome assemblies. Mol Ecol 2021; 30:5923-5934. [PMID: 34432923 PMCID: PMC9292758 DOI: 10.1111/mec.16146] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/14/2020] [Revised: 07/28/2021] [Accepted: 08/18/2021] [Indexed: 12/15/2022]
Abstract
The recent development of ecological studies has been fueled by the introduction of massive information based on chromosome-scale genome sequences, even for species for which genetic linkage is not accessible. This was enabled mainly by the application of Hi-C, a method for genome-wide chromosome conformation capture that was originally developed for investigating the long-range interaction of chromatins. Performing genomic scaffolding using Hi-C data is highly resource-demanding and employs elaborate laboratory steps for sample preparation. It starts with building a primary genome sequence assembly as an input, which is followed by computation for genome scaffolding using Hi-C data, requiring careful validation. This article presents technical considerations for obtaining optimal Hi-C scaffolding results and provides a test case of its application to a reptile species, the Madagascar ground gecko (Paroedura picta). Among the metrics that are frequently used for evaluating scaffolding results, we investigate the validity of the completeness assessment of chromosome-scale genome assemblies using single-copy reference orthologues.
Affiliation(s)
- Kazuaki Yamaguchi
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
- Mitsutaka Kadota
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
- Osamu Nishimura
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
- Yuta Ohishi
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
- Yuki Naito
  - Database Center for Life Science (DBCLS), Mishima, Japan
- Shigehiro Kuraku
  - Laboratory for Phyloinformatics, RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
  - Molecular Life History Laboratory, National Institute of Genetics, Mishima, Japan
  - Department of Genetics, Sokendai (Graduate University for Advanced Studies), Mishima, Japan
13
CStone: A de novo transcriptome assembler for short-read data that identifies non-chimeric contigs based on underlying graph structure. PLoS Comput Biol 2021; 17:e1009631. [PMID: 34813594 PMCID: PMC8651127 DOI: 10.1371/journal.pcbi.1009631] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/23/2021] [Revised: 12/07/2021] [Accepted: 11/11/2021] [Indexed: 11/19/2022] Open
Abstract
With the exponential growth of sequence information stored over the last decade, including that of de novo assembled contigs from RNA-Seq experiments, quantification of chimeric sequences has become essential when assembling read data. In transcriptomics, de novo assembled chimeras can closely resemble underlying transcripts, but patterns such as those seen between co-evolving sites, or mapped read counts, become obscured. We have created a de Bruijn-based de novo assembler for RNA-Seq data that utilizes a classification system to describe the complexity of underlying graphs from which contigs are created. Each contig is labelled with one of three levels, indicating whether or not ambiguous paths exist. A by-product of this is information on the range of complexity of the underlying gene families present. As a demonstration of CStone's ability to assemble high-quality contigs, and to label them in this manner, both simulated and real data were used. For simulated data, ten million read pairs were generated from cDNA libraries representing four species, Drosophila melanogaster, Panthera pardus, Rattus norvegicus and Serinus canaria. These were assembled using CStone, Trinity and rnaSPAdes; the latter two being high-quality, well-established de novo assemblers. For real data, two RNA-Seq datasets, each consisting of ≈30 million read pairs, representing two adult D. melanogaster whole-body samples were used. The contigs that CStone produced were comparable in quality to those of Trinity and rnaSPAdes in terms of length, sequence identity of aligned regions and the range of cDNA transcripts represented, whilst providing additional information on chimerism. Here we describe the details of CStone's assembly and classification process, and propose that similar classification systems can be incorporated into other de novo assembly tools.
Within a related side study, we explore the effects that chimeras within reference sets have on the identification of differentially expressed genes. CStone is available at: https://sourceforge.net/projects/cstone/.

Within transcriptome reference sets, non-chimeric sequences are representations of transcribed genes, while artificially generated chimeric ones are mosaics of two or more pieces of DNA incorrectly pieced together. One area where such sets are utilized is in the quantification of gene expression patterns, where RNA-Seq reads are mapped to the sequences within, and subsequent count values reflect expression levels. Artificial chimeras can have a negative impact on count values by erroneously increasing variation in relation to the reads being mapped. Reference sets can be created from de novo assembled contigs, but chimeras can be introduced during the assembly process via the required traversal of graphs, representing gene families, constructed from the RNA-Seq data. Graph complexity determines how likely it is that chimeras will arise. We have created CStone, a de novo assembler that utilizes a classification system to describe such complexity. Contigs created by CStone are labelled in a manner that indicates whether or not they are non-chimeric. This encourages contig-dependent results to be presented with increased objectivity by maintaining the context of ambiguity associated with the assembly process. CStone has been tested extensively. Additionally, we have quantified the relationship between chimeras within reference sets and the identification of differentially expressed genes.
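The link between graph structure and chimera risk described in this abstract can be illustrated with a toy de Bruijn graph. The following is a minimal sketch in Python, not CStone's actual algorithm: the abstract mentions three classification levels without defining them, so this example only makes a two-way distinction (branching versus no branching), and the function names `build_de_bruijn` and `has_ambiguous_paths` are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            left, right = kmer[:-1], kmer[1:]
            successors[left].add(right)
            predecessors[right].add(left)
    return successors, predecessors


def has_ambiguous_paths(reads, k):
    """Return True if any node has more than one successor or predecessor.

    Assumption for illustration: a branch point means a traversal must choose
    between alternatives, which is where a chimeric contig could be produced.
    """
    successors, predecessors = build_de_bruijn(reads, k)
    branching_out = any(len(nodes) > 1 for nodes in successors.values())
    branching_in = any(len(nodes) > 1 for nodes in predecessors.values())
    return branching_out or branching_in


# Toy reads (not real data): the second read diverges after "ACG".
print(has_ambiguous_paths(["ACGTT"], 3))
print(has_ambiguous_paths(["ACGTT", "ACGAA"], 3))
```

A single unbranched path can only be read out one way, whereas the second call introduces a fork at node `CG`, the kind of structure a labelling scheme would need to flag.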
14
Wöhner TW, Emeriewen OF, Wittenberg AHJ, Schneiders H, Vrijenhoek I, Halász J, Hrotkó K, Hoff KJ, Gabriel L, Lempe J, Keilwagen J, Berner T, Schuster M, Peil A, Wünsche J, Kropop S, Flachowsky H. The draft chromosome-level genome assembly of tetraploid ground cherry (Prunus fruticosa Pall.) from long reads. Genomics 2021; 113:4173-4183. [PMID: 34774678 DOI: 10.1016/j.ygeno.2021.11.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/10/2021] [Revised: 10/28/2021] [Accepted: 11/02/2021] [Indexed: 11/26/2022]
Abstract
Cherries are stone fruits and belong to the economically important plant family Rosaceae, with worldwide cultivation of different species. The ground cherry, Prunus fruticosa Pall., is an ancestor of cultivated sour cherry, an important tetraploid cherry species. Here, we present a long-read chromosome-level draft genome assembly and related plastid sequences using the Oxford Nanopore Technology PromethION platform and R10.3 pore type. We generated a final consensus genome sequence of 366 Mb comprising eight chromosomes. The scaffold N50 was ~44 Mb, with the longest chromosome being 66.5 Mb. The chloroplast and mitochondrial genomes were 158,217 bp and 383,281 bp long, which is in accordance with previously published plastid sequences. This is the first report of the genome of ground cherry (P. fruticosa) sequenced by long-read technology only. The datasets obtained from this study provide a foundation for future breeding, molecular and evolutionary analysis in Prunus studies.
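For readers unfamiliar with the N50 statistic reported in this abstract, it is the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. A minimal Python sketch of that standard definition follows; the function name `n50` and the example lengths are illustrative assumptions and are not taken from the P. fruticosa assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are sorted from longest to shortest and accumulated; the N50 is
    the length of the sequence at which the running total first reaches at
    least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in Mb, chosen only for illustration.
example_lengths = [66, 50, 44, 40, 30, 20]
print(n50(example_lengths))
```

Because N50 depends on the absolute sequence lengths, values from genomes of very different size or chromosome number are not directly comparable.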
Affiliation(s)
- Thomas W Wöhner
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Fruit Crops, Pillnitzer Platz 3a, D-01326, Dresden, Germany
- Ofere F Emeriewen
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Fruit Crops, Pillnitzer Platz 3a, D-01326, Dresden, Germany
- Júlia Halász
  - Department of Genetics and Plant Breeding, Faculty of Horticultural Science, Szent István University, Ménesi Str. 44, Budapest 1118, Hungary
- Károly Hrotkó
  - Department of Floriculture and Dendrology, Institute of Landscape Architecture, Urban Planning and Ornamental Horticulture, Hungarian University of Agriculture and Life Science, Villányi Str. 35-43, Budapest 1118, Hungary
- Katharina J Hoff
  - Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Str. 47, 17489 Greifswald, Germany; Center for Functional Genomics of Microbes, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Lars Gabriel
  - Institute of Mathematics and Computer Science, University of Greifswald, Walther-Rathenau-Str. 47, 17489 Greifswald, Germany; Center for Functional Genomics of Microbes, University of Greifswald, Felix-Hausdorff-Str. 8, 17489 Greifswald, Germany
- Janne Lempe
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Fruit Crops, Pillnitzer Platz 3a, D-01326, Dresden, Germany
- Jens Keilwagen
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biosafety in Plant Biotechnology, Erwin-Baur-Str. 27, D-06484 Quedlinburg, Germany
- Thomas Berner
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biosafety in Plant Biotechnology, Erwin-Baur-Str. 27, D-06484 Quedlinburg, Germany
- Mirko Schuster
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Fruit Crops, Pillnitzer Platz 3a, D-01326, Dresden, Germany
- Andreas Peil
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Fruit Crops, Pillnitzer Platz 3a, D-01326, Dresden, Germany
- Jens Wünsche
  - University of Hohenheim, Institute of Special Crops and Crop Physiology, 70593 Stuttgart, Germany
- Henryk Flachowsky
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Breeding Research on Fruit Crops, Pillnitzer Platz 3a, D-01326, Dresden, Germany
15
Albuquerque P, Ribeiro I, Correia S, Mucha AP, Tamagnini P, Braga-Henriques A, Carvalho MDF, Mendes MV. Complete Genome Sequence of Two Deep-Sea Streptomyces Isolates from Madeira Archipelago and Evaluation of Their Biosynthetic Potential. Mar Drugs 2021; 19:md19110621. [PMID: 34822492 PMCID: PMC8622039 DOI: 10.3390/md19110621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 10/28/2021] [Accepted: 10/28/2021] [Indexed: 11/22/2022] Open
Abstract
The deep sea constitutes a true unexplored frontier and a potential source of innovative drug scaffolds. Here, we present the genome sequences of two novel marine actinobacterial strains, MA3_2.13 and S07_1.15, isolated from deep-sea samples (sediments and sponge) collected at the Madeira archipelago (NE Atlantic Ocean; Portugal). The de novo assembly of both genomes was achieved using a hybrid strategy that combines short-read (Illumina) and long-read (PacBio) sequencing data. Phylogenetic analyses showed that strain MA3_2.13 is a new species of the Streptomyces genus, whereas strain S07_1.15 is closely related to the type strain of Streptomyces xinghaiensis. In silico analysis revealed that the total length of predicted biosynthetic gene clusters (BGCs) accounted for a high percentage of the MA3_2.13 genome, with several potential new metabolites identified. Strain S07_1.15 had, with a few exceptions, a predicted metabolic profile similar to S. xinghaiensis. In this work, we implemented a straightforward approach for generating high-quality genomes of new bacterial isolates and analysing in silico their potential to produce novel natural products (NPs). The inclusion of these in silico dereplication steps makes it possible to minimize the rediscovery rates of traditional natural product screening methodologies and expedite the drug discovery process.
Affiliation(s)
- Pedro Albuquerque
  - i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
  - IBMC—Instituto de Biologia Molecular e Celular, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
- Inês Ribeiro
  - CIIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos s/n, 4450-208 Matosinhos, Portugal
  - ICBAS—Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, Rua de Jorge Viterbo Ferreira 228, 4050-313 Porto, Portugal
- Sofia Correia
  - CIIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos s/n, 4450-208 Matosinhos, Portugal
- Ana Paula Mucha
  - CIIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos s/n, 4450-208 Matosinhos, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, Edifício FC4, 4169-007 Porto, Portugal
- Paula Tamagnini
  - i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
  - IBMC—Instituto de Biologia Molecular e Celular, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
  - Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, Edifício FC4, 4169-007 Porto, Portugal
- Andreia Braga-Henriques
  - OOM—Oceanic Observatory of Madeira & MARE—Marine and Environmental Sciences Centre, ARDITI—Agência Regional para o Desenvolvimento da Investigação Tecnologia e Inovação, Caminho da Penteada, 9020-105 Funchal, Portugal
  - Regional Directorate for Fisheries, Regional Secretariat for the Sea and Fisheries, Government of the Azores, Rua Cônsul Dabney—Colónia Alemã, 9900-014 Horta, Portugal
- Maria de Fátima Carvalho
  - CIIMAR—Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Avenida General Norton de Matos s/n, 4450-208 Matosinhos, Portugal
  - ICBAS—Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, Rua de Jorge Viterbo Ferreira 228, 4050-313 Porto, Portugal
- Marta V. Mendes
  - i3S—Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
  - IBMC—Instituto de Biologia Molecular e Celular, Universidade do Porto, Rua Alfredo Allen 208, 4200-135 Porto, Portugal
16
Heath-Heckman E, Nishiguchi M. Leveraging Short-Read Sequencing to Explore the Genomics of Sepiolid Squid. Integr Comp Biol 2021; 61:1753-1761. [PMID: 34191015 DOI: 10.1093/icb/icab152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Due to their large size (∼3-5 Gb) and high repetitive content, the study of cephalopod genomes has historically been problematic. However, with the recent sequencing of several cephalopod genomes, including the Hawaiian bobtail squid (Euprymna scolopes), whole-genome studies of these molluscs are now possible. Of particular interest are the sepiolid or bobtail squids, many of which develop photophores in which bioluminescent bacterial symbionts reside. The variable presence of the symbiosis throughout the family allows us to determine regions of the genome that are under selection in symbiotic lineages, potentially providing a mechanism for identifying genes instrumental in the evolution of these mutualistic associations. To this end, we have used high-throughput sequencing to generate sequence from five bobtail squid genomes, four of which maintain symbioses with luminescent bacteria (E. hyllebergi, E. albatrossae, E. scolopes and Rondeletiola minor), and one of which does not (Sepietta neglecta). When we performed k-mer-based heterozygosity and genome size estimations, we found that the Euprymna genus has a higher predicted genome size than other bobtail squid (∼5 Gb as compared to ∼4 Gb) and lower genomic heterozygosity. When we analyzed the repetitive content of the genomes, we found that genomes in the genus Euprymna appear to have recently acquired a significant quantity of LINE elements that are not found in its sister genus Rondeletiola or the closely related Sepietta. Using Abyss-2.0 and then Chromosomer with the published E. scolopes genome as a reference, we generated E. hyllebergi and E. albatrossae genomes of 1.54-1.57 Gb in size, but containing over 78-81% of eukaryotic single-copy orthologs. The data we have generated will enable future whole-genome comparisons between these species to determine gene and regulatory content that differs between symbiotic and non-symbiotic lineages, as well as genes associated with symbiosis that are under selection.
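The k-mer-based genome size estimation mentioned in this abstract commonly rests on a simple relationship: genome size is approximately the total number of k-mers divided by the depth of the main coverage peak. The Python sketch below illustrates that general idea only; it is not the authors' pipeline, and the function name `estimate_genome_size`, the `min_depth` error cutoff, and the toy histogram are assumptions made for illustration.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    histogram maps a depth (how many times a k-mer was seen) to the number of
    distinct k-mers observed at that depth. Depths below min_depth are treated
    as sequencing errors and ignored (an assumed, data-dependent cutoff).
    """
    usable = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The most populated depth is taken as the main coverage peak.
    peak_depth = max(usable, key=usable.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(depth * count for depth, count in usable.items())
    return total_kmers / peak_depth


# Toy histogram (not real data): an error tail at depths 1-2, a main peak
# around depth 20, and a small repeat peak at depth 40.
toy_histogram = {1: 1_000_000, 2: 50_000, 18: 100, 19: 300, 20: 500, 21: 300, 22: 100, 40: 50}
print(estimate_genome_size(toy_histogram))
```

In a highly heterozygous genome the tallest peak may be the heterozygous one rather than the homozygous one, so dedicated tools fit a model to the whole histogram instead of reading off a single peak.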
Affiliation(s)
- Michele Nishiguchi
  - Department of Molecular and Cell Biology, University of California Merced, Merced, CA, USA
17
Tian R, Geng Y, Yang Y, Seim I, Yang G. Oxidative stress drives divergent evolution of the glutathione peroxidase (GPX) gene family in mammals. Integr Zool 2021; 16:696-711. [PMID: 33417299 DOI: 10.1111/1749-4877.12521] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/30/2022]
Abstract
The molecular basis for adaptations to extreme environments can now be understood by interrogating the ever-increasing number of sequenced genomes. Mammals such as cetaceans, bats, and highland species can protect themselves from oxidative stress, a disruption in the balance of reactive oxygen species, which results in oxidative injury and cell damage. Here, we consider the evolution of the glutathione peroxidase (GPX) family of antioxidant enzymes by interrogating publicly available genome data from 70 mammalian species from all major clades. We identified 8 GPX subclasses ubiquitous to all mammalian groups. Mammalian GPX gene families resolved into the GPX4/7/8 and GPX1/2/3/5/6 groups and are characterized by several instances of gene duplication and loss, indicating a dynamic process of gene birth and death in mammals. Seven of the eight GPX subfamilies (all but GPX7) were under positive selection, with the residues under selection located at or close to active sites or at the dimer interface. We also reveal evidence of a correlation between ecological niches (e.g. high oxidative stress) and the divergent selection and gene copy number of GPX subclasses. Notably, a convergent expansion of GPX1 was observed in several independent lineages of mammals under oxidative stress and may be important for avoiding oxidative damage. Collectively, this study suggests that the GPX gene family has shaped the adaptation of mammals to stressful environments.
Affiliation(s)
- Ran Tian
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu, China; Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu, China
- Yuepan Geng
  - Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu, China
- Ying Yang
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu, China
- Inge Seim
  - Integrative Biology Laboratory, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu, China; School of Biology and Environmental Science, Queensland University of Technology, Brisbane, Queensland, Australia
- Guang Yang
  - Jiangsu Key Laboratory for Biodiversity and Biotechnology, College of Life Sciences, Nanjing Normal University, Nanjing, Jiangsu, China