1
Matandela AN, Mafuna T, Ndlovu SI. Draft whole genome sequence of Alternaria alternata strain P02PL2, an endophytic fungal species isolated from Sclerocarya birrea. Microbiol Resour Announc 2025; 14:e0086524. [PMID: 39964187 PMCID: PMC11895449 DOI: 10.1128/mra.00865-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2024] [Accepted: 12/19/2024] [Indexed: 03/12/2025]
Abstract
Here, we report the draft whole genome sequence of an endophytic fungal species, Alternaria alternata P02PL2 isolated from Sclerocarya birrea. The genome of this isolate was sequenced using the MGISEQ 2000RS platform. The estimated genome size for A. alternata P02PL2 was 32.0 Mb with a GC content of 50.56%.
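As an illustration of the two summary statistics reported above (assembly size and GC content), the following minimal Python sketch computes both from FASTA-formatted text. The function names `read_fasta` and `gc_content` and the toy contigs are hypothetical and are not taken from the announcement; the authors' own tooling is not described in this abstract.

```python
def read_fasta(lines):
    """Yield one sequence string per FASTA record from an iterable of lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if chunks:
                yield "".join(chunks)
                chunks = []
        else:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def gc_content(sequences):
    """Return (total_length_bp, gc_percent) over an iterable of DNA strings.

    Assumption: every character counts toward the length, so ambiguous
    bases such as N lower the reported GC percentage.
    """
    total = 0
    gc = 0
    for seq in sequences:
        s = seq.upper()
        total += len(s)
        gc += s.count("G") + s.count("C")
    if total == 0:
        raise ValueError("no sequence data")
    return total, 100.0 * gc / total


if __name__ == "__main__":
    # Toy two-contig assembly (hypothetical data, not the P02PL2 genome).
    toy = [">contig1", "GGCC", "AT", ">contig2", "ATAT"]
    size, gc = gc_content(read_fasta(toy))
    print(f"{size} bp, GC {gc:.2f}%")
```

For a real assembly, the same functions could be fed an open file handle instead of the toy list.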
Affiliation(s)
- Aviwe N. Matandela
  - Department of Biotechnology and Food Technology, Faculty of Science, University of Johannesburg, Johannesburg, Gauteng, South Africa
- Thendo Mafuna
  - Department of Biochemistry, Faculty of Science, Auckland Park Campus, University of Johannesburg, Auckland Park, Gauteng, South Africa
- Sizwe I. Ndlovu
  - Department of Biotechnology and Food Technology, Faculty of Science, University of Johannesburg, Johannesburg, Gauteng, South Africa
2
Parveen A, Kumar A. Introduction to Integrated Proteogenomic Pipeline for Dealing with Pathogenic Missense SNPs. Methods Mol Biol 2025; 2859:93-107. [PMID: 39436598 DOI: 10.1007/978-1-0716-4152-1_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2024]
Abstract
Proteogenomics is a multi-omics approach that combines mass spectrometry with next-generation sequencing (NGS) technologies (genomics and/or transcriptomics), with the main aims of improving genome annotation and facilitating the characterization of proteo-isoforms. However, proteogenomic work is challenging because it generates multi-omics data that must be integrated before the results can be interpreted for their biological or clinical implications. There is an urgent need for protocols for integrated proteogenomic approaches. Genome resequencing yields massive data on missense single-nucleotide polymorphisms (SNPs), yet the pathogenic nature of these SNPs has not been fully explored using proteogenomic approaches. In this chapter, we present such a protocol for dealing with pathogenic missense SNPs using an integrated proteogenomics pipeline that combines several steps: DNA-Seq, RNA-Seq, mass spectrometry (MS), building customized databases from the resulting datasets, and screening and filtering for useful MS spectra. The protocol also provides tips and tricks for modifying the pipeline to suit the requirements of a project.
Affiliation(s)
- Alisha Parveen
  - Manipal Academy of Higher Education (MAHE), Manipal & Institute of Bioinformatics, Bangalore, India
  - , Manipal, India
- Abhishek Kumar
  - Manipal Academy of Higher Education (MAHE), Manipal & Institute of Bioinformatics, Bangalore, India
  - , Manipal, India
3
Kasianova AM, Penin AA, Schelkunov MI, Kasianov AS, Logacheva MD, Klepikova AV. Trans2express - de novo transcriptome assembly pipeline optimized for gene expression analysis. Plant Methods 2024; 20:128. [PMID: 39152473 PMCID: PMC11330051 DOI: 10.1186/s13007-024-01255-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 08/01/2024] [Indexed: 08/19/2024]
Abstract
BACKGROUND As the genomes of many eukaryotic species, especially plants, are large and complex, their de novo sequencing and assembly remain difficult despite progress in sequencing technologies. An alternative to genome assembly is the assembly of the transcriptome, the set of RNA products of the expressed genes. While many de novo transcriptome assemblers exist, the challenges of transcriptomes (the existence of isoforms, the uneven expression levels across genes) complicate the generation of high-quality assemblies suitable for downstream analyses. RESULTS We developed Trans2express - a web-based tool and a pipeline for de novo hybrid transcriptome assembly and postprocessing based on rnaSPAdes with a set of subsequent filtrations. The pipeline was tested on Arabidopsis thaliana cDNA sequencing data obtained using Illumina and Oxford Nanopore Technologies platforms and on three non-model plant species. Comparison of the structural characteristics of the transcriptome assembly with the reference Arabidopsis genome revealed the high quality of the assembled transcriptome, with 86.1% of Arabidopsis expressed genes assembled as a single contig. We tested the applicability of the transcriptome assembly for gene expression analysis. For both Arabidopsis and the non-model species, the results showed high congruence of gene expression levels and sets of differentially expressed genes between analyses based on the genome and those based on the transcriptome assembly. CONCLUSIONS We present Trans2express - a protocol for de novo hybrid transcriptome assembly aimed at recovering a single transcript per gene. We expect this protocol to promote the characterization of transcriptomes and gene expression analysis in non-model plants, and the web-based tool to be of use to a wide range of plant biologists.
Affiliation(s)
- Aleksandra M Kasianova
  - Institute for Information Transmission, Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Moscow, Russia
- Aleksey A Penin
  - Institute for Information Transmission, Russian Academy of Sciences, Moscow, Russia
- Mikhail I Schelkunov
  - Institute for Information Transmission, Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Moscow, Russia
- Artem S Kasianov
  - Institute for Information Transmission, Russian Academy of Sciences, Moscow, Russia
- Maria D Logacheva
  - Institute for Information Transmission, Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Moscow, Russia
- Anna V Klepikova
  - Institute for Information Transmission, Russian Academy of Sciences, Moscow, Russia
4
Schettini GP, Morozyuk M, Biase FH. Identification of novel cattle (Bos taurus) genes and biological insights of their function in pre-implantation embryo development. BMC Genomics 2024; 25:775. [PMID: 39118001 PMCID: PMC11313146 DOI: 10.1186/s12864-024-10685-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2024] [Accepted: 08/02/2024] [Indexed: 08/10/2024]
Abstract
BACKGROUND Appropriate regulation of genes expressed in oocytes and embryos is essential for acquisition of developmental competence in mammals. Here, we hypothesized that several genes expressed in oocytes and pre-implantation embryos remain unknown. Our goal was to reconstruct the transcriptome of oocytes (germinal vesicle and metaphase II) and pre-implantation cattle embryos (blastocysts) using short-read and long-read sequences to identify putative new genes. RESULTS We identified 274,342 transcript sequences and 3,033 of those loci do not match a gene present in official annotations and thus are potential new genes. Notably, 63.67% (1,931/3,033) of potential novel genes exhibited coding potential. Also noteworthy, 97.92% of the putative novel genes overlapped annotation with transposable elements. Comparative analysis of transcript abundance identified that 1,840 novel genes (recently added to the annotation) or potential new genes were differentially expressed between developmental stages (FDR < 0.01). We also determined that 522 novel or potential new genes (448 and 34, respectively) were upregulated at eight-cell embryos compared to oocytes (FDR < 0.01). In eight-cell embryos, 102 novel or putative new genes were co-expressed (|r| > 0.85, P < 1 × 10⁻⁸) with several genes annotated with gene ontology biological processes related to pluripotency maintenance and embryo development. CRISPR-Cas9 genome editing confirmed that the disruption of one of the novel genes highly expressed in eight-cell embryos reduced blastocyst development (ENSBTAG00000068261, P = 1.55 × 10⁻⁷). CONCLUSIONS Our results revealed several putative new genes that need careful annotation. Many of the putative new genes have dynamic regulation during pre-implantation development and are important components of gene regulatory networks involved in pluripotency and blastocyst formation.
Affiliation(s)
- Gustavo P Schettini
  - School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Michael Morozyuk
  - School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Fernando H Biase
  - School of Animal Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
5
Xie N, Guo Q, Li H, Yuan G, Gui Q, Xiao Y, Liao M, Yang L. Integrated transcriptomic and WGCNA analyses reveal candidate genes regulating mainly flavonoid biosynthesis in Litsea coreana var. sinensis. BMC Plant Biology 2024; 24:231. [PMID: 38561656 PMCID: PMC10985888 DOI: 10.1186/s12870-024-04949-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/21/2023] [Accepted: 03/26/2024] [Indexed: 04/04/2024]
Abstract
Litsea coreana Levl. var. sinensis (Allen) Yang et P. H. Huang is a popular ethnic herb and beverage plant known for its high flavonoid content, which has been linked to a variety of pharmacological benefits and crucial health-promoting impacts in humans. Progress in understanding the molecular mechanisms of flavonoid accumulation in this plant has been hindered by the lack of genomic and transcriptomic resources. We utilized a combination of Illumina and Oxford Nanopore Technology (ONT) sequencing to generate a de novo hybrid transcriptome assembly. In total, 126,977 unigenes were characterized, out of which 107,977 were successfully annotated in seven public databases. Within the annotated unigenes, 3,781 were categorized into 58 transcription factor families. Furthermore, we investigated the presence of four valuable flavonoids (quercetin-3-O-β-D-galactoside, quercetin-3-O-β-D-glucoside, kaempferol-3-O-β-D-galactoside, and kaempferol-3-O-β-D-glucoside) in 98 samples, using high-performance liquid chromatography. A weighted gene co-expression network analysis identified two co-expression modules, MEpink and MEturquoise, that showed strong positive correlation with flavonoid content. Within these modules, four transcription factor genes (R2R3-MYB, NAC, WD40, and ARF) and four key enzyme-encoding genes (CHI, F3H, PAL, and C4H) emerged as potential hub genes. Among them, the R2R3-MYB gene (LcsMYB123), a homolog of AtMYB123/TT2, was speculated to play a significant role in flavonol biosynthesis based on phylogenetic analysis. Our findings provide a theoretical foundation for further research into the molecular mechanisms of flavonoid biosynthesis. Additionally, the hybrid transcriptome sequences will serve as a valuable molecular resource for the transcriptional annotation of L. coreana var. sinensis, which will contribute to the improvement of high-flavonoid materials.
Affiliation(s)
- Na Xie
  - Institute for Forest Resources and Environment of Guizhou, College of Forestry, Guizhou University, Guiyang, 550025, China
- Qiqaing Guo
  - Institute for Forest Resources and Environment of Guizhou, College of Forestry, Guizhou University, Guiyang, 550025, China
- Huie Li
  - College of Agriculture, Guizhou University, Guiyang, 550025, China
- Gangyi Yuan
  - Institute for Forest Resources and Environment of Guizhou, College of Forestry, Guizhou University, Guiyang, 550025, China
- Qin Gui
  - Institute for Forest Resources and Environment of Guizhou, College of Forestry, Guizhou University, Guiyang, 550025, China
- Yang Xiao
  - Institute for Forest Resources and Environment of Guizhou, College of Forestry, Guizhou University, Guiyang, 550025, China
- Mengyun Liao
  - Institute for Forest Resources and Environment of Guizhou, College of Forestry, Guizhou University, Guiyang, 550025, China
- Lan Yang
  - Institute for Forest Resources and Environment of Guizhou, College of Forestry, Guizhou University, Guiyang, 550025, China
6
Kang JN, Hur M, Kim CK, Yang SH, Lee SM. Enhancing transcriptome analysis in medicinal plants: multiple unigene sets in Astragalus membranaceus. Frontiers in Plant Science 2024; 15:1301526. [PMID: 38384760 PMCID: PMC10879423 DOI: 10.3389/fpls.2024.1301526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2023] [Accepted: 01/22/2024] [Indexed: 02/23/2024]
Abstract
Astragalus membranaceus is a medicinal plant mainly used in East Asia and contains abundant secondary metabolites. Despite the importance of this plant, the available genomic and genetic information is still limited. De novo transcriptome construction is recognized as an essential method for transcriptome research when reference genome information is incomplete. In this study, we constructed three individual transcriptome sets (unigene sets) for detailed analysis of the phenylpropanoid biosynthesis pathway, a major metabolite of A. membranaceus. Set-1 was a circular consensus sequence (CCS) generated using PacBio sequencing (PacBio-seq). Set-2 consisted of hybridized assembled unigenes with Illumina sequencing (Illumina-seq) reads and PacBio CCS using rnaSPAdes. Set-3 unigenes were assembled from Illumina-seq reads using the Trinity software. Construction of multiple unigene sets provides several advantages for transcriptome analysis. First, it provides an appropriate expression filtering threshold for assembly-based unigenes: a threshold transcripts per million (TPM) ≥ 5 removed more than 88% of assembly-based unigenes, which were mostly short and low-expressing unigenes. Second, assembly-based unigenes compensated for the incomplete length of PacBio CCSs: the ends of the 5′/3′ untranslated regions of phenylpropanoid-related unigenes derived from set-1 were incomplete, which suggests that PacBio CCSs are unlikely to be full-length transcripts. Third, more isoform unigenes could be obtained from multiple unigene sets; isoform unigenes missing in set-1 were detected in set-2 and set-3. Finally, gene ontology and Kyoto Encyclopedia of Genes and Genomes analyses showed that phenylpropanoid biosynthesis and carbohydrate metabolism were highly activated in A. membranaceus roots. Various sequencing technologies and assemblers have been developed for de novo transcriptome analysis. However, no technique is perfect for de novo transcriptome analysis, suggesting the need to construct multiple unigene sets. This method enables efficient transcript filtering and detection of longer and more diverse transcripts.
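The expression filter mentioned in the abstract above (TPM ≥ 5) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names `tpm` and `filter_by_tpm`, the toy read counts, and the transcript lengths are hypothetical, and the standard TPM definition (length-normalised rates scaled to sum to one million) is used rather than the output of any particular quantification tool.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to TPM given transcript lengths in bp.

    Both arguments are dicts keyed by unigene ID (hypothetical IDs here).
    """
    # Reads per base for each unigene.
    rates = {u: counts[u] / lengths_bp[u] for u in counts}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("all counts are zero")
    # Rescale so that the values sum to one million.
    return {u: 1e6 * r / scale for u, r in rates.items()}


def filter_by_tpm(tpm_values, threshold=5.0):
    """Keep only unigenes whose TPM is at or above the threshold."""
    return {u: v for u, v in tpm_values.items() if v >= threshold}


if __name__ == "__main__":
    # Toy data: two well-expressed unigenes and one barely detected one.
    counts = {"u1": 500000, "u2": 500000, "u3": 3}
    lengths = {"u1": 1000, "u2": 1000, "u3": 1000}
    kept = filter_by_tpm(tpm(counts, lengths))
    print(sorted(kept))
```

In this toy case the third unigene falls below the threshold and is dropped, which mirrors the removal of short, low-expressing assembly-based unigenes described in the abstract.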
Affiliation(s)
- Ji-Nam Kang
  - Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- Mok Hur
  - Department of Herbal Crop Resources, National Institute of Horticultural & Herbal Science, Eumseong-gun, Chungcheongbuk-do, Republic of Korea
- Chang-Kug Kim
  - Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- So-Hee Yang
  - Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
- Si-Myung Lee
  - Genomics Division, National Institute of Agricultural Sciences, Jeonju-si, Jeollabuk-do, Republic of Korea
7
Chernitsyna SM, Elovskaya IS, Bukin SV, Bukin YS, Pogodaeva TV, Kwon DA, Zemskaya TI. Genomic and morphological characterization of a new Thiothrix species from a sulfide hot spring of the Zmeinaya bay (Northern Baikal, Russia). Antonie Van Leeuwenhoek 2024; 117:23. [PMID: 38217803 DOI: 10.1007/s10482-023-01918-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 12/08/2023] [Indexed: 01/15/2024]
Abstract
A survey for bacteria of the genus Thiothrix indicated that they inhabited the area where the water of the Zmeiny geothermal spring (northern basin of Lake Baikal, Russia) mixed with the lake water. In the coastal zone of the lake oxygen (8.25 g/L) and hydrogen sulfide (up to 1 mg/L) were simultaneously present at sites of massive growth of these particular Thiothrix bacteria. Based on the analysis of the morphological characteristics and sequence of individual genes (16S rRNA, rpoB and tilS), we could not attribute the Thiothrix from Lake Baikal to any of the known species of this genus. To determine metabolic capabilities and phylogenetic position of the Thiothrix sp. from Lake Baikal, we analyzed their whole genome. Like all members of this genus, the bacteria from Lake Baikal were capable of organo-heterotrophic, chemolithoheterotrophic, and chemolithoautotrophic growth and differed from its closest relatives in the spectrum of nitrogen and sulfur cycle genes as well as in the indices of average nucleotide identity (ANI < 75-94%), amino acid identity (AAI < 94%) and in silico DNA-DNA hybridization (dDDH < 17-57%), which were below the boundary of interspecies differences, allowing us to identify them as novel candidate species.
Affiliation(s)
- S V Bukin
  - Limnological Institute SB RAS, Irkutsk, Russia
- Yu S Bukin
  - Limnological Institute SB RAS, Irkutsk, Russia
- D A Kwon
  - Institute of Genome Analysis, Moscow, Russia
8
Valciņa O, Pūle D, Ķibilds J, Labecka L, Terentjeva M, Krūmiņa A, Bērziņš A. Evaluation of Genetic Diversity and Virulence Potential of Legionella pneumophila Isolated from Water Supply Systems of Residential Buildings in Latvia. Pathogens 2023; 12:884. [PMID: 37513731 PMCID: PMC10385952 DOI: 10.3390/pathogens12070884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 06/21/2023] [Accepted: 06/27/2023] [Indexed: 07/30/2023]
Abstract
Legionella is an opportunistic pathogen with a biphasic life cycle that occasionally infects humans. The aim of the study was to assess the distribution of virulence genes and genetic diversity among L. pneumophila isolated from water supply systems of residential buildings in Latvia. In total, 492 water samples from 200 residential buildings were collected. Identification of Legionella spp. was performed according to ISO 11731, and 58 isolates were subjected to whole-genome sequencing. At least one Legionella-positive sample was found in 112 out of 200 apartment buildings (56.0%). The study revealed extensive sequence-type diversity, where 58 L. pneumophila isolates fell into 36 different sequence types. A total of 420 virulence genes were identified, of which 260 genes were found in all sequenced L. pneumophila isolates. The virulence genes enhC, htpB, omp28, and mip were detected in all isolates, suggesting that adhesion, attachment, and entry into host cells are enabled for all isolates. The relative frequency of virulence genes among L. pneumophila isolates was high. The high prevalence, extensive genetic diversity, and the wide range of virulence genes indicated that the virulence potential of environmental Legionella is high, and proper risk management is of key importance to public health.
Affiliation(s)
- Olga Valciņa
  - Institute of Food Safety, Animal Health and Environment "BIOR", LV-1076 Riga, Latvia
- Daina Pūle
  - Institute of Food Safety, Animal Health and Environment "BIOR", LV-1076 Riga, Latvia
  - Department of Water Engineering and Technology, Riga Technical University, LV-1048 Riga, Latvia
- Juris Ķibilds
  - Institute of Food Safety, Animal Health and Environment "BIOR", LV-1076 Riga, Latvia
- Linda Labecka
  - Institute of Food Safety, Animal Health and Environment "BIOR", LV-1076 Riga, Latvia
- Margarita Terentjeva
  - Institute of Food and Environmental Hygiene, Faculty of Veterinary Medicine, Latvia University of Life Sciences and Technologies, LV-3004 Jelgava, Latvia
- Angelika Krūmiņa
  - Department of Infectology, Riga Stradiņš University, LV-1007 Riga, Latvia
- Aivars Bērziņš
  - Institute of Food Safety, Animal Health and Environment "BIOR", LV-1076 Riga, Latvia
9
Valciņa O, Pūle D, Ķibilds J, Lazdāne A, Trofimova J, Makarova S, Konvisers G, Ķimse L, Krūmiņa A, Bērziņš A. Prevalence and Genetic Diversity of Legionella spp. in Hotel Water-Supply Systems in Latvia. Microorganisms 2023; 11:596. [PMID: 36985170 PMCID: PMC10055240 DOI: 10.3390/microorganisms11030596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 02/22/2023] [Accepted: 02/23/2023] [Indexed: 03/02/2023]
Abstract
Legionella is one of the most important waterborne pathogens that can lead to both outbreaks and sporadic cases. The majority of travel-associated Legionnaires’ disease (TALD) cases are contracted during hotel stays. The aim of this study was to evaluate the prevalence and genetic diversity of Legionella spp. in hotel water supply systems in Latvia. In total, 834 hot water samples were collected from the water systems of 80 hotels in Latvia. At least one Legionella spp. positive sample was detected in 47 out of 80 hotels (58.8%). Overall, 235 out of 834 samples (28.2%) were Legionella spp. positive. The average hot water temperature in Latvian hotels was 49.8 °C. The most predominant L. pneumophila serogroup (SG) was SG3 which was found in 113 (49.8%) positive samples from 27 hotels. For 79 sequenced L. pneumophila isolates, 21 different sequence types (ST) were obtained, including 3 new types—ST2582, ST2579, and ST2580. High Legionella contamination and high genetic diversity were found in the hotel water supply systems in Latvia, which, together with the insufficient hot water temperature, may indicate that the lack of regulation and control measures may promote the proliferation of Legionella.
Affiliation(s)
- Olga Valciņa
  - Institute of Food Safety, Animal Health and Environment “BIOR”, 1076 Rīga, Latvia
- Daina Pūle
  - Institute of Food Safety, Animal Health and Environment “BIOR”, 1076 Rīga, Latvia
  - Department of Water Engineering and Technology, Riga Technical University, 1048 Rīga, Latvia
- Juris Ķibilds
  - Institute of Food Safety, Animal Health and Environment “BIOR”, 1076 Rīga, Latvia
- Andžela Lazdāne
  - Department of Metabolic Genetics Laboratory, Children’s Clinical University Hospital, 1004 Rīga, Latvia
- Jūlija Trofimova
  - National Reference Laboratory, Riga East University Hospital, 1038 Rīga, Latvia
- Svetlana Makarova
  - Institute of Food Safety, Animal Health and Environment “BIOR”, 1076 Rīga, Latvia
- Genadijs Konvisers
  - Institute of Food Safety, Animal Health and Environment “BIOR”, 1076 Rīga, Latvia
- Laima Ķimse
  - Institute of Food Safety, Animal Health and Environment “BIOR”, 1076 Rīga, Latvia
- Angelika Krūmiņa
  - Department of Infectology, Riga Stradiņš University, 1007 Rīga, Latvia
- Aivars Bērziņš
  - Institute of Food Safety, Animal Health and Environment “BIOR”, 1076 Rīga, Latvia
  - Correspondence: ; Tel.: +371-6780-8972
10
Farkas C, Recabal A, Mella A, Candia-Herrera D, Olivero MG, Haigh JJ, Tarifeño-Saldivia E, Caprile T. annotate_my_genomes: an easy-to-use pipeline to improve genome annotation and uncover neglected genes by hybrid RNA sequencing. Gigascience 2022; 11:giac099. [PMID: 36472574 PMCID: PMC9724561 DOI: 10.1093/gigascience/giac099] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/22/2022] [Revised: 07/22/2022] [Accepted: 09/28/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND The advancement of hybrid sequencing technologies is increasingly expanding genome assemblies that are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. RESULTS We developed an easy-to-use genome-guided transcriptome annotation pipeline that uses assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integration of several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons from the chicken SCO-spondin gene (containing more than 105 exons), including the identification of missing genes in the chicken reference annotations by homology assignments. CONCLUSIONS Our method helps to improve the current transcriptome annotation of the chicken brain. Our pipeline, implemented on Anaconda/Nextflow and Docker is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes.
Affiliation(s)
- Carlos Farkas
  - Laboratorio de Investigación en Ciencias Biomédicas, Departamento de Ciencias Básicas y Morfología, Facultad de Medicina, Universidad Católica de la Santísima Concepción, Concepción, Chile
- Antonia Recabal
  - Departamento de Biología Celular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Andy Mella
  - Instituto de Ciencias Naturales, Universidad de las Américas, Chile
  - Centro Integrativo de Biología y Química Aplicada (CIBQA), Universidad Bernardo O'Higgins, Santiago 8370854, Chile
- Daniel Candia-Herrera
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Maryori González Olivero
  - Departamento de Biología Celular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Jody Jonathan Haigh
  - CancerCare Manitoba Research Institute, Winnipeg, MB, Canada
  - Department of Pharmacology and Therapeutics, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Estefanía Tarifeño-Saldivia
  - Departamento de Bioquímica y Biología Molecular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
- Teresa Caprile
  - Departamento de Biología Celular, Facultad de Ciencias Biológicas, Universidad de Concepción, Chile
11
Structure of a mitochondrial ribosome with fragmented rRNA in complex with membrane-targeting elements. Nat Commun 2022; 13:6132. [PMID: 36253367 PMCID: PMC9576764 DOI: 10.1038/s41467-022-33582-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/30/2022] [Accepted: 09/22/2022] [Indexed: 12/24/2022]
Abstract
Mitoribosomes of green algae display a great structural divergence from their tracheophyte relatives, with fragmentation of both rRNA and proteins as a defining feature. Here, we report a 2.9 Å resolution structure of the mitoribosome from the alga Polytomella magna harbouring a reduced rRNA split into 13 fragments. We found that the rRNA contains a non-canonical reduced form of the 5S, as well as a permutation of the LSU domain I. The mt-5S rRNA is stabilised by mL40 that is also found in mitoribosomes lacking the 5S, which suggests an evolutionary pathway. Through comparison to other ribosomes with fragmented rRNAs, we observe that the pattern is shared across large evolutionary distances, and between cellular compartments, indicating an evolutionary convergence and supporting the concept of a primordial fragmented ribosome. On the protein level, eleven peripherally associated HEAT-repeat proteins are involved in the binding of 3' rRNA termini, and the structure features a prominent pseudo-trimer of one of them (mL116). Finally, in the exit tunnel, mL128 constricts the tunnel width of the vestibular area, and mL105, a homolog of a membrane targeting component mediates contacts with an inner membrane bound insertase. Together, the structural analysis provides insight into the evolution of the ribosomal machinery in mitochondria.
12
Shumate A, Wong B, Pertea G, Pertea M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput Biol 2022; 18:e1009730. [PMID: 35648784 PMCID: PMC9191730 DOI: 10.1371/journal.pcbi.1009730] [Citation(s) in RCA: 189] [Impact Index Per Article: 63.0] [Received: 12/08/2021] [Revised: 06/13/2022] [Accepted: 05/11/2022] [Indexed: 01/01/2023]
Abstract
Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.
Collapse
Affiliation(s)
- Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Brandon Wong
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Applied Math and Statistics, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Geo Pertea
- The Lieber Institute for Brain Development, Baltimore, Maryland, United States of America
| | - Mihaela Pertea
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, United States of America
| |
|
13
|
Prevalence, virulence determinants, and genetic diversity in Yersinia enterocolitica isolated from slaughtered pigs and pig carcasses. Int J Food Microbiol 2022; 376:109756. [DOI: 10.1016/j.ijfoodmicro.2022.109756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 05/17/2022] [Accepted: 05/25/2022] [Indexed: 11/21/2022]
|
14
|
Terentjeva M, Ķibilds J, Meistere I, Gradovska S, Alksne L, Streikiša M, Ošmjana J, Valciņa O. Virulence Determinants and Genetic Diversity of Yersinia Species Isolated from Retail Meat. Pathogens 2021; 11:37. [PMID: 35055985 PMCID: PMC8778217 DOI: 10.3390/pathogens11010037] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/29/2021] [Revised: 12/22/2021] [Accepted: 12/28/2021] [Indexed: 11/16/2022]
Abstract
Yersinia enterocolitica is an important foodborne pathogen, and the determination of its virulence factors and genetic diversity within the food chain could help understand the epidemiology of yersiniosis. The aim of the present study was to determine the prevalence, and to characterize the virulence determinants and genetic diversity, of Yersinia species isolated from meat. A total of 330 samples of retailed beef (n = 150) and pork (n = 180) in Latvia were investigated with culture and molecular methods. Whole genome sequencing (WGS) was applied for the detection of virulence and genetic diversity. The antimicrobial resistance of pathogenic Y. enterocolitica isolates was detected in accordance with EUCAST. Yersinia species were isolated from 24% (79/330) of meats, and the prevalence of Y. enterocolitica in pork (24%, 44/180) was significantly higher (p < 0.05) than in beef (13%, 19/150). Y. enterocolitica pathogenic bioserovars 2/O:9 and 4/O:3 were isolated from pork samples (3%, 6/180). In Y. enterocolitica 4/O:3 and 2/O:9 isolates, resistance was confirmed only to ampicillin and not to other antimicrobials. Major virulence determinants, including ail, inv, virF, ystA and myfA, were confirmed with WGS in Y. enterocolitica 2/O:9 and 4/O:3. MLST typing revealed 15 STs (sequence types) of Y. enterocolitica, with ST12 and ST18 associated with pathogenic bioserovars. For Y. enterocolitica 1A, Y. kristensenii, Y. intermedia and Y. frederiksenii, novel STs were registered (ST680-688). The presence of virulence genes and genetic characteristics of certain Y. enterocolitica STs confirm the common knowledge that pork could be an important source of pathogenic Yersinia.
Affiliation(s)
- Margarita Terentjeva
- Institute of Food and Environmental Hygiene, Faculty of Veterinary Medicine, Latvia University of Life Sciences and Technologies, LV-3004 Jelgava, Latvia
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Juris Ķibilds
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Irēna Meistere
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Silva Gradovska
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Laura Alksne
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Madara Streikiša
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Jevgēnija Ošmjana
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Olga Valciņa
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| |
|
15
|
Resolving the microalgal gene landscape at the strain level: A novel hybrid transcriptome of Emiliania huxleyi CCMP3266. Appl Environ Microbiol 2021; 88:e0141821. [PMID: 34757817 DOI: 10.1128/aem.01418-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
Abstract
Microalgae are key ecological players with a complex evolutionary history. Genomic diversity, in addition to limited availability of high-quality genomes, challenges studies that aim to elucidate molecular mechanisms underlying microalgal ecophysiology. Here, we present a novel and comprehensive transcriptomic hybrid approach to generate a reference for genetic analyses, and resolve the microalgal gene landscape at the strain level. The approach is demonstrated for a strain of the coccolithophore microalga Emiliania huxleyi, which is a species complex with considerable genome variability. The investigated strain is commonly studied as a model for algal-bacterial interactions, and was therefore sequenced in the presence of bacteria to elicit the expression of interaction-relevant genes. We applied complementary PacBio Iso-Seq full-length cDNA, and poly(A)-independent Illumina total RNA sequencing, which resulted in a de novo assembled, near complete hybrid transcriptome. In particular, hybrid sequencing improved the reconstruction of long transcripts and increased the recovery of full-length transcript isoforms. To use the resulting hybrid transcriptome as a reference for genetic analyses, we demonstrate a method that collapses the transcriptome into a genome-like dataset, termed "synthetic genome" (sGenome). We used the sGenome as a reference to visually confirm the robustness of the CCMP3266 gene assembly, to conduct differential gene expression analysis, and to characterize novel E. huxleyi genes. The newly-identified genes contribute to our understanding of E. huxleyi genome diversification, and are predicted to play a role in microbial interactions. Our transcriptomic toolkit can be implemented in various microalgae to facilitate mechanistic studies on microalgal diversity and ecology.
Importance: Microalgae are key players in the ecology and biogeochemistry of our oceans. Efforts to implement genomic and transcriptomic tools in laboratory studies involving microalgae suffer from the lack of published genomes. In the case of coccolithophore microalgae, the problem has long been recognized; the model species Emiliania huxleyi is a species complex with genomes composed of a core, and a large variable portion. To study the role of the variable portion in niche adaptation, and specifically in microbial interactions, strain-specific genetic information is required. Here we present a novel transcriptomic hybrid approach, and generate strain-specific genome-like information. We demonstrate our approach on an E. huxleyi strain that is co-cultivated with bacteria. By constructing a "synthetic genome", we generated comprehensive gene annotations that enabled accurate analyses of gene expression patterns. Importantly, we unveiled novel genes in the variable portion of E. huxleyi that play putative roles in microbial interactions.
|
16
|
Šteingolde Ž, Meistere I, Avsejenko J, Ķibilds J, Bergšpica I, Streikiša M, Gradovska S, Alksne L, Roussel S, Terentjeva M, Bērziņš A. Characterization and Genetic Diversity of Listeria monocytogenes Isolated from Cattle Abortions in Latvia, 2013-2018. Vet Sci 2021; 8:195. [PMID: 34564589 PMCID: PMC8473131 DOI: 10.3390/vetsci8090195] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/01/2021] [Revised: 09/06/2021] [Accepted: 09/10/2021] [Indexed: 01/15/2023]
Abstract
Listeria monocytogenes can cause disease in humans and in a wide range of animal species, especially in farm ruminants. The aim of the study was to determine the prevalence and genetic diversity of L. monocytogenes related to 1185 cattle abortion cases in Latvia during 2013-2018. The prevalence of L. monocytogenes among cattle abortions was 16.1% (191/1185). L. monocytogenes abortions were seasonal, with significantly higher occurrence (p < 0.01) in spring (March-May). In 61.0% of the cases, the affected cattle were under four years of age. L. monocytogenes abortions were observed during the third (64.6%) and second (33.3%) trimesters of gestation. Overall, 27 different sequence types (ST) were detected, and four of them, ST29 (clonal complex, CC29), ST37 (CC37), ST451 (CC11) and ST7 (CC7), covered more than half of the L. monocytogenes isolates. Key virulence factors like the prfA-dependent virulence cluster and inlA, inlB were observed in all the analyzed isolates, but lntA, inlF, inlJ, vip were associated with individual sequence types. Our results confirmed that L. monocytogenes is the most important causative agent of cattle abortions in Latvia and more than 20 different STs were observed in L. monocytogenes abortions in cattle.
Affiliation(s)
- Žanete Šteingolde
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
- Institute of Food and Environmental Hygiene, Faculty of Veterinary Medicine, Latvia University of Life Sciences and Technologies, LV-3004 Jelgava, Latvia
| | - Irēna Meistere
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Jeļena Avsejenko
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Juris Ķibilds
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Ieva Bergšpica
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Madara Streikiša
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Silva Gradovska
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Laura Alksne
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
| | - Sophie Roussel
- Maisons-Alfort Laboratory of Food Safety, University Paris-Est, French Agency for Food, Environmental and Occupational Health (ANSES), F-94701 Maisons-Alfort, France
| | - Margarita Terentjeva
- Institute of Food and Environmental Hygiene, Faculty of Veterinary Medicine, Latvia University of Life Sciences and Technologies, LV-3004 Jelgava, Latvia
| | - Aivars Bērziņš
- Institute of Food Safety, Animal Health and Environment BIOR, LV-1076 Riga, Latvia
- Institute of Food and Environmental Hygiene, Faculty of Veterinary Medicine, Latvia University of Life Sciences and Technologies, LV-3004 Jelgava, Latvia
| |
|
17
|
Karimi MR, Karimi AH, Abolmaali S, Sadeghi M, Schmitz U. Prospects and challenges of cancer systems medicine: from genes to disease networks. Brief Bioinform 2021; 23:6361045. [PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/04/2021] [Revised: 08/02/2021] [Accepted: 08/03/2021] [Indexed: 12/20/2022]
Abstract
It is becoming evident that holistic perspectives toward cancer are crucial in deciphering the overwhelming complexity of tumors. Single-layer analysis of genome-wide data has greatly contributed to our understanding of cellular systems and their perturbations. However, fundamental gaps in our knowledge persist and hamper the design of effective interventions. It is becoming more apparent than ever, that cancer should not only be viewed as a disease of the genome but as a disease of the cellular system. Integrative multilayer approaches are emerging as vigorous assets in our endeavors to achieve systemic views on cancer biology. Herein, we provide a comprehensive review of the approaches, methods and technologies that can serve to achieve systemic perspectives of cancer. We start with genome-wide single-layer approaches of omics analyses of cellular systems and move on to multilayer integrative approaches in which in-depth descriptions of proteogenomics and network-based data analysis are provided. Proteogenomics is a remarkable example of how the integration of multiple levels of information can reduce our blind spots and increase the accuracy and reliability of our interpretations and network-based data analysis is a major approach for data interpretation and a robust scaffold for data integration and modeling. Overall, this review aims to increase cross-field awareness of the approaches and challenges regarding the omics-based study of cancer and to facilitate the necessary shift toward holistic approaches.
Affiliation(s)
- Mehdi Sadeghi
- Department of Cell & Molecular Biology, Semnan University, Semnan, Iran
| | - Ulf Schmitz
- Department of Molecular & Cell Biology, James Cook University, Townsville, QLD 4811, Australia
| |
|
18
|
Gatter T, Stadler PF. Ryūtō: Improved multi-sample transcript assembly for differential transcript expression analysis and more. Bioinformatics 2021; 37:4307-4313. [PMID: 34255826 DOI: 10.1093/bioinformatics/btab494] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/09/2021] [Revised: 06/21/2021] [Accepted: 07/01/2021] [Indexed: 01/12/2023]
Abstract
MOTIVATION: Accurate assembly of RNA-seq is a crucial step in many analytic tasks such as gene annotation or expression studies. Despite ongoing research, progress on traditional single sample assembly has brought no major breakthrough. Multi-sample RNA-Seq experiments provide more information than single sample datasets and thus constitute a promising area of research. Yet, this advantage is challenging to utilize due to the large amount of accumulating errors. RESULTS: We present an extension to Ryūtō enabling the reconstruction of consensus transcriptomes from multiple RNA-seq data sets, incorporating consensus calling at low level features. We report stable improvements already at 3 replicates. Ryūtō outperforms competing approaches, providing a better and user-adjustable sensitivity-precision trade-off. Ryūtō's unique ability to utilize an (incomplete) reference for multi sample assemblies greatly increases precision. We demonstrate benefits for differential expression analysis. CONCLUSION: Ryūtō consistently improves assembly on replicates of the same tissue independent of filter settings, even when mixing conditions or time series. Consensus voting in Ryūtō is especially effective at high precision assembly, while Ryūtō's conventional mode can reach higher recall. AVAILABILITY: Ryūtō is available at https://github.com/studla/RYUTO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Thomas Gatter
- Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics, Universität Leipzig, D-04107 Leipzig, Germany
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science & Interdisciplinary Center for Bioinformatics, Universität Leipzig, D-04107 Leipzig, Germany
- Discrete Biomath Group, Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Santa Fe Institute, Santa Fe, NM 87501, USA
| |
|
19
|
Puglia GD, Prjibelski AD, Vitale D, Bushmanova E, Schmid KJ, Raccuia SA. Hybrid transcriptome sequencing approach improved assembly and gene annotation in Cynara cardunculus (L.). BMC Genomics 2020; 21:317. [PMID: 32819282 PMCID: PMC7441626 DOI: 10.1186/s12864-020-6670-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 05/31/2019] [Accepted: 03/13/2020] [Indexed: 12/11/2022]
Abstract
Background: The investigation of transcriptome profiles using short reads in non-model organisms, which lack well-annotated genomes, is limited by partial gene reconstruction and isoform detection. In contrast, long-read sequencing techniques have shown their potential to generate complete transcript assemblies even when a reference genome is lacking. Cynara cardunculus var. altilis (DC) (cultivated cardoon) is a perennial hardy crop adapted to dry environments with many industrial and nutraceutical applications due to the richness of secondary metabolites mostly produced in flower heads. The investigation of this species benefited from the recent release of a draft genome, but the transcriptome profile during the capitula formation still remains unexplored. In the present study we show a transcriptome analysis of vegetative and inflorescence organs of cultivated cardoon through a novel hybrid RNA-seq assembly approach utilizing both long and short RNA-seq reads. Results: The inclusion of a single Nanopore flow-cell output in a hybrid sequencing approach yielded an increase of 15% in completely assembled genes and 18% in transcript isoforms with respect to short reads alone. Among 25,463 assembled unigenes, we identified 578 new genes and updated 13,039 gene models, 11,169 of which were alternatively spliced isoforms. During capitulum development, 3424 genes were differentially expressed and approximately two-thirds were identified as transcription factors including bHLH, MYB, NAC, C2H2 and MADS-box which were highly expressed especially after capitulum opening. We also show the expression dynamics of key genes involved in the production of valuable secondary metabolites in which the capitulum is rich, such as phenylpropanoids, flavonoids and sesquiterpene lactones. Most of their biosynthetic genes were strongly transcribed in the flower heads with alternative isoforms exhibiting differential expression levels across the tissues.
Conclusions: This novel hybrid sequencing approach made it possible to improve the transcriptome assembly, to update more than half of annotated genes and to identify many novel genes and different alternatively spliced isoforms. This study provides new insights on the flowering cycle in an Asteraceae plant, a valuable resource for plant biology and breeding in Cynara and an effective method for improving gene annotation.
Affiliation(s)
- Giuseppe D Puglia
- Institute for Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Fruwirthstrasse 21, 70599, Stuttgart, Germany
- Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo (CNR-ISAFOM) U.O.S. Catania, Via Empedocle, 58, 95128, Catania, Italy
| | - Andrey D Prjibelski
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | - Domenico Vitale
- Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo (CNR-ISAFOM) U.O.S. Catania, Via Empedocle, 58, 95128, Catania, Italy
| | - Elena Bushmanova
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University, St. Petersburg, Russia
| | - Karl J Schmid
- Institute for Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, Fruwirthstrasse 21, 70599, Stuttgart, Germany
| | - Salvatore A Raccuia
- Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo (CNR-ISAFOM) U.O.S. Catania, Via Empedocle, 58, 95128, Catania, Italy
| |
|